.. _results:

Examining results
=================

Viewing parameter values
~~~~~~~~~~~~~~~~~~~~~~~~

The result of ``mrpast solve`` or ``mrpast process --solve`` is a list of JSON files, each of which captures a solution
to the maximum likelihood problem. Each result was generated by starting the search from a different point in the
space of parameter solutions. ``mrpast solve`` prints the filename of the best one to stdout. You can also use the
`get_best.py <https://github.com/aprilweilab/mrpast/blob/main/scripts/get_best.py>`_ script to determine which of a set
of JSON files has the highest likelihood.
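
You can also make this selection in your own Python code. The sketch below is not ``get_best.py`` itself. It picks the
file whose loaded JSON has the highest score. Which JSON field holds the likelihood is an assumption, so the caller
supplies a ``score`` function. The ``"likelihood"`` key and the file names in the usage line are hypothetical. Check
your result files or ``get_best.py`` for the real field, and negate the value if it is stored as a negative
log-likelihood.

::

  import json
  from typing import Callable, Iterable, Optional


  def best_result(paths: Iterable[str], score: Callable[[dict], float]) -> str:
      """Return the path whose loaded JSON has the highest score.

      ``score`` extracts a likelihood value from one result; which field it
      reads is an assumption, not a documented mrpast schema.
      """
      best_path: Optional[str] = None
      best_score = float("-inf")
      for path in paths:
          with open(path) as f:
              value = score(json.load(f))
          if value > best_score:
              best_path, best_score = path, value
      if best_path is None:
          raise ValueError("no result files were given")
      return best_path


  # Hypothetical usage: the file names and the "likelihood" key are placeholders.
  # best_result(["run0.json", "run1.json"], lambda r: r["likelihood"])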

Once you have a JSON file of interest, you can view it directly in a text or JSON editor or, more usefully, with
``mrpast show``. This command shows the parameter values and their error quantities (when ``ground_truth`` makes sense
for your model).
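
When a ground truth is available, relative error is a common way to measure the error of an estimate. The snippet
below illustrates that measure only. Whether it matches the error quantities that ``mrpast show`` reports is an
assumption, not something this page states.

::

  def relative_error(estimate: float, truth: float) -> float:
      """Relative error |estimate - truth| / |truth| of one parameter."""
      if truth == 0:
          raise ValueError("relative error is undefined when the ground truth is 0")
      return abs(estimate - truth) / abs(truth)


  # Illustrative values: an estimate of 110 against a true value of 100.
  print(relative_error(110.0, 100.0))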

Parameter confidence intervals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are two methods for producing confidence intervals, but the ``mrpast confidence`` command is used for both.

Bootstrap confidence intervals
------------------------------

``mrpast confidence solver_output.json`` can be slow (use ``-j <threads>`` to speed it up), because it runs every
bootstrap sample through the maximum likelihood solver. It produces results in two places:

1. Directory ``solver_output.bootstrap.out/`` which contains all of the intermediate solver results for every bootstrap sample.
2. File ``solver_output.bootstrap.csv`` which contains a summary of all of the parameter and likelihood values for every bootstrap sample.

Neither output contains the confidence intervals themselves. To get them, use either ``mrpast show`` or
:py:meth:`mrpast.result.summarize_bootstrap_data`. Examples:

::

  mrpast show solver_output.bs_summary.csv


OR


::

  from mrpast.result import summarize_bootstrap_data
  import pandas as pd

  # The raw dataframe containing all the parameter values from bootstrapping.
  raw_dataframe = pd.read_csv("solver_output.bootstrap.csv")

  # The summarized dataframe, which contains the mean or median parameter values, and their confidence intervals.
  sum_dataframe = summarize_bootstrap_data(raw_dataframe, use_median=True, interval_conf=0.95)
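
``summarize_bootstrap_data`` does this summarization for you. The sketch below illustrates the underlying idea with
plain pandas, computing an equal-tailed percentile interval for each parameter. It is not mrpast's implementation, and
mrpast may use a different quantile rule. The parameter names mirror the call above for readability. The column name
``epoch_time`` and its values are made up.

::

  import pandas as pd


  def percentile_summary(samples: pd.DataFrame, interval_conf: float = 0.95,
                         use_median: bool = True) -> pd.DataFrame:
      """One row per parameter (column of ``samples``): a central estimate
      plus an equal-tailed percentile interval over the bootstrap rows."""
      alpha = 1.0 - interval_conf
      center = samples.median() if use_median else samples.mean()
      return pd.DataFrame({
          "estimate": center,
          "lower": samples.quantile(alpha / 2),
          "upper": samples.quantile(1 - alpha / 2),
      })


  # Made-up bootstrap values for one hypothetical parameter.
  samples = pd.DataFrame({"epoch_time": [90.0, 95.0, 100.0, 105.0, 110.0]})
  print(percentile_summary(samples))

With pandas' default linear interpolation, these five values give a median of 100 and an interval of roughly
[90.5, 109.5].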


Theoretical confidence intervals
--------------------------------

The GIM-based confidence intervals are much faster to compute than the bootstrapped intervals, but likely less
accurate. We recommend the bootstrapped intervals unless your model is so large that bootstrapping is computationally
infeasible. In that case, take the GIM-based confidence intervals with a grain of salt.

``mrpast confidence --gim solver_output.json`` writes a copy of ``solver_output.json``, named ``solver_output.gim.json``,
that also contains a confidence interval for each parameter (computed with the Godambe Information Matrix formulation).
You can examine these intervals using :py:meth:`mrpast.result.load_json_pandas`. Example:

::

  from mrpast.result import load_json_pandas
  dataframe = load_json_pandas("solver_output.gim.json", interval_field="gim_ci")


The command also writes a summary ``.csv`` file that can be used with ``mrpast show``:

::

  mrpast show solver_output.gim_summary.csv
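
The sketch below shows the standard Godambe ("sandwich") formulation, to give some intuition for what the GIM
intervals represent. Let :math:`H` be the sensitivity matrix (the negative expected Hessian of the composite
log-likelihood) and :math:`J` the variability matrix (the covariance of the composite score). The Godambe information is
:math:`G = H J^{-1} H`, so the estimator's asymptotic covariance is :math:`G^{-1} = H^{-1} J H^{-1}`. A Wald interval is
then :math:`\hat\theta_i \pm z \sqrt{(G^{-1})_{ii}}`. This is only an illustration. :math:`H` and :math:`J` are treated
as given inputs, and the sketch does not cover how mrpast estimates them or whether it uses exactly this interval form.

::

  import numpy as np
  from statistics import NormalDist


  def gim_intervals(theta_hat, H, J, interval_conf=0.95):
      """Wald intervals from the Godambe information G = H J^{-1} H.

      theta_hat: point estimates, shape (k,)
      H: sensitivity matrix (negative expected Hessian), shape (k, k)
      J: variability matrix (covariance of the score), shape (k, k)
      """
      theta_hat = np.asarray(theta_hat, dtype=float)
      H_inv = np.linalg.inv(np.asarray(H, dtype=float))
      # G^{-1} = H^{-1} J H^{-1}: the "sandwich" covariance of the estimator.
      cov = H_inv @ np.asarray(J, dtype=float) @ H_inv
      z = NormalDist().inv_cdf(0.5 + interval_conf / 2)
      half_width = z * np.sqrt(np.diag(cov))
      return theta_hat - half_width, theta_hat + half_width


  # Toy two-parameter example with made-up matrices.
  lower, upper = gim_intervals([1.0, 2.0], H=np.diag([4.0, 1.0]), J=np.diag([4.0, 2.0]))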


Model selection
~~~~~~~~~~~~~~~

mrpast implements the `Akaike Information Criterion <https://en.wikipedia.org/wiki/Akaike_information_criterion>`_ (AIC),
using the `composite likelihood-adjusted variant <https://academic.oup.com/biomet/article-abstract/92/3/519/218901>`_ of AIC.
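
The composite-likelihood criterion from the paper linked above changes one part of the usual
:math:`\mathrm{AIC} = 2k - 2 \log L`. It replaces the parameter count :math:`k` with the effective number of parameters
:math:`\operatorname{tr}(J H^{-1})`. Here :math:`H` is the sensitivity matrix (the negative expected Hessian of the
composite log-likelihood) and :math:`J` is the variability matrix (the covariance of the composite score). The sketch
below illustrates both formulas. It is not mrpast's code, and it assumes :math:`H` and :math:`J` are already available.
When :math:`J = H`, as for a true likelihood, the penalty reduces to :math:`k` and the two scores agree.

::

  import numpy as np


  def aic(loglik, k):
      """Standard AIC: 2k - 2 log L (lower is better)."""
      return 2 * k - 2 * loglik


  def aic_cl(composite_loglik, H, J):
      """Composite-likelihood AIC: k is replaced by tr(J H^{-1})."""
      H_inv = np.linalg.inv(np.asarray(H, dtype=float))
      effective_k = np.trace(np.asarray(J, dtype=float) @ H_inv)
      return 2 * effective_k - 2 * composite_loglik


  # With J == H, the effective number of parameters equals k (here 3).
  print(aic(-100.0, 3), aic_cl(-100.0, np.eye(3), np.eye(3)))

Both calls in the usage line give a score of 206.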

AIC can rank multiple candidate models that have been fit to the same data. The model with the lowest AIC score is the
"selected model." We have found that overly complex models (i.e., more complex than the model that generated the data)
are sometimes selected, or have an AIC score very close to that of the true model. For this reason, we recommend looking
at the distribution of AIC scores over the set of bootstrap samples. A fair comparison requires the same data for both
models, so mrpast checks that the bootstrap samples of two competing models are identical. If the distributions of AIC
values are indistinguishable, prefer the simpler model.

AIC on a single result
----------------------

Suppose we have two models, ``modelA`` and ``modelB``, and we have run ``mrpast process`` twice on the same data (ARGs):
once with ``modelA.yaml`` and once with ``modelB.yaml``. Call the resulting best solver outputs ``best.modelA.out.json``
and ``best.modelB.out.json``. We can generate the AIC data using:

::

  mrpast select best.modelA.out.json best.modelB.out.json > modelA_modelB.AIC.json

The resulting JSON file can be loaded into a pandas DataFrame for examination:

::

  import json
  import pandas as pd

  # Load the AIC values written by "mrpast select" above.
  with open("modelA_modelB.AIC.json") as f:
      aic_values = json.load(f)["aic_values"]
  dataframe = pd.DataFrame.from_dict(aic_values)

Each bootstrap sample for each file is a row in the DataFrame, and the ``AIC`` (unadjusted), ``AIC_cl``
(composite likelihood adjusted AIC, the one that should typically be used), and ``cL`` (composite
log-likelihood value) are present for each row.
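
One way to turn these rows into a ranking is to average ``AIC_cl`` per model and sort in ascending order, since the
lowest score is preferred. This is a sketch under stated assumptions. The name of the column that identifies each input
file (``model`` below) is hypothetical, so check ``dataframe.columns`` for the actual name. The example values are also
made up.

::

  import pandas as pd


  def rank_models(df: pd.DataFrame, model_col: str,
                  score_col: str = "AIC_cl") -> pd.Series:
      """Mean score per model, sorted so the preferred (lowest) model is first."""
      return df.groupby(model_col)[score_col].mean().sort_values()


  # Made-up values; "model" is a hypothetical column name.
  example = pd.DataFrame({"model": ["modelA", "modelB"], "AIC_cl": [210.0, 205.5]})
  print(rank_models(example, "model"))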

AIC on bootstrap samples
------------------------

To compute AIC over all bootstrap samples, add the ``--bootstrap`` flag:

::

  mrpast select --bootstrap best.modelA.out.json best.modelB.out.json > modelA_modelB.bootstrap.AIC.json

This command will fail if you have not previously run:

::

  # Very SLOW! Solves for all bootstrap samples
  mrpast confidence -j 8 best.modelA.out.json
  mrpast confidence -j 8 best.modelB.out.json


The output file ``modelA_modelB.bootstrap.AIC.json`` has the same format as the non-bootstrap version.
Hint: you can use the ``--replicates`` flag to reduce the number of solver replicates for each bootstrap run,
which speeds up the bootstrap process.
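
mrpast checks that both models share the same bootstrap samples, so their AIC values can be compared sample by sample.
The sketch below shows one simple way to do this. It is not an mrpast feature. It computes the per-sample difference in
``AIC_cl`` and the fraction of samples in which the first model scores lower. The column names ``model`` and ``sample``
are hypothetical stand-ins for whatever identifies the input file and the bootstrap sample in your DataFrame, and the
values are made up.

::

  import pandas as pd


  def paired_aic_comparison(df, model_col, sample_col, model_a, model_b,
                            score_col="AIC_cl"):
      """Compare two models on the same bootstrap samples.

      Returns the per-sample differences (model_a minus model_b) and the
      fraction of samples in which model_a has the lower (better) score.
      """
      wide = df.pivot(index=sample_col, columns=model_col, values=score_col)
      diff = wide[model_a] - wide[model_b]
      return diff, float((diff < 0).mean())


  # Made-up bootstrap AIC_cl values for three shared samples.
  example = pd.DataFrame({
      "model": ["modelA"] * 3 + ["modelB"] * 3,
      "sample": [0, 1, 2, 0, 1, 2],
      "AIC_cl": [100.0, 102.0, 99.0, 101.0, 101.0, 101.0],
  })
  diff, frac_a_better = paired_aic_comparison(example, "model", "sample", "modelA", "modelB")

The differences may straddle zero with no clear majority for either model. In that case the two distributions are hard
to tell apart, and the simpler model should be preferred, as recommended above.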

Reading/processing results
~~~~~~~~~~~~~~~~~~~~~~~~~~

Dataframe for point estimates
-----------------------------

The result of ``mrpast solve`` or ``mrpast process --solve`` can be imported as a dataframe using
:py:meth:`mrpast.result.load_json_pandas`. Example:

::

  from mrpast.result import load_json_pandas
  dataframe = load_json_pandas("solver_output.json")

Dataframe for bootstrap results
-------------------------------

The bootstrap CSV produced by ``mrpast confidence`` can be loaded with ``pandas.read_csv`` and summarized as a
dataframe using :py:meth:`mrpast.result.summarize_bootstrap_data`, as shown in the bootstrap example above. The
bootstrap results contain more than just confidence interval information.