.. _results:

Examining results
=================

Viewing parameter values
~~~~~~~~~~~~~~~~~~~~~~~~

The result of ``mrpast solve`` or ``mrpast process --solve`` is a list of JSON files, each of which
captures a solution to the maximum likelihood problem. Each result was generated by starting the
search from a different point in parameter space. ``mrpast solve`` prints the filename of the best
one to stdout. You can also use the ``get_best.py`` script to tell you which of a set of JSON files
has the highest likelihood.

Once you have a JSON file of interest, you can view it directly in a text or JSON editor, or you can
(more usefully) use ``mrpast show``. This command shows you parameter values and their error
quantities (if ``ground_truth`` makes sense for your model).

Parameter confidence intervals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are two methods for producing confidence intervals; the ``mrpast confidence`` command is used
for both.

Bootstrap confidence intervals
------------------------------

``mrpast confidence solver_output.json`` can be pretty slow (hint: use ``-j`` to speed it up), as it
runs every bootstrap sample through the maximum likelihood solver. It produces results in two places:

1. Directory ``solver_output.bootstrap.out/``, which contains all of the intermediate solver results
   for every bootstrap sample.
2. File ``solver_output.bootstrap.csv``, which contains a summary of all of the parameter and
   likelihood values for every bootstrap sample.

The confidence intervals are not actually in either output; to get them, use either ``mrpast show``
or :py:meth:`mrpast.result.summarize_bootstrap_data`. Examples:

::

    mrpast show solver_output.bs_summary.csv

OR

::

    from mrpast.result import summarize_bootstrap_data
    import pandas as pd

    # The raw dataframe containing all the parameter values from bootstrapping.
    raw_dataframe = pd.read_csv("solver_output.bootstrap.csv")
    # The summarized dataframe, which contains the mean or median parameter values,
    # and their confidence intervals.
    sum_dataframe = summarize_bootstrap_data(raw_dataframe, use_median=True, interval_conf=0.95)

Theoretical confidence intervals
--------------------------------

Confidence intervals based on the Godambe Information Matrix (GIM) are much faster to compute, but
likely less accurate, than the bootstrapped intervals. The bootstrapped intervals are recommended
unless your model is so large that bootstrapping is computationally infeasible (in which case, the
confidence intervals should be taken with a grain of salt).

``mrpast confidence --gim solver_output.json`` makes a copy of ``solver_output.json``
(``solver_output.gim.json``) that contains a GIM-based confidence interval for each parameter. These
intervals can be examined using :py:meth:`mrpast.result.load_json_pandas`. Example:

::

    from mrpast.result import load_json_pandas

    dataframe = load_json_pandas("solver_output.gim.json", interval_field="gim_ci")

The command also outputs a summary ``.csv`` file that can be used with ``mrpast show``:

::

    mrpast show solver_output.gim_summary.csv

Model selection
~~~~~~~~~~~~~~~

mrpast has an implementation of the Akaike Information Criterion (AIC), which is based on the
composite likelihood-adjusted variation of AIC. AIC can rank multiple candidate models that have
been evaluated on the same data. The model with the lowest AIC score is the "selected model." We
have found that overly complex models (i.e., more complex than the model that generated the data)
are sometimes selected, or have an AIC score very close to the true model's. For these reasons, it
is recommended to look at the distribution of AIC scores over the set of bootstrap samples.
mrpast contains a check to verify that the bootstrap samples of two competing models are identical,
because the data must be the same for a fair evaluation of the models. If the distributions of AIC
values are indistinguishable, then the simpler model should be preferred.

AIC on a single result
----------------------

Suppose we have two models, ``modelA`` and ``modelB``, and we have run ``mrpast process`` on the
same data (ARGs), once with ``modelA.yaml`` and once with ``modelB.yaml``. We'll call the resulting
best solver outputs ``best.modelA.out.json`` and ``best.modelB.out.json``. We can generate the AIC
data using:

::

    mrpast select best.modelA.out.json best.modelB.out.json > modelA_modelB.AIC.json

The resulting JSON file can be loaded into a Pandas dataframe to be examined:

::

    import json
    import pandas as pd

    # Load the file written by the "mrpast select" command above.
    with open("modelA_modelB.AIC.json") as f:
        aic_values = json.load(f)["aic_values"]
    dataframe = pd.DataFrame.from_dict(aic_values)

Each bootstrap sample for each file is a row in the DataFrame. Every row contains ``AIC``
(unadjusted), ``AIC_cl`` (composite likelihood-adjusted AIC, the one that should typically be used),
and ``cL`` (composite log-likelihood value).

AIC on bootstrap samples
------------------------

To run AIC over all bootstrap samples, just use the ``--bootstrap`` flag:

::

    mrpast select --bootstrap best.modelA.out.json best.modelB.out.json > modelA_modelB.bootstrap.AIC.json

This command will fail if you have not previously run:

::

    # Very SLOW! Solves for all bootstrap samples
    mrpast confidence -j 8 best.modelA.out.json
    mrpast confidence -j 8 best.modelB.out.json

The ``modelA_modelB.bootstrap.AIC.json`` output has the same format as the non-bootstrap version.

Hint: you can use the ``--replicates`` flag to reduce the number of solver replicates for each
bootstrap run, which speeds up the bootstrap process.

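
The comparison of AIC distributions between two models can be sketched with pandas. This is only an
illustrative sketch, not part of mrpast: the ``model`` and ``sample`` columns and all numbers are
hypothetical stand-ins (only the ``AIC``, ``AIC_cl``, and ``cL`` columns are documented here), so
adapt the column names to whatever your ``aic_values`` actually contains.

```python
import pandas as pd

# Synthetic stand-in for the "aic_values" loaded from the bootstrap AIC JSON.
# ASSUMPTION: the "model" and "sample" columns are hypothetical; only AIC_cl
# is a documented column name.
aic_values = {
    "model": ["modelA"] * 4 + ["modelB"] * 4,
    "sample": [0, 1, 2, 3] * 2,
    "AIC_cl": [100.0, 102.0, 98.0, 101.0, 103.0, 101.0, 104.0, 105.0],
}
dataframe = pd.DataFrame.from_dict(aic_values)

# Reshape so each row is one bootstrap sample and each column is one model.
paired = dataframe.pivot(index="sample", columns="model", values="AIC_cl")

# Summarize each model's AIC_cl distribution.
summary = paired.describe().loc[["mean", "50%", "std"]]

# Fraction of bootstrap samples in which modelA has the lower (better) AIC_cl.
frac_a_better = (paired["modelA"] < paired["modelB"]).mean()

print(summary)
print(f"modelA has the lower AIC_cl in {frac_a_better:.0%} of samples")
```

If the two distributions overlap heavily and neither model wins in a clear majority of samples, that
is the situation in which the simpler model should be preferred.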

Reading/processing results
~~~~~~~~~~~~~~~~~~~~~~~~~~

Dataframe for point estimates
-----------------------------

The result of ``mrpast solve`` or ``mrpast process --solve`` can be imported as a dataframe using
:py:meth:`mrpast.result.load_json_pandas`. Example:

::

    from mrpast.result import load_json_pandas

    dataframe = load_json_pandas("solver_output.json")

Dataframe for bootstrap results
-------------------------------

The result of ``mrpast confidence`` can also be imported as a dataframe using
:py:meth:`mrpast.result.summarize_bootstrap_data`. See the example above. The bootstrap results
contain more than just confidence interval information.
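
The raw bootstrap dataframe can also be explored directly with pandas, for instance to compute a
percentile interval by hand. The following is a minimal sketch under assumptions: the column name
``param_0`` and its values are made up for illustration, and the percentile method shown is not
necessarily what ``summarize_bootstrap_data`` does internally.

```python
import pandas as pd

# Synthetic stand-in for pd.read_csv("solver_output.bootstrap.csv").
# ASSUMPTION: "param_0" is a hypothetical column name; check your CSV header.
raw_dataframe = pd.DataFrame({"param_0": [0.8, 0.9, 1.0, 1.1, 1.2]})

interval_conf = 0.95
alpha = 1.0 - interval_conf

# Point estimate: the median over bootstrap samples.
median = raw_dataframe["param_0"].median()

# Percentile interval: cut alpha/2 of the probability mass from each tail.
lower = raw_dataframe["param_0"].quantile(alpha / 2)
upper = raw_dataframe["param_0"].quantile(1 - alpha / 2)

print(f"param_0: median={median:.3f}, {interval_conf:.0%} interval=({lower:.3f}, {upper:.3f})")
```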