Examining results

Viewing parameter values

The result of mrpast solve or mrpast process --solve is a set of JSON files, each of which captures one solution to the maximum likelihood problem. Each result was generated by starting the search from a different point in parameter space. mrpast solve prints the filename of the best one to stdout. You can also use the get_best.py script to find which of a set of JSON files has the highest likelihood.

Once you have a JSON file of interest, you can view it directly in a text or JSON editor, or (more usefully) use mrpast show. This command shows the parameter values and, if ground_truth makes sense for your model, their error quantities.

Parameter confidence intervals

There are two methods for producing confidence intervals; both use the mrpast confidence command.

Bootstrap confidence intervals

mrpast confidence solver_output.json runs every bootstrap sample through the maximum likelihood solver, so it can be slow (hint: use -j <threads> to speed it up). It produces results in two places:

  1. Directory solver_output.bootstrap.out/ which contains all of the intermediate solver results for every bootstrap sample.

  2. File solver_output.bootstrap.csv which contains a summary of all of the parameter and likelihood values for every bootstrap sample.

The confidence intervals are not stored directly in either output; to get them, use either mrpast show or mrpast.result.summarize_bootstrap_data(). Examples:

mrpast show solver_output.bs_summary.csv

OR

from mrpast.result import summarize_bootstrap_data
import pandas as pd

# The raw dataframe containing all the parameter values from bootstrapping.
raw_dataframe = pd.read_csv("solver_output.bootstrap.csv")

# The summarized dataframe, which contains the mean or median parameter values, and their confidence intervals.
sum_dataframe = summarize_bootstrap_data(raw_dataframe, use_median=True, interval_conf=0.95)
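For intuition about what a bootstrap summary contains, here is a minimal sketch of the standard percentile method applied to a table with one row per bootstrap sample. It is not necessarily how summarize_bootstrap_data() computes its intervals; the column names, the synthetic values, and the percentile_intervals helper are all hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a raw bootstrap table: one row per bootstrap
# sample, one column per parameter. Names and values are hypothetical.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "param_a": rng.normal(loc=1e-4, scale=1e-5, size=200),
    "param_b": rng.normal(loc=5e-5, scale=5e-6, size=200),
})

def percentile_intervals(df: pd.DataFrame, conf: float = 0.95) -> pd.DataFrame:
    """Median plus a two-sided percentile interval for each column."""
    tail = (1.0 - conf) / 2.0
    return pd.DataFrame({
        "median": df.median(),
        "lower": df.quantile(tail),
        "upper": df.quantile(1.0 - tail),
    })

# One row per parameter, with the median and the interval bounds.
summary = percentile_intervals(raw, conf=0.95)
print(summary)
```

The percentile method makes no distributional assumption about the estimates, which is one reason bootstrap intervals are often preferred when they are affordable.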

Theoretical confidence intervals

Using the GIM-based confidence intervals is much faster, but likely less accurate, than using the bootstrapped intervals. The bootstrapped intervals are recommended for use, unless you are using a model so large that bootstrapping is computationally infeasible (in which case, the confidence intervals should be taken with a grain of salt).

mrpast confidence --gim solver_output.json will make a copy of solver_output.json (solver_output.gim.json) that contains a confidence interval for each parameter (using the Godambe Information Matrix formulation). These intervals can be examined using mrpast.result.load_json_pandas(). Example:

from mrpast.result import load_json_pandas
dataframe = load_json_pandas("solver_output.gim.json", interval_field="gim_ci")

It also outputs a summary .csv file that can be used with mrpast show:

mrpast show solver_output.gim_summary.csv

Model selection

mrpast implements the Akaike Information Criterion (AIC), using the composite likelihood-adjusted variant of AIC.

AIC can rank multiple candidate models that have been evaluated on the same data. The model with the lowest AIC score is the “selected model.” We have found that overly complex models (i.e., more complex than the model that generated the data) are sometimes selected, or have an AIC score very close to the true model’s. For these reasons, it is recommended to look at the distribution of AIC scores over the set of bootstrap samples. mrpast checks that the bootstrap samples of two competing models are identical, because the data must be the same for a fair comparison of the models. If the distributions of AIC values are indistinguishable, then the simpler model should be preferred.
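As background, the sketch below contrasts the standard AIC with the textbook composite-likelihood adjustment, in which the parameter count is replaced by an effective number of parameters, tr(J H^-1). Here H is the sensitivity matrix and J the variability matrix, the two ingredients of the Godambe Information Matrix. This is the general formulation, assumed here for illustration; the exact computation inside mrpast may differ, and the function names and matrices are hypothetical.

```python
import numpy as np

def aic(log_likelihood: float, num_params: int) -> float:
    """Standard AIC: 2k - 2 ln L."""
    return 2.0 * num_params - 2.0 * log_likelihood

def aic_composite(composite_ll: float, H: np.ndarray, J: np.ndarray) -> float:
    """Composite-likelihood AIC: 2 tr(J H^-1) - 2 cL."""
    # tr(J H^-1) equals tr(H^-1 J); solve() avoids forming the inverse.
    effective_params = float(np.trace(np.linalg.solve(H, J)))
    return 2.0 * effective_params - 2.0 * composite_ll

# Hypothetical 2-parameter example.
H = np.array([[4.0, 1.0], [1.0, 3.0]])

# When J equals H (a true likelihood), the penalty reduces to k = 2.
same = aic_composite(-100.0, H, H)

# When J is larger than H, the penalty grows beyond the parameter count.
inflated = aic_composite(-100.0, H, 2.0 * H)

print(aic(-100.0, 2), same, inflated)
```

The adjustment matters because a composite likelihood treats correlated observations as independent, so the unadjusted penalty tends to understate model complexity.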

AIC on a single result

Suppose we have two models, modelA and modelB, and we have run mrpast process on the same data (ARGs), once with modelA.yaml and once with modelB.yaml. We’ll call the resulting best solver outputs best.modelA.out.json and best.modelB.out.json. We can generate the AIC data using:

mrpast select best.modelA.out.json best.modelB.out.json > modelA_modelB.AIC.json

This resulting JSON file can be loaded into a Pandas dataframe to be examined:

import json
import pandas as pd

# Load the AIC data written by mrpast select.
with open("modelA_modelB.AIC.json") as f:
    aic_values = json.load(f)["aic_values"]
dataframe = pd.DataFrame.from_dict(aic_values)

Each bootstrap sample for each file is a row in the DataFrame, and the AIC (unadjusted), AIC_cl (composite likelihood adjusted AIC, the one that should typically be used), and cL (composite log-likelihood value) are present for each row.
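Once the data is in a DataFrame, ranking the models is a sort on AIC_cl. The sketch below uses a hand-built table with made-up numbers in place of real output. The AIC, AIC_cl, and cL columns follow the description above, but the "model" label column is a hypothetical name; check the columns of your own DataFrame for the field that identifies each input file.

```python
import pandas as pd

# Made-up values standing in for the output of mrpast select.
# "model" is a hypothetical label column.
dataframe = pd.DataFrame({
    "model": ["modelA", "modelB"],
    "cL": [-1520.4, -1518.9],
    "AIC": [3052.8, 3053.8],
    "AIC_cl": [3061.2, 3066.5],
})

# Rank by the composite likelihood-adjusted AIC (lower is better).
ranked = dataframe.sort_values("AIC_cl").reset_index(drop=True)

# Difference from the best model; small differences are weak evidence.
ranked["delta_AIC_cl"] = ranked["AIC_cl"] - ranked["AIC_cl"].min()

selected = ranked.loc[0, "model"]
print(ranked)
print("selected:", selected)
```

In this made-up example modelB has the higher composite log-likelihood, but modelA is selected because its penalty is smaller.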

AIC on bootstrap samples

To run AIC over all bootstrap samples, just use the --bootstrap flag:

mrpast select --bootstrap best.modelA.out.json best.modelB.out.json > modelA_modelB.bootstrap.AIC.json

This command will fail if you have not previously run:

# Very SLOW! Solves for all bootstrap samples
mrpast confidence -j 8 best.modelA.out.json
mrpast confidence -j 8 best.modelB.out.json

The modelA_modelB.bootstrap.AIC.json output JSON has the same format as the non-bootstrap version. Hint: you can use the --replicates flag to reduce the number of solver replicates for each bootstrap run to speed up the bootstrap process.
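One way to compare the two bootstrap distributions is to pair the AIC_cl values by bootstrap sample and look at the per-sample difference. The sketch below does this on synthetic data. The "model" and "sample" columns are hypothetical names, and the pairing is a simple heuristic for judging whether the distributions are distinguishable, not a procedure built into mrpast.

```python
import numpy as np
import pandas as pd

# Synthetic AIC_cl values for 100 bootstrap samples of two models.
# "model" and "sample" are hypothetical column names.
rng = np.random.default_rng(1)
n = 100
boot = pd.DataFrame({
    "model": ["modelA"] * n + ["modelB"] * n,
    "sample": list(range(n)) * 2,
    "AIC_cl": np.concatenate([
        rng.normal(3060.0, 5.0, n),
        rng.normal(3061.0, 5.0, n),
    ]),
})

# One row per bootstrap sample, one column per model.
wide = boot.pivot(index="sample", columns="model", values="AIC_cl")

# Positive difference means modelA has the lower (better) score.
diff = wide["modelB"] - wide["modelA"]
frac_a_better = float((diff > 0).mean())
lo, hi = diff.quantile([0.025, 0.975])

print(f"modelA better in {frac_a_better:.0%} of samples")
print(f"95% interval of AIC_cl difference: [{lo:.2f}, {hi:.2f}]")
```

If the interval of differences comfortably spans zero, the two distributions are hard to tell apart, and the simpler model is the safer choice.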

Reading/processing results

Dataframe for point estimates

The result of mrpast solve or mrpast process --solve can be imported as a dataframe using mrpast.result.load_json_pandas(). Example:

from mrpast.result import load_json_pandas
dataframe = load_json_pandas("solver_output.json")

Dataframe for bootstrap results

The result of mrpast confidence can also be imported as a dataframe using mrpast.result.summarize_bootstrap_data(). See the example under “Bootstrap confidence intervals” above. The bootstrap results contain more than just confidence interval information.