Examining results
Viewing parameter values
The result of mrpast solve or mrpast process --solve is a list of JSON files, each of which captures a solution
to the maximum likelihood problem. Each of these results was generated by searching from a different starting place in
the space of parameter solutions. The filename of the best one will be printed to stdout by mrpast solve. You can
also use the get_best.py script to tell you which of
a set of JSON files has the highest likelihood.
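Selecting the best result by hand amounts to comparing one likelihood value per file. The sketch below is not get_best.py itself. It assumes each result JSON stores a negative log-likelihood under a top-level key, here hypothetically named "negLL"; check your own files for the actual field name and pass it as `key`.

```python
import json


def best_result(paths, key="negLL"):
    """Return the path whose JSON has the smallest value under `key`.

    ASSUMPTION: `key` holds a negative log-likelihood, so the smallest
    value corresponds to the highest likelihood. The field name is
    hypothetical; pass the one your files actually use.
    """
    scored = []
    for path in paths:
        with open(path) as f:
            scored.append((json.load(f)[key], str(path)))
    # min() compares the likelihood value first, then the path as a tie-breaker.
    return min(scored)[1]
```

Calling `best_result(["run0.json", "run1.json"])` (hypothetical filenames) returns the path to open with mrpast show.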
Once you have a JSON file of interest, you can view it directly in a text or JSON editor, or you can (more usefully) use
mrpast show. This command shows you parameter values and their error quantities (if ground_truth makes sense for
your model).
Parameter confidence intervals
There are two methods for producing confidence intervals, but the mrpast confidence command is used for both.
Bootstrap confidence intervals
mrpast confidence solver_output.json can be pretty slow (hint: use -j <threads> to speed it up), as it runs every
bootstrap sample through the maximum likelihood solver and produces results in two places:
Directory solver_output.bootstrap.out/, which contains all of the intermediate solver results for every bootstrap sample.
File solver_output.bootstrap.csv, which contains a summary of all of the parameter and likelihood values for every bootstrap sample.
The confidence intervals are not actually in either output; to obtain them, use either mrpast show or
mrpast.result.summarize_bootstrap_data(). Examples:
mrpast show solver_output.bs_summary.csv
OR
from mrpast.result import summarize_bootstrap_data
import pandas as pd
# The raw dataframe containing all the parameter values from bootstrapping.
raw_dataframe = pd.read_csv("solver_output.bootstrap.csv")
# The summarized dataframe, which contains the mean or median parameter values, and their confidence intervals.
sum_dataframe = summarize_bootstrap_data(raw_dataframe, use_median=True, interval_conf=0.95)
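To make the idea of a bootstrap interval concrete, here is a minimal sketch of a percentile interval computed from one parameter's bootstrap values. It is an assumption that this resembles what summarize_bootstrap_data() does internally; the page does not state the method, and the helper names below are hypothetical.

```python
import math


def percentile(sorted_values, q):
    """Linearly interpolated quantile q (0..1) of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def percentile_interval(samples, interval_conf=0.95):
    """Return (low, high) bounds covering `interval_conf` of the samples."""
    ordered = sorted(samples)
    tail = (1.0 - interval_conf) / 2.0
    return percentile(ordered, tail), percentile(ordered, 1.0 - tail)
```

Here `samples` would be one parameter's values taken across all bootstrap samples, for example one column of the raw dataframe converted to a list.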
Theoretical confidence intervals
The GIM-based confidence intervals are much faster to compute, but likely less accurate, than the bootstrapped intervals. The bootstrapped intervals are recommended unless your model is so large that bootstrapping is computationally infeasible, in which case the GIM-based intervals should be taken with a grain of salt.
mrpast confidence --gim solver_output.json will make a copy of solver_output.json (solver_output.gim.json) that
contains a confidence interval for each parameter (using the Godambe Information Matrix formulation). These intervals
can be examined using mrpast.result.load_json_pandas(). Example:
from mrpast.result import load_json_pandas
dataframe = load_json_pandas("solver_output.gim.json", interval_field="gim_ci")
It also outputs a summary .csv file that can be used with mrpast show:
mrpast show solver_output.gim_summary.csv
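For background, the standard Godambe (sandwich) construction for composite likelihood estimators is sketched below. This is textbook material, not a description of mrpast's exact computation, which this page does not spell out. The symbols H (the expected negative Hessian of the composite log-likelihood), J (the covariance of its score), and z (the normal quantile, about 1.96 for a 95% interval) are introduced here only for illustration.

```latex
G(\theta) = H(\theta)\, J(\theta)^{-1}\, H(\theta),
\qquad
\widehat{\operatorname{Var}}(\hat\theta) \approx G(\hat\theta)^{-1},
\qquad
\hat\theta_i \pm z_{1-\alpha/2} \sqrt{\left[ G(\hat\theta)^{-1} \right]_{ii}}
```

Because the interval relies on an asymptotic normal approximation rather than on re-solving resampled data, it is cheap to compute, which matches the speed versus accuracy trade-off described above.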
Model selection
mrpast implements the Akaike Information Criterion (AIC), based on the composite likelihood-adjusted variation of AIC.
AIC can rank multiple candidate models that have been evaluated on the same data. The model with the lowest AIC score is the “selected model.” We have found that overly complex models (i.e., more complex than the model that generated the data) are sometimes selected, or have an AIC score very close to the true model’s. For this reason, we recommend looking at the distribution of AIC scores over the set of bootstrap samples. mrpast checks that the bootstrap samples of two competing models are identical, because the data must be the same for a fair evaluation of the models. If the distributions of AIC values are indistinguishable, then the simpler model should be preferred.
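As a reminder of how AIC trades fit against complexity, the sketch below computes the classic, unadjusted AIC for two hypothetical models and picks the lower score. The log-likelihoods and parameter counts are made up for illustration, and the composite likelihood adjustment that mrpast applies to the penalty term is not reproduced here.

```python
def aic(log_likelihood, num_params):
    """Classic AIC = 2k - 2 ln L; lower is better."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical (log-likelihood, parameter count) pairs for two models.
candidates = {
    "modelA": (-1050.0, 4),
    "modelB": (-1049.5, 7),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
# The "selected model" is the one with the lowest score.
selected = min(scores, key=scores.get)
```

In this made-up case modelB fits slightly better but pays for three extra parameters, so modelA is selected.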
AIC on a single result
Suppose we have two models, modelA and modelB, and we have run mrpast process on the same data (ARGs), once with
modelA.yaml and once with modelB.yaml. The resulting best solver outputs we’ll call best.modelA.out.json and
best.modelB.out.json. We can generate the AIC data using:
mrpast select best.modelA.out.json best.modelB.out.json > modelA_modelB.AIC.json
This resulting JSON file can be loaded into a Pandas dataframe to be examined:
import json
import pandas as pd
# Load the output of the mrpast select command above.
with open("modelA_modelB.AIC.json") as f:
    aic_values = json.load(f)["aic_values"]
dataframe = pd.DataFrame.from_dict(aic_values)
Each bootstrap sample for each file is a row in the DataFrame. Every row contains AIC (the unadjusted
value), AIC_cl (the composite likelihood-adjusted AIC, the one that should typically be used), and cL
(the composite log-likelihood value).
AIC on bootstrap samples
To run AIC over all bootstrap samples, just use the --bootstrap flag:
mrpast select --bootstrap best.modelA.out.json best.modelB.out.json > modelA_modelB.bootstrap.AIC.json
This command will fail if you have not previously run:
# Very SLOW! Solves for all bootstrap samples
mrpast confidence -j 8 best.modelA.out.json
mrpast confidence -j 8 best.modelB.out.json
The modelA_modelB.bootstrap.AIC.json output JSON has the same format as the non-bootstrap version.
Hint: you can use the --replicates flag to reduce the number of solver replicates for each bootstrap run
to speed up the bootstrap process.
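One simple way to compare the two bootstrap AIC distributions is to pair the values by bootstrap sample and summarize the differences. The sketch below assumes you have already extracted the per-sample AIC_cl values for each model into two equally ordered lists; how to do that depends on the JSON layout, which is not detailed here. The function name and returned keys are hypothetical.

```python
from statistics import median


def compare_bootstrap_aic(aic_a, aic_b):
    """Summarize paired per-sample AIC values for two models.

    Both lists must be in the same bootstrap-sample order.
    """
    if len(aic_a) != len(aic_b):
        raise ValueError("both models need the same bootstrap samples")
    diffs = [a - b for a, b in zip(aic_a, aic_b)]
    return {
        "median_diff": median(diffs),  # negative favors model A
        "frac_a_lower": sum(d < 0 for d in diffs) / len(diffs),
    }
```

A median difference near zero together with a fraction near 0.5 suggests the distributions are hard to tell apart, in which case the simpler model should be preferred.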
Reading/processing results
Dataframe for point estimates
The result of mrpast solve or mrpast process --solve can be imported as a dataframe using
mrpast.result.load_json_pandas(). Example:
from mrpast.result import load_json_pandas
dataframe = load_json_pandas("solver_output.json")
Dataframe for bootstrap results
The result of mrpast confidence can also be imported as a dataframe using
mrpast.result.summarize_bootstrap_data(). See the example above. The bootstrap results
contain more than just confidence interval information.