Using real data
The workflow for analyzing real (non-simulated) data is typically:
1. Run mrpast polarize on the data.
2. Run mrpast arginfer on the polarized data to produce ARGs.
3. Run mrpast process --solve to process and solve the maximum likelihood problem.
4. Run mrpast confidence to generate confidence intervals on the parameters.
There are some additional considerations:
- It is often best to pass these options to mrpast process: --rate-maps ratemap.chr --rate-map-threshold 1e-9. This restricts ARG sampling to regions with a recombination rate below 1e-9. We have found that ARG inference tends to be more accurate in such regions.
- If you are concerned about particular regions of the genome (for example, because of sequencing quality or because selection may influence the results), you can edit your rate maps to set a very high recombination rate in those regions. The recombination rate threshold above will then prevent sampling from those regions.
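The masking idea in the second consideration can be sketched in Python. This is only an illustration under stated assumptions: the on-disk rate map format is not described here, so the sketch works on an in-memory list of (start, end, rate) intervals, and reading or writing the actual rate map files is left out. The function name mask_regions, the interval representation, and the masked_rate value of 1.0 are all hypothetical choices, not part of mrpast.

```python
from typing import List, Tuple

# Hypothetical in-memory rate map: half-open intervals [start, end) in base
# pairs, each with a recombination rate. This is NOT mrpast's file format.
Interval = Tuple[int, int, float]


def mask_regions(
    rate_map: List[Interval],
    regions: List[Tuple[int, int]],
    masked_rate: float = 1.0,  # assumed "very high" rate, far above any threshold
) -> List[Interval]:
    """Return a copy of rate_map where every part that falls inside one of
    `regions` has its rate replaced by `masked_rate`."""
    result: List[Interval] = []
    for start, end, rate in rate_map:
        # Split this interval at every region boundary that falls inside it.
        cuts = {start, end}
        for r_start, r_end in regions:
            if r_start < end and r_end > start:
                cuts.add(max(start, r_start))
                cuts.add(min(end, r_end))
        points = sorted(cuts)
        for a, b in zip(points, points[1:]):
            # Each piece is either fully inside some region or fully outside all.
            inside = any(r_start <= a and b <= r_end for r_start, r_end in regions)
            result.append((a, b, masked_rate if inside else rate))
    return result


# Example: mask positions 200-400, then list what a 1e-9 threshold would keep.
rate_map = [(0, 1000, 5e-10), (1000, 2000, 2e-8)]
masked = mask_regions(rate_map, [(200, 400)])
eligible = [(s, e) for s, e, r in masked if r < 1e-9]
print(eligible)
```

In this example, the second interval is excluded because its original rate is already above the threshold, and the 200-400 stretch is excluded only because it was masked. The modified intervals would still need to be written back in whatever rate map format your pipeline uses before being passed via --rate-maps.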