Using real data
The workflow for analyzing real (non-simulated) data is typically:
1. Run mrpast polarize on the data (not needed for tsinfer). Only Relate and SINGER need the data filtered and polarized before running mrpast arginfer. Because tsinfer takes the ancestral state as an input argument, you can instead pass in an ancestral sequence with --ancestral.
2. Run mrpast arginfer on the polarized data to produce ARGs.
3. Run mrpast process --solve to process the ARGs and solve the maximum likelihood problem.
4. Run mrpast confidence to generate confidence intervals on the parameters.
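The one branch in this workflow is whether polarization is needed, which depends on the ARG inference tool. The minimal Python sketch below lists the steps in order for a given tool. The function name workflow_steps is hypothetical and not part of mrpast. It returns only subcommand names; the input files and other arguments each command needs are not shown. Rejecting tools other than the three named above is also an assumption of the sketch.

```python
# Hypothetical helper (not part of mrpast): lists the mrpast subcommands, in
# order, for the real-data workflow. Input files and the other arguments each
# command needs are deliberately omitted.
def workflow_steps(arg_tool: str) -> list:
    """Return the ordered mrpast steps for the given ARG inference tool."""
    tool = arg_tool.lower()
    steps = []
    if tool in ("relate", "singer"):
        # Relate and SINGER need filtered, polarized input.
        steps.append("mrpast polarize")
    elif tool != "tsinfer":
        # Assumption: only the three tools named in the text are handled.
        raise ValueError(f"unknown ARG inference tool: {arg_tool}")
    # tsinfer skips polarization and is given an ancestral sequence
    # with --ancestral instead.
    steps.append("mrpast arginfer")         # produce ARGs
    steps.append("mrpast process --solve")  # solve the maximum likelihood problem
    steps.append("mrpast confidence")       # confidence intervals on the parameters
    return steps


print(workflow_steps("tsinfer"))
print(workflow_steps("Relate"))
```

For tsinfer the list starts at mrpast arginfer; for Relate or SINGER it starts with mrpast polarize.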
There are some additional considerations:
- It is often best to pass these options to mrpast process: --rate-maps ratemap.chr --rate-map-threshold 1e-9. This restricts ARG sampling to regions with a recombination rate below 1e-9. We have found that ARG inference tends to be more accurate in such regions.
- If you are concerned about particular regions of the genome (for example, because of sequencing quality, or because selection may influence the results), you can modify your rate maps to set a very high recombination rate in those regions. The recombination rate threshold described above then prevents sampling from those regions.
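Masking and thresholding work together: raising the rate in a region pushes it above the threshold, so it is excluded from sampling. The Python sketch below illustrates that interaction under explicit assumptions. The rate map is a hypothetical in-memory list of half-open (start, end, rate) intervals, not mrpast's rate map file format. MASK_RATE is an arbitrary "very high" value. The helpers mask_regions and sampleable_regions are hypothetical and only mimic the described effect; they are not mrpast's implementation.

```python
from typing import List, Tuple

# Hypothetical in-memory rate map: half-open [start, end) intervals with a
# recombination rate each, sorted by start. NOT mrpast's file format.
RateMap = List[Tuple[int, int, float]]

MASK_RATE = 1.0  # hypothetical "very high" rate, far above any threshold used here


def mask_regions(rate_map: RateMap, masked: List[Tuple[int, int]]) -> RateMap:
    """Set the rate to MASK_RATE on every part of the map covered by a masked region."""
    result: RateMap = []
    for start, end, rate in rate_map:
        # Split this interval at every masked-region boundary that falls inside it.
        cuts = {start, end}
        for m_start, m_end in masked:
            if m_start < end and m_end > start:
                cuts.add(max(start, m_start))
                cuts.add(min(end, m_end))
        points = sorted(cuts)
        for a, b in zip(points, points[1:]):
            # Each piece is now either fully inside a masked region or fully outside.
            inside = any(m_start <= a and b <= m_end for m_start, m_end in masked)
            result.append((a, b, MASK_RATE if inside else rate))
    return result


def sampleable_regions(rate_map: RateMap, threshold: float = 1e-9) -> List[Tuple[int, int]]:
    """Return the intervals whose rate is below the threshold (eligible for sampling)."""
    return [(s, e) for s, e, r in rate_map if r < threshold]


rmap = [(0, 100, 5e-10), (100, 200, 2e-9), (200, 300, 1e-10)]
print(sampleable_regions(rmap))                            # [(0, 100), (200, 300)]
print(sampleable_regions(mask_regions(rmap, [(50, 250)])))  # [(0, 50), (250, 300)]
```

In this toy example, masking positions 50 to 250 shrinks the sampleable set from (0, 100) and (200, 300) down to (0, 50) and (250, 300), without touching the threshold itself.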