Center for Computational Genetics and Genomics, Department of Biology, Temple University, Philadelphia, PA, USA.
Mol Ecol Resour. 2019 Nov;19(6):1593-1609. doi: 10.1111/1755-0998.13083. Epub 2019 Sep 24.
Many methods for fitting demographic models to data sets of aligned sequences rely on the assumption that the data have a branching coalescent history, with no recombination within regions or loci. To mitigate the effects of violating this assumption, a common approach is to filter the data and sample regions that pass the four-gamete criterion for recombination. This approach allows the data to be analyzed, but it is expected to detect only a minority of recombination events. A series of empirical tests of this approach were conducted using computer simulations, with and without recombination, for a variety of isolation-with-migration (IM) models for two and three populations. Only the IMa3 program was used, but the general results should apply to related genealogy-sampling-based methods for IM models or subsets of IM models. The details of how intervals that pass a four-gamete filter are sampled had a moderate effect, and schemes that used the longest intervals, or overlapping intervals, gave poorer results. A simple approach of using a random nonoverlapping interval gave the smallest difference between results with and without recombination, with the mean difference between parameter estimates usually less than 20% of the true value (and usually much less). However, the posterior probability distributions for migration rates were flatter with recombination, suggesting that filtering based on the four-gamete criterion, while necessary for methods like these, leads to reduced resolution on migration. A distinct alternative approach, using a finite-sites mutation model without filtering the data, performed quite poorly.
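The four-gamete criterion can be made concrete with a short sketch. Under an infinite-sites assumption, if two biallelic sites show all four two-site haplotypes among the sampled sequences, at least one recombination event must have occurred between them. The Python below is a hypothetical illustration, not the filtering procedure used with IMa3. It tests pairs of sites, greedily partitions the segregating sites from left to right into nonoverlapping blocks that contain no four-gamete violation, and then picks one block at random, loosely in the spirit of the "random nonoverlapping interval" scheme. The function names, the greedy partitioning rule, and the choice to skip invariant and multi-allelic columns are all assumptions made for illustration.

```python
import random


def segregating_sites(seqs):
    """Indices of biallelic columns; invariant and multi-allelic columns are skipped (assumption)."""
    return [k for k in range(len(seqs[0])) if len({s[k] for s in seqs}) == 2]


def four_gamete_fail(seqs, i, j):
    """True if biallelic sites i and j show all four two-site haplotypes."""
    if len({s[i] for s in seqs}) != 2 or len({s[j] for s in seqs}) != 2:
        return False  # the test is only applied to pairs of biallelic sites here
    haplotypes = {(s[i], s[j]) for s in seqs}
    return len(haplotypes) == 4


def nonoverlapping_compatible_blocks(seqs):
    """Hypothetical greedy partition of segregating sites into nonoverlapping
    blocks with no four-gamete violation. Returns (first_site, last_site) pairs."""
    blocks, current = [], []
    for k in segregating_sites(seqs):
        if any(four_gamete_fail(seqs, m, k) for m in current):
            # Site k is incompatible with a site already in the block: close it.
            blocks.append((current[0], current[-1]))
            current = [k]
        else:
            current.append(k)
    if current:
        blocks.append((current[0], current[-1]))
    return blocks


def random_block(seqs, rng=random):
    """Pick one compatible block uniformly at random, or None if there are no segregating sites."""
    blocks = nonoverlapping_compatible_blocks(seqs)
    return rng.choice(blocks) if blocks else None


if __name__ == "__main__":
    alignment = ["000", "001", "011", "110"]
    print(nonoverlapping_compatible_blocks(alignment))  # [(0, 1), (2, 2)]
    print(random_block(alignment, random.Random(1)))
```

In the toy alignment, sites 1 and 2 carry all four haplotypes (00, 01, 11, 10), so the sketch splits the alignment between them. Block boundaries are reported as segregating-site indices only; where exactly to place a breakpoint between two incompatible sites, and whether to partition greedily at all, are design choices this sketch does not settle.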