School of Biological Sciences, University of Reading, Whiteknights, PO Box 228, Reading RG6 6AJ, UK.
Syst Biol. 2010 Jul;59(4):415-32. doi: 10.1093/sysbio/syq022. Epub 2010 May 24.
Nested clade phylogeographic analysis (NCPA) is a widely used method that aims to identify past demographic events that have shaped the history of a population. In an earlier study, NCPA was fully automated, which allowed it to be tested with simulated data sets generated under a null model in which samples simulated from a panmictic population are distributed across geographic locations. That study noted that NCPA was prone to inferring false positives, corroborating earlier findings. The present study evaluates both single-locus and multilocus NCPA when gene flow among spatially distributed populations is restricted. We have developed a new program, ANeCA-ML, which implements multilocus NCPA. Data were simulated under 3 models of gene flow: a stepping stone model, an island model, and a stepping stone model with some long-distance dispersal. Results indicate that single-locus NCPA tends to give a high frequency of false positives but, unlike the random-mating scenario presented previously, its inferences are not limited to restricted gene flow with isolation by distance or contiguous range expansion. The proportion of single-locus data sets that contained false inferences was 76% for the panmictic case, 87% for the stepping stone model, 79% for the stepping stone model with long-distance dispersal, and more than 99% for the island model. The frequency of inferences is inversely related to the amount of gene flow between demes. We performed multilocus NCPA by grouping the simulated loci into data sets of 5 loci each. In multilocus NCPA, the false-positive rate was reduced for some inferences but remained high for others. The proportion of multilocus data sets that contained false inferences was 17% for the panmictic case, 30% for the stepping stone model, 4% for the stepping stone model with long-distance dispersal, and 54% for the island model.
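The three gene-flow models differ mainly in which demes exchange migrants. As a hedged illustration, the sketch below builds a backward migration matrix for each model, where entry [i][j] is the probability that a lineage in deme i came from deme j in the previous generation. It assumes a one-dimensional row of demes; the deme count, the migration rate `m`, the `long_distance` fraction, and both function names are hypothetical choices for this example and are not the settings used in the study.

```python
def island_matrix(n_demes, m):
    """Island model: a migrant is equally likely to come from any other deme.
    Assumes n_demes >= 2 and 0 <= m <= 1."""
    mat = [[0.0] * n_demes for _ in range(n_demes)]
    for i in range(n_demes):
        for j in range(n_demes):
            mat[i][j] = (1 - m) if i == j else m / (n_demes - 1)
    return mat


def stepping_stone_matrix(n_demes, m, long_distance=0.0):
    """1D stepping-stone model: migrants come only from adjacent demes.

    If long_distance > 0, that fraction of the migration is instead spread
    evenly over all other demes, a crude (hypothetical) stand-in for
    long-distance dispersal. Each row sums to 1."""
    mat = [[0.0] * n_demes for _ in range(n_demes)]
    for i in range(n_demes):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n_demes]
        others = [j for j in range(n_demes) if j != i]
        mat[i][i] = 1 - m
        # Local migration shared among the (one or two) adjacent demes.
        for j in neighbours:
            mat[i][j] += m * (1 - long_distance) / len(neighbours)
        # Long-distance migration shared among every other deme.
        for j in others:
            mat[i][j] += m * long_distance / len(others)
    return mat


if __name__ == "__main__":
    models = [
        ("island", island_matrix(5, 0.1)),
        ("stepping stone", stepping_stone_matrix(5, 0.1)),
        ("stepping stone + long-distance", stepping_stone_matrix(5, 0.1, 0.2)),
    ]
    for name, mat in models:
        print(name)
        for row in mat:
            print([round(x, 3) for x in row])
```

Lowering `m` concentrates probability on the diagonal, i.e. less gene flow between demes, which is the direction in which the abstract reports more frequent NCPA inferences.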
Multilocus NCPA reduces the false-positive rate by restricting the sensitivity of the method but does not appear to increase its accuracy. Three classical tests, namely the analysis of molecular variance (AMOVA), Fu's Fs, and the Mantel test, show that the data contain information that yields explicable results under these standard approaches. In conclusion, for the scenarios we examined, our simulation study suggests that NCPA is unreliable and that its inferences may be misleading. We suggest that NCPA should not be used without objective, simulation-based testing by independent researchers.
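Of the classical tests mentioned above, the Mantel test asks whether genetic and geographic distances between demes are correlated, as expected under isolation by distance. A minimal permutation-based sketch follows; the function names, the one-sided alternative, and the use of Pearson correlation on the upper triangle of each matrix are assumptions of this illustration, not details of the study's own analysis.

```python
import random
from math import sqrt


def _upper(mat):
    """Upper-triangle entries (i < j) of a square distance matrix."""
    n = len(mat)
    return [mat[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(gen, geo, n_perm=999, seed=0):
    """Mantel test sketch: correlation between two distance matrices, with a
    one-sided p-value from jointly permuting rows and columns of `gen`."""
    rng = random.Random(seed)
    y = _upper(geo)
    r_obs = _pearson(_upper(gen), y)
    n = len(gen)
    idx = list(range(n))
    count = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        # Relabel demes: permuting rows and columns together keeps the
        # matrix a valid distance matrix but breaks any link to geography.
        permuted = [[gen[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if _pearson(_upper(permuted), y) >= r_obs:
            count += 1
    p = (count + 1) / (n_perm + 1)
    return r_obs, p


if __name__ == "__main__":
    # Six demes along a line; genetic distance here is hypothetical data
    # that grows with geographic distance (isolation by distance).
    geo = [[abs(i - j) for j in range(6)] for i in range(6)]
    gen = [[0.1 * abs(i - j) for j in range(6)] for i in range(6)]
    r, p = mantel(gen, geo)
    print(f"r = {r:.3f}, p = {p:.3f}")
```

The observed statistic is counted in the p-value numerator and denominator so that p is never exactly zero, a common convention for permutation tests.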