The automation and evaluation of nested clade phylogeographic analysis.

Author information

Panchal Mahesh, Beaumont Mark A

Affiliation

School of Biological Sciences, University of Reading, Whiteknights, Reading, UK.

Publication information

Evolution. 2007 Jun;61(6):1466-80. doi: 10.1111/j.1558-5646.2007.00124.x.

PMID:17542853

Abstract

Nested clade phylogeographic analysis (NCPA) is a popular method for reconstructing the demographic history of spatially distributed populations from genetic data. Although some parts of the analysis are automated, there is no unique and widely followed algorithm for doing this in its entirety, beginning with the data, and ending with the inferences drawn from the data. This article describes a method that automates NCPA, thereby providing a framework for replicating analyses in an objective way. To do so, a number of decisions need to be made so that the automated implementation is representative of previous analyses. We review how the NCPA procedure has evolved since its inception and conclude that there is scope for some variability in the manual application of NCPA. We apply the automated software to three published datasets previously analyzed manually and replicate many details of the manual analyses, suggesting that the current algorithm is representative of how a typical user will perform NCPA. We simulate a large number of replicate datasets for geographically distributed, but entirely random-mating, populations. These are then analyzed using the automated NCPA algorithm. Results indicate that NCPA tends to give a high frequency of false positives. In our simulations we observe that 14% of the clades give a conclusive inference that a demographic event has occurred, and that 75% of the datasets have at least one clade that gives such an inference. This is mainly due to the generation of multiple statistics per clade, of which only one is required to be significant to apply the inference key. We survey the inferences that have been made in recent publications and show that the most commonly inferred processes (restricted gene flow with isolation by distance and contiguous range expansion) are those that are commonly inferred in our simulations. However, published datasets typically yield a richer set of inferences with NCPA than obtained in our random-mating simulations, and further testing of NCPA with models of structured populations is necessary to examine its accuracy.
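The abstract attributes the high false-positive rate to computing several statistics per clade and requiring only one of them to be significant. The sketch below illustrates that multiple-testing effect in general terms. It is not the authors' NCPA implementation. It assumes, purely for illustration, k independent tests per clade, each run at a hypothetical significance level alpha = 0.05, with p-values uniform on (0, 1) under the null. Real NCPA statistics within a clade are correlated, so the exact rates will differ. The same formula, 1 - (1 - p)^n, also shows why a modest per-clade rate can become a large per-dataset rate when a dataset contains many clades.

```python
import random


def prob_any_significant(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent null tests has p < alpha.

    Assumes independence between tests (an illustrative simplification).
    """
    return 1.0 - (1.0 - alpha) ** k


def simulate_any_significant(k: int, alpha: float = 0.05,
                             trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check of prob_any_significant under the same assumptions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis each p-value is Uniform(0, 1),
        # so a test is "significant" when the draw falls below alpha.
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Compare the analytic and simulated rates as the number of
    # statistics per clade (k) grows.
    for k in (1, 2, 4, 8):
        print(k, round(prob_any_significant(k), 3),
              round(simulate_any_significant(k), 3))
```

Running the script shows the chance of at least one spurious "significant" statistic climbing well above alpha as k grows, even though every individual test is calibrated.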
