Diniz-Filho José Alexandre Felizola, de Sant'Ana Carlos Eduardo Ramos, Bini Luis Mauricio
Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Goiás. Cx.P. 131, 74.001-970, Goiânia, GO, Brasil.
Escola Técnica Federal de Goiás, Coordenação de Química e Biologia. Rua 75, n. 46, 74.055-110, Goiânia, GO, Brasil.
Evolution. 1998 Oct;52(5):1247-1262. doi: 10.1111/j.1558-5646.1998.tb02006.x.
We propose a new method to estimate and correct for phylogenetic inertia in comparative data analysis. The method, called phylogenetic eigenvector regression (PVR), starts by performing a principal coordinate analysis on a pairwise phylogenetic distance matrix between species. Traits under analysis are then regressed on the eigenvectors retained by a broken-stick model, so that the estimated values express phylogenetic trends in the data and the residuals express the independent evolution of each species. This partitioning is similar to that achieved by the spatial autoregressive method, but the method proposed here overcomes the low statistical performance of the autoregressive method when phylogenetic correlation is low or when the sample size is too small to detect it. PVR is also easier to perform with large samples because it is based on well-known techniques of multivariate and regression analysis. We evaluated the performance of PVR and compared it with the autoregressive method using real datasets and simulations. A detailed worked example using body size evolution in mammals of the order Carnivora indicated that phylogenetic inertia in this trait is high and is estimated similarly by both methods. In this example, the Type I error rate of PVR at α = 0.05 was 0.048, but increasing the number of eigenvectors used in the regression increased this error rate. Also, the similarity between PVR and the autoregressive method, defined as the correlation between their residuals, decreased when the number of eigenvalues necessary to express the phylogenetic distance matrix was overestimated. To evaluate the influence of cladogram topology on the distribution of eigenvalues extracted from the double-centered phylogenetic distance matrix, we analyzed 100 randomly generated cladograms (with up to 100 species). Multiple linear regression of log-transformed variables indicated that the number of eigenvalues extracted by the broken-stick model can be fully explained by cladogram topology.
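The PVR pipeline described above (principal coordinate analysis of the distance matrix, broken-stick selection of eigenvectors, then least-squares regression of the trait) can be sketched in Python with NumPy. This is an illustrative reconstruction, not the authors' code: the names `pvr` and `broken_stick` are hypothetical, the Gower double-centering of −½D² is the standard PCoA convention and is assumed here, the stopping rule (keep leading eigenvectors until the first one whose share of variance falls below the broken-stick expectation) is one common reading of the criterion, and reporting R² as the amount of phylogenetic inertia is likewise an assumption.

```python
import numpy as np

def broken_stick(p: int) -> np.ndarray:
    """Expected share of variance of each of p ranked axes under the broken-stick model."""
    # b_k = (1/p) * sum_{i=k}^{p} 1/i, for k = 1..p
    inv = 1.0 / np.arange(1, p + 1)
    return np.cumsum(inv[::-1])[::-1] / p

def pvr(D, y):
    """Hypothetical sketch of phylogenetic eigenvector regression.

    D: (n, n) pairwise phylogenetic distance matrix; y: (n,) trait values.
    Returns (fitted, residuals, r2, k): phylogenetic component, specific
    component, share of trait variance explained, eigenvectors retained.
    """
    D = np.asarray(D, dtype=float)
    y = np.asarray(y, dtype=float)
    n = D.shape[0]
    # Principal coordinate analysis: Gower double-centering of -0.5 * D**2 (assumed)
    J = np.eye(n) - np.ones((n, n)) / n
    G = J @ (-0.5 * D**2) @ J
    vals, vecs = np.linalg.eigh(G)              # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]      # reorder: largest first
    keep = vals > 1e-10 * vals[0]               # drop null and negative axes
    vals, vecs = vals[keep], vecs[:, keep]
    # Broken-stick rule: retain leading eigenvectors while their share of
    # variance exceeds the broken-stick expectation
    share = vals / vals.sum()
    exceeds = share > broken_stick(len(vals))
    k = len(exceeds) if exceeds.all() else int(np.argmin(exceeds))
    # Regress the trait on an intercept plus the k retained eigenvectors
    X = np.column_stack([np.ones(n), vecs[:, :k]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta                           # phylogenetic trend
    residuals = y - fitted                      # independent evolution
    r2 = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return fitted, residuals, r2, k

# Toy ultrametric tree ((A,B),(C,D)) with unit branch lengths
D = [[0, 2, 4, 4],
     [2, 0, 4, 4],
     [4, 4, 0, 2],
     [4, 4, 2, 0]]
y = [1.0, 1.2, 3.0, 3.1]
fitted, residuals, r2, k = pvr(D, y)
print(k, fitted.round(3), round(float(r2), 3))
```

On this toy tree the positive eigenvalues are 14, 2 and 2, so only the first eigenvector (the split between the two clades) beats the broken-stick expectation of 11/18; with k = 1 the fitted values are the clade means (1.1 and 3.05), the residuals are the within-clade deviations, and R² ≈ 0.993.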
Therefore, the broken-stick model is an adequate criterion for determining the correct number of eigenvectors to be used in PVR. We also simulated distinct levels of phylogenetic inertia by producing a trend across 10, 25, and 50 species arranged in "comblike" cladograms and then adding random vectors with increasing residual variances around this trend. In doing so, we evaluated the performance of both methods with data generated under evolutionary models different from those tested previously. The results showed that both PVR and the autoregressive method are efficient at detecting inertia in the data when the sample size is relatively large (more than 25 species) and phylogenetic inertia is high. However, PVR is more efficient at smaller sample sizes and when the level of phylogenetic inertia is low. These conclusions were also supported by the analysis of 10 real datasets on body size evolution in different animal clades. We conclude that PVR can be a useful alternative to the autoregressive method in comparative data analysis.
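The simulation design (a trend along a comblike cladogram plus random vectors of increasing variance) can be sketched as follows. This is a hypothetical reconstruction under stated assumptions, since the abstract gives no details: the cladogram is taken as ultrametric with nodes spaced one unit apart, the trend is assumed linear in each species' position along the comb and standardized to unit variance, the random vectors are assumed Gaussian, and the function names, `sigma` values and seed are illustrative only.

```python
import numpy as np

def comb_distances(n: int) -> np.ndarray:
    """Patristic distances for an ultrametric comblike (pectinate) cladogram.

    Assumption: species 0 splits off at the root, species 1 at the next node,
    and so on, with nodes one unit apart in height, so species i and j
    (i < j) share an ancestor at height n - 1 - i.
    """
    idx = np.arange(n)
    D = 2.0 * (n - 1 - np.minimum.outer(idx, idx))
    np.fill_diagonal(D, 0.0)
    return D

def simulate_trait(n: int, sigma: float, rng: np.random.Generator):
    """Hypothetical design: linear trend along the comb plus Gaussian noise."""
    trend = np.arange(n, dtype=float)
    trend = (trend - trend.mean()) / trend.std()   # unit-variance trend
    y = trend + rng.normal(0.0, sigma, size=n)     # random vector around the trend
    return trend, y

rng = np.random.default_rng(1998)
for n in (10, 25, 50):
    D = comb_distances(n)   # distance matrix that PVR or the autoregressive method would analyze
    for sigma in (0.25, 0.5, 1.0, 2.0):
        trend, y = simulate_trait(n, sigma, rng)
        # Realized inertia: share of trait variance explained by the trend
        r2 = np.corrcoef(trend, y)[0, 1] ** 2
        print(f"n={n:2d} sigma={sigma:4.2f} r2_trend={r2:.2f}")
```

Because the trend has unit variance, the realized share of variance explained by the trend should fall roughly toward 1/(1 + σ²) as the noise standard deviation grows, with more scatter around that value at n = 10 than at n = 50.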