Toward extracting all phylogenetic information from matrices of evolutionary distances.

Affiliation

Department of Mathematics, University of California at Los Angeles, 520 Portola Plaza, Los Angeles, CA 90095, USA.

Publication information

Science. 2010 Mar 12;327(5971):1376-9. doi: 10.1126/science.1182300.

PMID:20223986

Abstract

The matrix of evolutionary distances is a model-based statistic, derived from molecular sequences, summarizing the pairwise phylogenetic relations between a collection of species. Phylogenetic tree reconstruction methods relying on this matrix are relatively fast and thus widely used in molecular systematics. However, because of their intrinsic reliance on summary statistics, distance-matrix methods are assumed to be less accurate than likelihood-based approaches. In this paper, pairwise sequence comparisons are shown to be more powerful than previously hypothesized. A statistical analysis of certain distance-based techniques indicates that their data requirement for large evolutionary trees essentially matches the conjectured performance of maximum likelihood methods, challenging the idea that summary statistics lead to suboptimal analyses. On the basis of a connection between ancestral state reconstruction and distance averaging, the critical role played by the covariances of the distance matrix is identified.
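As a rough illustration of what a "matrix of evolutionary distances" looks like in practice, the sketch below computes pairwise distances from aligned sequences. It assumes the Jukes-Cantor (JC69) substitution model, which the abstract does not name; the function names `jc69_distance` and `distance_matrix` and the toy sequences are hypothetical and chosen only for this example. It is not the method analyzed in the paper.

```python
import math


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned, equal-length sequences.

    Assumption: JC69 is used here only as a familiar example of a
    model-based distance; the abstract does not specify a model.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    # p is the observed fraction of sites at which the sequences differ.
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    p = mismatches / len(seq_a)
    # The JC69 correction is undefined once p reaches 3/4 (saturation).
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(sequences):
    """Return a dict mapping (name_i, name_j) to the JC69 distance."""
    names = list(sequences)
    return {
        (x, y): jc69_distance(sequences[x], sequences[y])
        for x in names
        for y in names
    }


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    toy = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "ACGAACGA"}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 4))
```

Each entry depends on only two sequences at a time, which is why the matrix is described as a summary statistic: information about joint patterns across three or more species is not kept directly.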
