Signal, noise, and reliability in molecular phylogenetic analyses.

Authors

Hillis D M, Huelsenbeck J P

Affiliation

Department of Zoology, University of Texas, Austin 78712.

Publication

J Hered. 1992 May-Jun;83(3):189-95. doi: 10.1093/oxfordjournals.jhered.a111190.

Abstract

DNA sequences and other molecular data compared among organisms may contain phylogenetic signal, or they may be randomized with respect to phylogenetic history. Some method is needed to distinguish phylogenetic signal from random noise to avoid analysis of data that have been randomized with respect to the historical relationships of the taxa being compared. We analyzed 8,000 random data matrices consisting of 10-500 binary or four-state characters and 5-25 taxa to study several options for detecting signal in systematic data bases. Analysis of random data often yields a single most-parsimonious tree, especially if the number of characters examined is large and the number of taxa examined is small (both often true in molecular studies). The most-parsimonious tree inferred from random data may also be considerably shorter than the second-best alternative. The distribution of tree lengths of all tree topologies (or a random sample thereof) provides a sensitive measure of phylogenetic signal: data matrices with phylogenetic signal produce tree-length distributions that are strongly skewed to the left, whereas those composed of random noise are closer to symmetrical. In simulations of phylogeny with varying rates of mutation (up to levels that produce random variation among taxa), the skewness of tree-length distributions is closely related to the success of parsimony in finding the true phylogeny. Tables of critical values of a skewness test statistic, g1, are provided for binary and four-state characters for 10-500 characters and 5-25 taxa. These tables can be used in a rapid and efficient test for significant structure in data matrices for phylogenetic analysis.
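
The skewness test described in the abstract can be illustrated with a small sketch. It assumes Fitch parsimony for unordered characters, exhaustive enumeration of unrooted binary topologies (practical only for a handful of taxa), and the usual moment definition g1 = m3 / m2^(3/2). All function names are illustrative and do not come from the paper, and the published critical-value tables are not reproduced here.

```python
import random


def insertions(tree, leaf):
    """Yield every rooted tree made by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)  # attach above the current root
    if isinstance(tree, tuple):
        left, right = tree
        for sub in insertions(left, leaf):
            yield (sub, right)
        for sub in insertions(right, leaf):
            yield (left, sub)


def all_unrooted_trees(n_taxa):
    """Enumerate unrooted binary topologies, each written as rooted on taxon 0."""
    trees = [1]  # begin with the single leaf 1
    for leaf in range(2, n_taxa):
        trees = [t for tree in trees for t in insertions(tree, leaf)]
    return [(0, t) for t in trees]


def fitch(tree, column):
    """Return (state set, number of changes) for one character (Fitch pass)."""
    if not isinstance(tree, tuple):
        return {column[tree]}, 0
    left_states, left_cost = fitch(tree[0], column)
    right_states, right_cost = fitch(tree[1], column)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def tree_length(tree, matrix):
    """Parsimony length of `tree`; matrix[taxon][character] holds the states."""
    n_chars = len(matrix[0])
    return sum(fitch(tree, [row[c] for row in matrix])[1] for c in range(n_chars))


def g1(values):
    """Skewness g1 = m3 / m2**1.5, using central moments divided by n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("all tree lengths are identical; g1 is undefined")
    return m3 / m2 ** 1.5


def tree_length_skewness(matrix):
    """g1 of the tree-length distribution over all topologies for `matrix`."""
    trees = all_unrooted_trees(len(matrix))
    return g1([tree_length(t, matrix) for t in trees])


# Hypothetical data: 6 taxa, 40 four-state characters drawn uniformly at random.
rng = random.Random(1)
random_matrix = [[rng.choice("ACGT") for _ in range(40)] for _ in range(6)]
print("g1 for random data:", tree_length_skewness(random_matrix))
```

A g1 value computed this way would then be compared against the critical value for the matching number of taxa, number of characters, and character type in the paper's tables. For larger numbers of taxa, exhaustive enumeration becomes infeasible, and a random sample of topologies (as the abstract allows) would replace `all_unrooted_trees`.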
