Measuring fit of sequence data to phylogenetic model: gain of power using marginal tests.

Affiliation

Department of Biological Sciences, Purdue University, West Lafayette, IN 47906, USA.

Publication information

J Mol Evol. 2009 Oct;69(4):289-99. doi: 10.1007/s00239-009-9268-8. Epub 2009 Oct 23.

Abstract

Testing fit of data to model is fundamentally important to any science, but publications in the field of phylogenetics rarely do this. Such analyses discard fundamental aspects of science as prescribed by Karl Popper. Indeed, not without cause, Popper (Unended quest: an intellectual autobiography. Fontana, London, 1976) once argued that evolutionary biology was unscientific as its hypotheses were untestable. Here we trace developments in assessing fit from Penny et al. (Nature 297:197-200, 1982) to the present. We compare the general log-likelihood ratio statistic (the G or G^2 statistic) between the evolutionary tree model and the multinomial model with that of marginalized tests applied to an alignment (using placental mammal coding sequence data). The most general test does not reject the fit of data to model (P ≈ 0.5), but the marginalized tests do. Tests on pairwise frequency (F) matrices strongly (P < 0.001) reject the most general phylogenetic (GTR) models commonly in use. It is also clear (P < 0.01) that the sequences are not stationary in their nucleotide composition. Deviations from stationarity and homogeneity seem to be unevenly distributed amongst taxa, and they do not necessarily occur in the taxa expected from examining other regions of the genome. When the 4^t site patterns of the i.i.d. model are marginalized to observed and expected parsimony counts (that is, from constant sites, to singletons, to parsimony informative characters of a minimum possible length), the likelihood ratio test regains power, and it too rejects the evolutionary model with P << 0.001. Given such behavior over relatively recent evolutionary time, readers in general should maintain a healthy skepticism of results, as the scale of the systematic errors in published trees may really be far larger than the analytical methods (e.g., bootstrap) report.
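
To make the comparison in the abstract concrete, the following is a minimal Python sketch of a G statistic computed twice: once over individual site patterns and once after pooling those patterns into coarser parsimony-style classes. It is not the authors' code. The function names `g_statistic` and `marginalize`, the class labels, and all counts are hypothetical and chosen only for illustration; in a real analysis the expected counts would come from a fitted tree model, and a P value would additionally need a reference distribution with appropriate degrees of freedom.

```python
import math
from collections import defaultdict


def g_statistic(observed, expected):
    """Return G = 2 * sum(O * ln(O / E)) over matching cells.

    Cells with an observed count of zero contribute nothing to the sum.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if o > 0:
            if e <= 0:
                raise ValueError("expected count must be positive where observed > 0")
            total += o * math.log(o / e)
    return 2.0 * total


def marginalize(counts, classes):
    """Pool per-pattern counts into classes by summing counts that share a label."""
    pooled = defaultdict(float)
    for count, label in zip(counts, classes):
        pooled[label] += count
    return dict(pooled)


# Hypothetical toy data: five site patterns, not taken from the paper.
observed = [50, 30, 5, 10, 5]          # observed pattern counts
expected = [40.0, 35.0, 10.0, 8.0, 7.0]  # counts assumed to come from a fitted model
classes = ["constant", "constant", "singleton", "singleton", "informative"]

# Full test: one cell per site pattern.
g_full = g_statistic(observed, expected)

# Marginal test: pool patterns into classes, then compare the pooled counts.
pooled_obs = marginalize(observed, classes)
pooled_exp = marginalize(expected, classes)
labels = sorted(pooled_obs)
g_marginal = g_statistic([pooled_obs[k] for k in labels],
                         [pooled_exp[k] for k in labels])

print("G over site patterns:", round(g_full, 3), "cells:", len(observed))
print("G over pooled classes:", round(g_marginal, 3), "cells:", len(labels))
```

Pooling cells can never increase G, but it sharply reduces the number of cells, and hence the degrees of freedom, against which G is judged. With 4^t possible patterns most cells are empty or nearly so, which is one plausible reading of why the most general test has little power while the marginalized version does.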
