Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions.

Authors

Doerr Daniel, Gronau Ilan, Moran Shlomo, Yavneh Irad

Affiliation

Computer Science Department, Technion - Israel Institute of Technology, Haifa, Israel.

Publication

Algorithms Mol Biol. 2012 Aug 31;7(1):22. doi: 10.1186/1748-7188-7-22.

Abstract

BACKGROUND

Distance-based phylogenetic reconstruction methods use evolutionary distances between species in order to reconstruct the phylogenetic tree spanning them. There are many different methods for estimating distances from sequence data. These methods assume different substitution models and have different statistical properties. Since the true substitution model is typically unknown, it is important to consider the effect of model misspecification on the performance of a distance estimation method.

RESULTS

This paper continues a line of research that seeks to fit, for each given set of input sequences, a distance function that maximizes the expected topological accuracy of the reconstructed tree. We focus here on the systematic error caused by assuming an inadequate model, but also consider the stochastic error caused by using short sequences. We introduce a theoretical framework for analyzing both sources of error based on the notion of deviation from additivity, which quantifies the contribution of model misspecification to the estimation error. We demonstrate this framework by studying the behavior of the Jukes-Cantor distance function when applied to data generated according to Kimura's two-parameter model with a transition-transversion bias. We provide both a theoretical derivation for this case and a detailed simulation study on quartet trees.
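To make the model mismatch concrete, the following minimal Python sketch applies the standard Jukes-Cantor correction to the expected site differences produced under Kimura's two-parameter model. It is an illustration under stated assumptions, not the paper's framework: the function names, the parameterization by a transition/transversion ratio `R = alpha / (2 * beta)`, and the simple "additivity gap" on a two-edge path are all choices made here for the example. It works with infinite-sequence expectations only, so it shows the systematic error and ignores the stochastic error.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(P, Q):
    """Kimura two-parameter distance from the proportions of
    transitions (P) and transversions (Q)."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


def k2p_expected_proportions(d, R):
    """Expected (P, Q) under K2P for a true distance d (substitutions per
    site) and ratio R = alpha / (2 * beta). R = 0.5 reduces to Jukes-Cantor."""
    beta_t = d / (2.0 * (R + 1.0))   # beta * t
    alpha_t = d * R / (R + 1.0)      # alpha * t
    Q = 0.5 - 0.5 * math.exp(-4.0 * beta_t)
    P = (0.25 + 0.25 * math.exp(-4.0 * beta_t)
         - 0.5 * math.exp(-2.0 * (alpha_t + beta_t)))
    return P, Q


def jc_on_k2p(d, R):
    """JC distance computed from data that actually follow K2P."""
    P, Q = k2p_expected_proportions(d, R)
    return jc_distance(P + Q)


def additivity_gap(d1, d2, R):
    """Illustrative gap (an assumption of this sketch): JC distance across a
    two-edge path minus the sum of the JC distances of its two edges."""
    return jc_on_k2p(d1 + d2, R) - (jc_on_k2p(d1, R) + jc_on_k2p(d2, R))


if __name__ == "__main__":
    for R in (0.5, 2.0, 10.0):
        print(R, jc_on_k2p(0.5, R), additivity_gap(0.5, 0.5, R))
```

With `R = 0.5` the two models coincide and the gap vanishes. With a transition bias (`R > 0.5`) the Jukes-Cantor estimate falls below the true distance and is no longer additive along a path, which is the kind of systematic error the abstract describes.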

CONCLUSIONS

We demonstrate both analytically and experimentally that by deliberately assuming an oversimplified evolutionary model, it is possible to increase the topological accuracy of reconstruction. Our theoretical framework provides new insights into the mechanisms that enable statistically inconsistent reconstruction methods to outperform consistent methods.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c71/3538584/310f77709e46/1748-7188-7-22-1.jpg
