
Automated removal of noisy data in phylogenomic analyses.

Affiliation

IASMA Research and Innovation Center, Via E. Mach 1, 38010, San Michele all'Adige, TN, Italy.

Publication information

J Mol Evol. 2010 Dec;71(5-6):319-31. doi: 10.1007/s00239-010-9398-z. Epub 2010 Oct 26.

Abstract

Noisy data, especially in combination with misalignment and model misspecification, can have an adverse effect on phylogeny reconstruction; however, effective methods to identify such data are few. One particularly important class of noisy data is saturated positions. To avoid potential errors related to saturation in phylogenomic analyses, we present an automated procedure that removes the most variable positions in a given data set step by step, coupled with a stopping criterion derived from correlation analyses of pairwise ML distances calculated from the deleted (saturated) and the remaining (conserved) subsets of the alignment. Through a comparison with existing methods, we demonstrate both the effectiveness of our proposed procedure for identifying noisy data and the effect of removing such data, using a well-publicized case study involving placental mammals. At the least, our procedure will identify data sets requiring greater data exploration, and we recommend its use to investigate the effect on phylogenetic analyses of removing subsets of variable positions exhibiting weak or no correlation to the rest of the alignment. However, we would argue that this procedure, by identifying and removing noisy data, facilitates the construction of more accurate phylogenies by, for example, ameliorating potential long-branch attraction artefacts.
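
The step-wise procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: simple p-distances stand in for the ML distances used in the paper, column variability is scored as the number of distinct residues, and the stopping rule (stop once the Pearson correlation between the two distance sets reaches a hypothetical threshold `min_r`) is a placeholder, since the abstract does not state the exact criterion. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gaps skipped).

    Assumption: a stand-in for the ML distances used in the paper.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def pairwise_distances(seqs, columns):
    """Distances for every pair of sequences, restricted to the given columns."""
    sub = ["".join(s[c] for c in columns) for s in seqs]
    return [p_distance(a, b) for a, b in combinations(sub, 2)]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def column_variability(seqs, col):
    """Assumed variability score: number of distinct non-gap residues."""
    return len({s[col] for s in seqs if s[col] != "-"})


def strip_noisy_columns(seqs, step=1, min_r=0.8):
    """Remove the most variable columns in steps of `step`.

    After each step, correlate the pairwise distances computed from the
    removed columns with those computed from the remaining columns.
    Assumed stopping rule: stop once the correlation reaches `min_r`.
    Returns the kept column indices and the (n_removed, r) trajectory.
    """
    order = sorted(range(len(seqs[0])),
                   key=lambda c: column_variability(seqs, c),
                   reverse=True)
    kept, history = order, []
    for n_removed in range(step, len(order), step):
        removed, kept = order[:n_removed], order[n_removed:]
        r = pearson(pairwise_distances(seqs, removed),
                    pairwise_distances(seqs, kept))
        history.append((n_removed, r))
        if r >= min_r:
            break
    return sorted(kept), history


if __name__ == "__main__":
    alignment = ["AAAAAC", "AAAAAG", "AATTTT", "AATTTA"]
    kept, history = strip_noisy_columns(alignment)
    print("kept columns:", kept)
    for n_removed, r in history:
        print(f"removed {n_removed} column(s): r = {r:.3f}")
```

Returning the whole correlation trajectory, rather than only the final column set, keeps the sketch useful for the exploratory purpose the abstract recommends: a data set whose most variable columns show weak or no correlation with the rest can be flagged for closer inspection even if no columns are ultimately discarded.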
