Automated removal of noisy data in phylogenomic analyses.

Affiliation

IASMA Research and Innovation Center, Via E. Mach 1, 38010, San Michele all'Adige, TN, Italy.

Publication information

J Mol Evol. 2010 Dec;71(5-6):319-31. doi: 10.1007/s00239-010-9398-z. Epub 2010 Oct 26.

PMID: 20976444

Abstract

Noisy data, especially in combination with misalignment and model misspecification, can have an adverse effect on phylogeny reconstruction; however, effective methods to identify such data are few. One particularly important class of noisy data is saturated positions. To avoid potential errors related to saturation in phylogenomic analyses, we present an automated procedure involving the step-wise removal of the most variable positions in a given data set, coupled with a stopping criterion derived from correlation analyses of pairwise ML distances calculated from the deleted (saturated) and the remaining (conserved) subsets of the alignment. Through a comparison with existing methods, we demonstrate both the effectiveness of our proposed procedure for identifying noisy data and the effect of the removal of such data using a well-publicized case study involving placental mammals. At the least, our procedure will identify data sets requiring greater data exploration, and we recommend its use to investigate the effect on phylogenetic analyses of removing subsets of variable positions exhibiting weak or no correlation to the rest of the alignment. However, we would argue that this procedure, by identifying and removing noisy data, facilitates the construction of more accurate phylogenies by, for example, ameliorating potential long-branch attraction artefacts.
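
The step-wise removal described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: site variability is approximated by the Shannon entropy of each column, the pairwise ML distance is taken under the simple Jukes-Cantor (JC69) model, the alignment is assumed to be gap-free, and the names `strip_noisy_sites`, `step`, and `min_r` (including the 0.5 correlation threshold) are hypothetical. The exact stopping rule in the paper may differ.

```python
import math
from itertools import combinations


def site_variability(column):
    """Shannon entropy of one alignment column (a simple proxy for site rate)."""
    n = len(column)
    freqs = [column.count(state) / n for state in set(column)]
    return -sum(f * math.log(f) for f in freqs)


def jc_distance(a, b):
    """JC69 distance, i.e. the ML distance estimate under the Jukes-Cantor model."""
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    p = min(p, 0.7499)  # cap at saturation so the logarithm stays defined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def pairwise_distances(seqs, cols):
    """JC69 distances for every sequence pair, restricted to the given columns."""
    sub = ["".join(s[i] for i in cols) for s in seqs]
    return [jc_distance(a, b) for a, b in combinations(sub, 2)]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def strip_noisy_sites(seqs, step=2, min_r=0.5):
    """Return the sorted column indices kept after step-wise removal.

    Columns are removed from most to least variable, `step` at a time.
    Removal stops as soon as the distances from the removed subset correlate
    with those from the remaining subset at r >= min_r (assumed threshold);
    the last removal with r < min_r is the one that is applied.
    """
    n_sites = len(seqs[0])
    order = sorted(
        range(n_sites),
        key=lambda i: -site_variability([s[i] for s in seqs]),
    )
    n_removed = 0
    for k in range(step, n_sites, step):  # always leaves at least one column
        removed, kept = order[:k], order[k:]
        r = pearson(pairwise_distances(seqs, removed),
                    pairwise_distances(seqs, kept))
        if r >= min_r:
            break
        n_removed = k
    return sorted(order[n_removed:])
```

Calling `strip_noisy_sites(alignment)` on a list of equal-length, gap-free nucleotide strings returns the indices of the columns to retain. In practice the distances would come from a fitted substitution model rather than JC69, and the threshold would need to be justified for the data set at hand.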

Similar articles

Automated removal of noisy data in phylogenomic analyses.

J Mol Evol. 2010 Dec;71(5-6):319-31. doi: 10.1007/s00239-010-9398-z. Epub 2010 Oct 26.

Impact of missing data on phylogenies inferred from empirical phylogenomic data sets.

Mol Biol Evol. 2013 Jan;30(1):197-214. doi: 10.1093/molbev/mss208. Epub 2012 Aug 28.

Removal of noisy characters from chloroplast genome-scale data suggests revision of phylogenetic placements of Amborella and Ceratophyllum.

J Mol Evol. 2009 Mar;68(3):197-204. doi: 10.1007/s00239-009-9206-9. Epub 2009 Feb 27.

Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis.

Mol Biol Evol. 2000 Apr;17(4):540-52. doi: 10.1093/oxfordjournals.molbev.a026334.

The phylogenetic position of Myxozoa: exploring conflicting signals in phylogenomic and ribosomal data sets.

Mol Biol Evol. 2010 Dec;27(12):2733-46. doi: 10.1093/molbev/msq159. Epub 2010 Jun 24.

Testing congruence in phylogenomic analysis.

Syst Biol. 2008 Feb;57(1):104-15. doi: 10.1080/10635150801910436.

Fast and accurate branch lengths estimation for phylogenomic trees.

BMC Bioinformatics. 2016 Jan 7;17:23. doi: 10.1186/s12859-015-0821-8.

An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics.

Syst Biol. 2005 Oct;54(5):743-57. doi: 10.1080/10635150500234609.

Comparing evolutionary distances via adaptive distance functions.

J Theor Biol. 2018 Mar 7;440:88-99. doi: 10.1016/j.jtbi.2017.12.022. Epub 2017 Dec 23.

Phylogenomic analysis of a rapid radiation of misfit fishes (Syngnathiformes) using ultraconserved elements.

Mol Phylogenet Evol. 2017 Aug;113:33-48. doi: 10.1016/j.ympev.2017.05.002. Epub 2017 May 6.

Cited by

Phylogenomic data resolved the deep relationships of Gymnogynoideae (Selaginellaceae).

Front Plant Sci. 2024 Jul 16;15:1405253. doi: 10.3389/fpls.2024.1405253. eCollection 2024.

A Guide to Phylogenomic Inference.

Methods Mol Biol. 2024;2802:267-345. doi: 10.1007/978-1-0716-3838-5_11.

Mitochondrial phylogenomics reveals deep relationships of scarab beetles (Coleoptera, Scarabaeidae).

PLoS One. 2022 Dec 13;17(12):e0278820. doi: 10.1371/journal.pone.0278820. eCollection 2022.

Organelle Phylogenomics and Extensive Conflicting Phylogenetic Signals in the Monocot Order Poales.

Front Plant Sci. 2022 Jan 31;12:824672. doi: 10.3389/fpls.2021.824672. eCollection 2021.

Water lily () genome reveals variable genomic signatures of ancient vascular cambium losses.

Proc Natl Acad Sci U S A. 2020 Apr 14;117(15):8649-8656. doi: 10.1073/pnas.1922873117. Epub 2020 Mar 31.

A Novel Test for Absolute Fit of Evolutionary Models Provides a Means to Correctly Identify the Substitution Model and the Model Tree.

Genome Biol Evol. 2019 Aug 1;11(8):2403-2419. doi: 10.1093/gbe/evz167.

Noise and biases in genomic data may underlie radically different hypotheses for the position of Iguania within Squamata.

PLoS One. 2018 Aug 22;13(8):e0202729. doi: 10.1371/journal.pone.0202729. eCollection 2018.

Estimating Improved Partitioning Schemes for Ultraconserved Elements.

Mol Biol Evol. 2018 Jul 1;35(7):1798-1811. doi: 10.1093/molbev/msy069.

Trunk dental tissue evolved independently from underlying dermal bony plates but is associated with surface bones in living odontode-bearing catfish.

Proc Biol Sci. 2017 Oct 25;284(1865). doi: 10.1098/rspb.2017.1831.

Multiple measures could alleviate long-branch attraction in phylogenomic reconstruction of Cupressoideae (Cupressaceae).

Sci Rep. 2017 Jan 25;7:41005. doi: 10.1038/srep41005.
