Phylogenetic assessment of alignments reveals neglected tree signal in gaps.

Affiliation

Department of Computer Science, ETH Zurich, Universitaetstr. 6, 8092 Zürich, Switzerland.

Publication information

Genome Biol. 2010;11(4):R37. doi: 10.1186/gb-2010-11-4-r37. Epub 2010 Apr 6.

DOI:10.1186/gb-2010-11-4-r37

PMID:20370897

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2884540/

Abstract

BACKGROUND

The alignment of biological sequences is of chief importance to most evolutionary and comparative genomics studies, yet the two main approaches used to assess alignment accuracy have flaws: reference alignments are derived from the biased sample of proteins with known structure, and simulated data lack realism.

RESULTS

Here, we introduce tree-based tests of alignment accuracy, which not only use large and representative samples of real biological data, but also enable the evaluation of the effect of gap placement on phylogenetic inference. We show that (i) the current belief that consistency-based alignments outperform scoring matrix-based alignments is misguided; (ii) gaps carry substantial phylogenetic signal, but are poorly exploited by most alignment and tree building programs; (iii) even so, excluding gaps and variable regions is detrimental; (iv) disagreement among alignment programs says little about the accuracy of resulting trees.

CONCLUSIONS

This study provides the broad community relying on sequence alignment with important practical recommendations, sets superior standards for assessing alignment accuracy, and paves the way for the development of phylogenetic inference methods of significantly higher resolution.
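To make the idea of a tree-based test concrete, here is a minimal sketch. It assumes, purely for illustration, that an alignment is scored by how closely the tree inferred from it matches a trusted reference tree, with the mismatch measured as the Robinson-Foulds distance (the number of bipartitions found in one tree but not the other). The nested-tuple tree encoding, the function names, and the five-taxon example are hypothetical and are not taken from the paper; the authors' actual tests may differ.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical tree encoding: a leaf is a taxon name, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxon names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the non-trivial bipartitions of the tree, viewed as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference taxon, so the result does not depend on where the tree is rooted.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if ref in clade else clade
        # Skip trivial splits (a single leaf versus everything else).
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def rf_distance(a: Tree, b: Tree) -> int:
    """Robinson-Foulds distance: bipartitions present in exactly one tree."""
    if leaves(a) != leaves(b):
        raise ValueError("trees must share the same taxa")
    return len(splits(a) ^ splits(b))


# Illustrative data only: a reference tree and trees built from two alignments.
reference = ((("A", "B"), "C"), ("D", "E"))
tree_from_alignment_x = ((("A", "C"), "B"), ("D", "E"))
tree_from_alignment_y = (("A", "B"), ("C", ("D", "E")))

print(rf_distance(reference, tree_from_alignment_x))  # 2
print(rf_distance(reference, tree_from_alignment_y))  # 0
```

Under this assumed scoring, alignment Y would be ranked above alignment X because its tree recovers every bipartition of the reference, even though it is rooted differently. Averaged over many gene families, a score of this kind lets real sequence data stand in for a structural benchmark or a simulation.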

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c69/2884540/715b4d250836/gb-2010-11-4-r37-1.jpg
