SATé-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.

Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA.

Syst Biol. 2012 Jan;61(1):90-106. doi: 10.1093/sysbio/syr095. Epub 2011 Dec 1.

Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller, more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. Generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results. First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of those sequences that maximize likelihood under the Jukes-Cantor model is uninformative in the worst possible sense. For all inputs, all trees optimize the likelihood score. Second, we show that a greedy heuristic that uses GTR+Gamma ML to optimize the alignment and the tree can produce very poor alignments and trees. Therefore, the excellent performance of SATé-II and SATé-I is not because ML is used as an optimization criterion for choosing the best tree/alignment pair but rather due to the particular divide-and-conquer realignment techniques employed.
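
The claim that all trees can tie on likelihood when gaps are treated as missing data can be illustrated with a small numerical sketch. This is an illustration under stated assumptions, not the paper's proof: it takes a hypothetical "staircase" alignment in which no two nucleotides share a column, and evaluates its Jukes-Cantor likelihood with Felsenstein's pruning algorithm on two arbitrary rooted trees. The helper names (`jc_prob`, `partial`, `staircase`), the toy sequences, and the branch lengths are all invented for the example.

```python
import math

BASES = "ACGT"

def jc_prob(i, j, t):
    """Jukes-Cantor transition probability between bases i and j over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial(node, column):
    """Felsenstein partial likelihoods at `node` for one alignment column.

    A node is either a leaf name (str) or a tuple of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        ch = column[node]
        if ch == "-":
            return [1.0] * 4  # gap treated as missing data: every state is compatible
        return [1.0 if b == ch else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        cp = partial(child, column)
        for x in range(4):
            result[x] *= sum(jc_prob(BASES[x], BASES[y], t) * cp[y] for y in range(4))
    return result

def alignment_likelihood(tree, alignment):
    """Product over columns of the site likelihood with uniform root frequencies."""
    names = list(alignment)
    like = 1.0
    for k in range(len(alignment[names[0]])):
        column = {n: alignment[n][k] for n in names}
        like *= sum(0.25 * p for p in partial(tree, column))
    return like

def staircase(seqs):
    """Hypothetical degenerate alignment: every nucleotide gets its own column."""
    total = sum(len(s) for s in seqs.values())
    out, offset = {}, 0
    for name, s in seqs.items():
        out[name] = "-" * offset + s + "-" * (total - offset - len(s))
        offset += len(s)
    return out

# Toy input: four unaligned sequences with 8 nucleotides in total.
seqs = {"a": "ACG", "b": "AG", "c": "TT", "d": "C"}
aln = staircase(seqs)

# Two different topologies with unrelated, arbitrary branch lengths.
tree_1 = (((("a", 0.1), ("b", 0.2)), 0.05), ((("c", 0.3), ("d", 0.1)), 0.4))
tree_2 = (((("a", 1.0), ("c", 0.02)), 0.3), ((("b", 0.5), ("d", 0.7)), 0.9))

print(alignment_likelihood(tree_1, aln))
print(alignment_likelihood(tree_2, aln))
```

Each column of the staircase alignment contains a single observed nucleotide, so its site likelihood reduces to the stationary frequency 1/4 whatever the tree; both printed values therefore match `0.25 ** 8` up to floating-point rounding. An alignment that places nucleotides in shared columns does not have this property, which is why the degenerate case says nothing about which tree is correct.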


Similar articles

SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.

Syst Biol. 2012 Jan;61(1):90-106. doi: 10.1093/sysbio/syr095. Epub 2011 Dec 1.

Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees.

Science. 2009 Jun 19;324(5934):1561-4. doi: 10.1126/science.1171243.

Ancestral sequence alignment under optimal conditions.

BMC Bioinformatics. 2005 Nov 17;6:273. doi: 10.1186/1471-2105-6-273.

On the quality of tree-based protein classification.

Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.

Bayesian coestimation of phylogeny and sequence alignment.

BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.

DACTAL: divide-and-conquer trees (almost) without alignments.

Bioinformatics. 2012 Jun 15;28(12):i274-82. doi: 10.1093/bioinformatics/bts218.

Phylogenetic inference from conserved sites alignments.

J Exp Zool. 1999 Aug 15;285(2):128-39.

New approaches to phylogenetic tree search and their application to large numbers of protein alignments.

Syst Biol. 2007 Oct;56(5):727-40. doi: 10.1080/10635150701611134.

Exploring the relationship between sequence similarity and accurate phylogenetic trees.

Mol Biol Evol. 2006 Nov;23(11):2090-100. doi: 10.1093/molbev/msl080. Epub 2006 Aug 4.

SuperFine: fast and accurate supertree estimation.

Syst Biol. 2012 Mar;61(2):214-27. doi: 10.1093/sysbio/syr092. Epub 2011 Sep 20.

Cited by

Ultrafast and ultralarge multiple sequence alignments using TWILIGHT.

Bioinformatics. 2025 Jul 1;41(Supplement_1):i332-i341. doi: 10.1093/bioinformatics/btaf212.

Comparative diversification analyses of Hydrangeaceae and Loasaceae reveal complex evolutionary history as species disperse out of Mesoamerica.

Am J Bot. 2025 Jan;112(1):e16455. doi: 10.1002/ajb2.16455. Epub 2025 Jan 11.

Scaling DEPP phylogenetic placement to ultra-large reference trees: a tree-aware ensemble approach.

Bioinformatics. 2024 Jun 3;40(6). doi: 10.1093/bioinformatics/btae361.

Redescription, molecular characterisation and endosymbionts of () (Mullin & Orihel, 1972) (Spirurida: Onchocercidae) from the common treeshrew Diard & Duvaucel (Mammalia: Scandentia) in Peninsular Malaysia.

Curr Res Parasitol Vector Borne Dis. 2023 Nov 23;5:100154. doi: 10.1016/j.crpvbd.2023.100154. eCollection 2024.

EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment.

Algorithms Mol Biol. 2023 Dec 7;18(1):21. doi: 10.1186/s13015-023-00247-x.

System-wide mapping of peptide-GPCR interactions in C. elegans.

Cell Rep. 2023 Sep 26;42(9):113058. doi: 10.1016/j.celrep.2023.113058. Epub 2023 Aug 31.

Leveraging protein language models for accurate multiple sequence alignments.

Genome Res. 2023 Jul;33(7):1145-1153. doi: 10.1101/gr.277675.123. Epub 2023 Jul 6.

Abalign: a comprehensive multiple sequence alignment platform for B-cell receptor immune repertoires.

Nucleic Acids Res. 2023 Jul 5;51(W1):W17-W24. doi: 10.1093/nar/gkad400.

Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage.

BMC Bioinformatics. 2023 Mar 23;24(1):111. doi: 10.1186/s12859-023-05237-9.

Roadmap to the study of gene and protein phylogeny and evolution-A practical guide.

PLoS One. 2023 Feb 24;18(2):e0279597. doi: 10.1371/journal.pone.0279597. eCollection 2023.
