PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

Authors

Siavash Mirarab, Nam Nguyen, Sheng Guo, Li-San Wang, Junhyong Kim, Tandy Warnow

Affiliation

1 Department of Computer Science, University of Texas at Austin, Austin, Texas.

Publication information

J Comput Biol. 2015 May;22(5):377-86. doi: 10.1089/cmb.2014.0156. Epub 2014 Dec 30.

DOI:10.1089/cmb.2014.0156

PMID:25549288

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4424971/

Abstract

We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate--slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.

Similar articles

PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

J Comput Biol. 2015 May;22(5):377-86. doi: 10.1089/cmb.2014.0156. Epub 2014 Dec 30.

SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.

Syst Biol. 2012 Jan;61(1):90-106. doi: 10.1093/sysbio/syr095. Epub 2011 Dec 1.

SEPP: SATé-enabled phylogenetic placement.

Pac Symp Biocomput. 2012:247-58. doi: 10.1142/9789814366496_0024.

Multiple Sequence Alignment for Large Heterogeneous Datasets Using SATé, PASTA, and UPP.

Methods Mol Biol. 2021;2231:99-119. doi: 10.1007/978-1-0716-1036-7_7.

The effect of the guide tree on multiple sequence alignments and subsequent phylogenetic analyses.

Pac Symp Biocomput. 2008:25-36. doi: 10.1142/9789812776136_0004.

Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees.

Science. 2009 Jun 19;324(5934):1561-4. doi: 10.1126/science.1171243.

On the quality of tree-based protein classification.

Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.

Large-scale multiple sequence alignment and tree estimation using SATé.

Methods Mol Biol. 2014;1079:219-44. doi: 10.1007/978-1-62703-646-7_15.

PASTA with many application-aware optimization criteria for alignment based phylogeny inference.

Comput Biol Chem. 2022 Jun;98:107661. doi: 10.1016/j.compbiolchem.2022.107661. Epub 2022 Mar 14.

Bayesian coestimation of phylogeny and sequence alignment.

BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.

Cited by

Ultrafast and ultralarge multiple sequence alignments using TWILIGHT.

Bioinformatics. 2025 Jul 1;41(Supplement_1):i332-i341. doi: 10.1093/bioinformatics/btaf212.

Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES.

Proc Natl Acad Sci U S A. 2025 May 13;122(19):e2500553122. doi: 10.1073/pnas.2500553122. Epub 2025 May 2.

TIPP3 and TIPP3-fast: Improved abundance profiling in metagenomics.

PLoS Comput Biol. 2025 Apr 4;21(4):e1012593. doi: 10.1371/journal.pcbi.1012593. eCollection 2025 Apr.

: A Model Eukaryotic Organism for Astrobiological Studies on Microbial Interactions with Martian Soil Analogs.

JACS Au. 2024 Dec 23;5(1):187-203. doi: 10.1021/jacsau.4c00869. eCollection 2025 Jan 27.

The Asgard archaeal origins of Arf family GTPases involved in eukaryotic organelle dynamics.

Nat Microbiol. 2025 Feb;10(2):495-508. doi: 10.1038/s41564-024-01904-6. Epub 2025 Jan 23.

Ubiquitous genome streamlined in freshwater environments.

ISME Commun. 2024 Oct 22;4(1):ycae124. doi: 10.1093/ismeco/ycae124. eCollection 2024 Jan.

Evolutionary Modes of wtf Meiotic Driver Genes in Schizosaccharomyces pombe.

Genome Biol Evol. 2024 Oct 9;16(10). doi: 10.1093/gbe/evae221.

Global freshwater distribution of Telonemia protists.

ISME J. 2024 Jan 8;18(1). doi: 10.1093/ismejo/wrae177.

Leaf transcriptomes from C3, C3-C4 intermediate, and C4 Neurachne species give insights into C4 photosynthesis evolution.

Plant Physiol. 2024 Dec 23;197(1). doi: 10.1093/plphys/kiae424.

Dynamic evolution of the heterochromatin sensing histone demethylase IBM1.

PLoS Genet. 2024 Jul 11;20(7):e1011358. doi: 10.1371/journal.pgen.1011358. eCollection 2024 Jul.
