PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

Authors

Mirarab Siavash, Nguyen Nam, Guo Sheng, Wang Li-San, Kim Junhyong, Warnow Tandy

Affiliation

1 Department of Computer Science, University of Texas at Austin, Austin, Texas.

Publication information

J Comput Biol. 2015 May;22(5):377-86. doi: 10.1089/cmb.2014.0156. Epub 2014 Dec 30.

Abstract

We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate: slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.

Similar articles

SEPP: SATé-enabled phylogenetic placement.
Pac Symp Biocomput. 2012:247-58. doi: 10.1142/9789814366496_0024.

On the quality of tree-based protein classification.
Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.

Cited by

Ultrafast and ultralarge multiple sequence alignments using TWILIGHT.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i332-i341. doi: 10.1093/bioinformatics/btaf212.

TIPP3 and TIPP3-fast: Improved abundance profiling in metagenomics.
PLoS Comput Biol. 2025 Apr 4;21(4):e1012593. doi: 10.1371/journal.pcbi.1012593. eCollection 2025 Apr.

Ubiquitous genome streamlined in freshwater environments.
ISME Commun. 2024 Oct 22;4(1):ycae124. doi: 10.1093/ismeco/ycae124. eCollection 2024 Jan.

Dynamic evolution of the heterochromatin sensing histone demethylase IBM1.
PLoS Genet. 2024 Jul 11;20(7):e1011358. doi: 10.1371/journal.pgen.1011358. eCollection 2024 Jul.
