MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Author information

Edgar Robert C

Publication information

Nucleic Acids Res. 2004 Mar 19;32(5):1792-7. doi: 10.1093/nar/gkh340. Print 2004.

Abstract

We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
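
To make the "distance estimation using kmer counting" step more concrete, here is a minimal Python sketch of one way such a distance could be computed. The abstract does not give the formula, so the definition below (one minus the fraction of shared k-mers, normalized by the k-mer count of the shorter sequence) is an assumption made for illustration and should not be read as MUSCLE's exact method. The function names `kmer_counts` and `kmer_distance` and the default `k=3` are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Alignment-free distance in [0, 1] based on shared k-mers.

    Assumed definition (illustrative only): 1 - shared / n, where
    'shared' sums the minimum count of each k-mer across both
    sequences and n is the number of k-mers in the shorter sequence.
    """
    n = min(len(a), len(b)) - k + 1
    if n <= 0:
        raise ValueError("both sequences must have at least k residues")
    # Counter intersection keeps the minimum count of each shared k-mer.
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / n


if __name__ == "__main__":
    # Two toy peptide strings that differ only in the last residue.
    print(kmer_distance("ACDEFG", "ACDEFK"))
```

Because no alignment is computed, a distance like this costs time roughly linear in sequence length per pair, which is the kind of shortcut that makes a first guide tree cheap to build for thousands of sequences.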
