MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Author information

Edgar Robert C

Publication information

Nucleic Acids Res. 2004 Mar 19;32(5):1792-7. doi: 10.1093/nar/gkh340. Print 2004.

DOI:10.1093/nar/gkh340

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC390337/

Abstract

We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
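The abstract names kmer counting as the basis for fast distance estimation but does not give a formula. As a rough illustration only, the sketch below scores two unaligned sequences by the fraction of kmers they share, normalised by the number of kmers in the shorter sequence, and converts that fraction into a distance. The normalisation, the choice of k = 3, and the function names `kmer_counts`, `kmer_distance` and `distance_matrix` are assumptions made for this example, not details taken from the abstract or from the MUSCLE source code.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping kmer of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Return 1 - (shared kmers / kmers in the shorter sequence).

    Assumed definition for illustration; no alignment is computed.
    """
    n_shorter = min(len(a), len(b)) - k + 1
    if n_shorter <= 0:
        raise ValueError("both sequences must contain at least k residues")
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # Counter intersection keeps the minimum count of each common kmer.
    shared = sum((counts_a & counts_b).values())
    return 1.0 - shared / n_shorter


def distance_matrix(seqs: list[str], k: int = 3) -> list[list[float]]:
    """Build a symmetric pairwise distance matrix for a list of sequences."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = kmer_distance(seqs[i], seqs[j], k)
    return dist
```

For example, `kmer_distance("ACDEFG", "ACDEFH")` shares three of the four 3-mers and so returns 0.25. Because each pair needs only a pass over the kmer counts rather than a pairwise alignment, a matrix like this is cheap to build and could then feed a guide-tree construction step before progressive alignment.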
