Kalign--an accurate and fast multiple sequence alignment algorithm.

Authors

Lassmann Timo, Sonnhammer Erik L L

Affiliation

Center for Genomics and Bioinformatics, Karolinska Institutet, Berzelius väg 35, S-17177 Stockholm, Sweden.

Publication information

BMC Bioinformatics. 2005 Dec 12;6:298. doi: 10.1186/1471-2105-6-298.

DOI: 10.1186/1471-2105-6-298

PMID: 16343337

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1325270/

Abstract

BACKGROUND

The alignment of multiple protein sequences is a fundamental step in the analysis of biological data. It has traditionally been applied to analyzing protein families for conserved motifs, phylogeny, structural properties, and to improve sensitivity in homology searching. The availability of complete genome sequences has increased the demands on multiple sequence alignment (MSA) programs. Current MSA methods suffer from being either too inaccurate or too computationally expensive to be applied effectively in large-scale comparative genomics.

RESULTS

We developed Kalign, a method employing the Wu-Manber string-matching algorithm, to improve both the accuracy and speed of multiple sequence alignment. We compared the speed and accuracy of Kalign to other popular methods using Balibase, Prefab, and a new large test set. Kalign was as accurate as the best other methods on small alignments, but significantly more accurate when aligning large and distantly related sets of sequences. In our comparisons, Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods.

CONCLUSION

Kalign is a fast and robust alignment method. It is especially well suited for the increasingly important task of aligning large numbers of sequences.
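The Results name the Wu-Manber string-matching algorithm, but the abstract does not describe how Kalign applies it. As a rough illustration only, the sketch below shows the general bit-parallel approximate matching technique associated with Wu and Manber: it reports every position in a text where a pattern ends with at most k edits (substitutions, insertions, or deletions). The function name `find_approx` and its parameters are hypothetical, and this is not Kalign's implementation.

```python
def find_approx(pattern, text, k):
    """Return end indices in `text` where `pattern` matches with at most k edits.

    Illustrative bit-parallel (Shift-And) approximate matcher in the style of
    Wu and Manber. Hypothetical helper, not taken from Kalign.
    """
    m = len(pattern)
    if m == 0 or k < 0:
        raise ValueError("pattern must be non-empty and k must be >= 0")

    full = (1 << m) - 1  # keeps every state vector to m bits

    # masks[ch] has bit i set when pattern[i] == ch
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    # state[d]: bit i is set when pattern[:i + 1] matches some suffix of the
    # text read so far with at most d edits. Before any text is read, a prefix
    # of length d can match the empty string by deleting d pattern characters.
    state = [((1 << d) - 1) & full for d in range(k + 1)]

    ends = []
    for pos, ch in enumerate(text):
        mask = masks.get(ch, 0)
        prev_old = state[0]  # value of state[d - 1] before this character
        state[0] = ((state[0] << 1) | 1) & mask
        for d in range(1, k + 1):
            current_old = state[d]
            state[d] = (
                (((current_old << 1) | 1) & mask)  # exact match of this character
                | prev_old                          # extra character in the text
                | (prev_old << 1)                   # substituted character
                | (state[d - 1] << 1)               # character missing from the text
                | 1                                 # a one-character prefix always fits with one edit
            ) & full
            prev_old = current_old
        if state[k] & (1 << (m - 1)):
            ends.append(pos)
    return ends
```

For example, `find_approx("ACGT", "TTAGGTTT", 1)` returns `[5]`, because `AGGT` ends at index 5 and differs from the pattern by one substitution. Python integers are unbounded, so the sketch is not limited to patterns that fit in one machine word.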
