Kalign--an accurate and fast multiple sequence alignment algorithm.

Authors

Timo Lassmann, Erik L. L. Sonnhammer

Affiliation

Center for Genomics and Bioinformatics, Karolinska Institutet, Berzelius vag 35, S-17177 Stockholm, Sweden.

Publication

BMC Bioinformatics. 2005 Dec 12;6:298. doi: 10.1186/1471-2105-6-298.

DOI: 10.1186/1471-2105-6-298
PMID: 16343337
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1325270/
Abstract

BACKGROUND

The alignment of multiple protein sequences is a fundamental step in the analysis of biological data. It has traditionally been used to analyze protein families for conserved motifs, phylogeny, and structural properties, and to improve sensitivity in homology searching. The availability of complete genome sequences has increased the demands on multiple sequence alignment (MSA) programs. Current MSA methods are either too inaccurate or too computationally expensive to be applied effectively in large-scale comparative genomics.

RESULTS

We developed Kalign, a method employing the Wu-Manber string-matching algorithm, to improve both the accuracy and speed of multiple sequence alignment. We compared the speed and accuracy of Kalign to other popular methods using Balibase, Prefab, and a new large test set. Kalign was as accurate as the best competing methods on small alignments, but significantly more accurate when aligning large and distantly related sets of sequences. In our comparisons, Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods.
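The abstract names the Wu-Manber string-matching algorithm but does not say how Kalign applies it. As a point of reference only, the sketch below implements the classic bit-parallel approximate search of Wu and Manber (the Shift-And, or "bitap", approach used in agrep), assuming that is the algorithm meant rather than the unrelated multi-pattern exact-matching algorithm that shares the name. It reports every text position where a pattern ends with at most `k` edits. This is not Kalign's code: the function name `wu_manber_search` is hypothetical, and how Kalign turns such matches into distances or alignment decisions is not described in the abstract.

```python
def wu_manber_search(text: str, pattern: str, k: int) -> list[int]:
    """Return end positions in `text` where `pattern` occurs with at most
    `k` edits (substitutions, insertions, deletions).

    Generic bit-parallel (Shift-And) formulation of Wu-Manber approximate
    matching, NOT Kalign's implementation: bit b of state[d] is set when
    pattern[0..b] matches a suffix of the text read so far with <= d edits.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    if not 0 <= k < m:
        raise ValueError("k must satisfy 0 <= k < len(pattern)")

    full = (1 << m) - 1          # keep every state word to m bits
    accept = 1 << (m - 1)        # set when the whole pattern has matched

    # Character masks: bit b is set in masks[c] if pattern[b] == c.
    masks: dict[str, int] = {}
    for b, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << b)

    # Before any text is read, a prefix of length d costs d deletions.
    state = [(1 << d) - 1 for d in range(k + 1)]
    hits: list[int] = []
    for j, c in enumerate(text):
        cmask = masks.get(c, 0)
        prev = state[:]                            # states before reading c
        state[0] = ((prev[0] << 1) | 1) & cmask    # exact extension only
        for d in range(1, k + 1):
            state[d] = (
                (((prev[d] << 1) | 1) & cmask)     # c matches pattern[b]
                | (prev[d - 1] << 1)               # pattern[b] substituted by c
                | prev[d - 1]                      # c is an extra (inserted) char
                | (state[d - 1] << 1)              # pattern[b] deleted
                | 1                                # pattern[0] alone: <= 1 edit
            ) & full
        if state[k] & accept:
            hits.append(j)
    return hits


if __name__ == "__main__":
    print(wu_manber_search("ACGTTGCA", "GTTG", 0))  # exact match ending at 5
    print(wu_manber_search("ACGTTGCA", "GTAG", 1))  # one substitution, ends at 5
```

Each state is one integer of `m` bits, so for patterns no longer than a machine word the cost per text character is O(k) word operations instead of the O(km) cells of a dynamic-programming table; that bit-level parallelism is what makes this family of methods fast. Python's arbitrary-precision integers hide the word-size limit, so the sketch shows the logic rather than the performance.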

CONCLUSION

Kalign is a fast and robust alignment method. It is especially well suited for the increasingly important task of aligning large numbers of sequences.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2d12/1325270/ac6b38e94b77/1471-2105-6-298-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2d12/1325270/268e22f3bae4/1471-2105-6-298-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2d12/1325270/6c4d8cc2f279/1471-2105-6-298-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2d12/1325270/337ed26ee61d/1471-2105-6-298-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2d12/1325270/b3ceb5fb5512/1471-2105-6-298-5.jpg

Similar articles

1
Kalign--an accurate and fast multiple sequence alignment algorithm.
BMC Bioinformatics. 2005 Dec 12;6:298. doi: 10.1186/1471-2105-6-298.
2
transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences.
BMC Bioinformatics. 2005 Jun 22;6:156. doi: 10.1186/1471-2105-6-156.
3
Grammar-based distance in progressive multiple sequence alignment.
BMC Bioinformatics. 2008 Jul 10;9:306. doi: 10.1186/1471-2105-9-306.
4
TM-Aligner: Multiple sequence alignment tool for transmembrane proteins with reduced time and improved accuracy.
Sci Rep. 2017 Oct 2;7(1):12543. doi: 10.1038/s41598-017-13083-y.
5
An improved scoring method for protein residue conservation and multiple sequence alignment.
IEEE Trans Nanobioscience. 2011 Dec;10(4):275-85. doi: 10.1109/TNB.2011.2179553.
6
PROMALS: towards accurate multiple sequence alignments of distantly related proteins.
Bioinformatics. 2007 Apr 1;23(7):802-8. doi: 10.1093/bioinformatics/btm017. Epub 2007 Jan 31.
7
Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost.
BMC Bioinformatics. 2006 Dec 1;7:524. doi: 10.1186/1471-2105-7-524.
8
A new progressive-iterative algorithm for multiple structure alignment.
Bioinformatics. 2005 Aug 1;21(15):3255-63. doi: 10.1093/bioinformatics/bti527. Epub 2005 Jun 7.
9
DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment.
BMC Bioinformatics. 2005 Mar 22;6:66. doi: 10.1186/1471-2105-6-66.
10
MUSTANG: a multiple structural alignment algorithm.
Proteins. 2006 Aug 15;64(3):559-74. doi: 10.1002/prot.20921.

Cited by

1
Exploring the Biosynthetic Potential of Microorganisms from the South China Sea Cold Seep Using Culture-Dependent and Culture-Independent Approaches.
Mar Drugs. 2025 Jul 30;23(8):313. doi: 10.3390/md23080313.
2
A chromosome-level genome assembly of the Hispid cotton rat (Sigmodon hispidus), a model for human pathogenic virus infections.
BMC Biol. 2025 Jul 18;23(1):217. doi: 10.1186/s12915-025-02316-6.
3
Genomic Characterization and Pathogenicity of a Novel Birnavirus Strain Isolated from Mandarin Fish ().
Genes (Basel). 2025 May 24;16(6):629. doi: 10.3390/genes16060629.
4
Disruption of SETD3-mediated histidine-73 methylation by the BWCFF-associated β-actin G74S mutation.
FEBS Lett. 2025 Sep;599(17):2449-2462. doi: 10.1002/1873-3468.70088. Epub 2025 Jun 9.
5
Unveiling the multifaceted domain polymorphism of the Menshen antiphage system.
Nucleic Acids Res. 2025 May 10;53(9). doi: 10.1093/nar/gkaf357.
6
Impact of Alignments on the Accuracy of Protein Subcellular Localization Predictions.
Proteins. 2025 Mar;93(3):745-759. doi: 10.1002/prot.26767. Epub 2024 Nov 22.
7
TranscriptDB: a transcript-centric database to study eukaryotic transcript conservation and evolution.
Nucleic Acids Res. 2025 Jan 6;53(D1):D1235-D1242. doi: 10.1093/nar/gkae995.
8
Chromosome level assemblies of Nakaseomyces (Candida) bracarensis uncover two distinct clades and define its adhesin repertoire.
BMC Genomics. 2024 Nov 7;25(1):1053. doi: 10.1186/s12864-024-10979-8.
9
Human selenocysteine synthase, SEPSECS, has evolved to optimize binding of a tRNA-based substrate.
Nucleic Acids Res. 2024 Nov 27;52(21):13368-13385. doi: 10.1093/nar/gkae875.
10
SARS-CoV-2 Genotyping Highlights the Challenges in Spike Protein Drift Independent of Other Essential Proteins.
Microorganisms. 2024 Sep 9;12(9):1863. doi: 10.3390/microorganisms12091863.