A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives.

Affiliation

Département de Biologie Structurale et Génomique, IGBMC (Institut de Génétique et de Biologie Moléculaire et Cellulaire), CNRS/INSERM/Université de Strasbourg, Illkirch, France.

Publication

PLoS One. 2011 Mar 31;6(3):e18093. doi: 10.1371/journal.pone.0018093.

DOI:10.1371/journal.pone.0018093
PMID:21483869
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3069049/
Abstract

Multiple comparison or alignment of protein sequences has become a fundamental tool in many different domains in modern molecular biology, from evolutionary studies to prediction of 2D/3D structure, molecular function and inter-molecular interactions etc. By placing the sequence in the framework of the overall family, multiple alignments can be used to identify conserved features and to highlight differences or specificities. In this paper, we describe a comprehensive evaluation of many of the most popular methods for multiple sequence alignment (MSA), based on a new benchmark test set. The benchmark is designed to represent typical problems encountered when aligning the large protein sequence sets that result from today's high throughput biotechnologies. We show that alignment methods have significantly progressed and can now identify most of the shared sequence features that determine the broad molecular function(s) of a protein family, even for divergent sequences. However, we have identified a number of important challenges. First, the locally conserved regions, that reflect functional specificities or that modulate a protein's function in a given cellular context, are less well aligned. Second, motifs in natively disordered regions are often misaligned. Third, the badly predicted or fragmentary protein sequences, which make up a large proportion of today's databases, lead to a significant number of alignment errors. Based on this study, we demonstrate that the existing MSA methods can be exploited in combination to improve alignment accuracy, although novel approaches will still be needed to fully explore the most difficult regions. We then propose knowledge-enabled, dynamic solutions that will hopefully pave the way to enhanced alignment construction and exploitation in future evolutionary systems biology studies.
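Benchmarks of this kind typically score a test alignment against a trusted reference alignment. The sketch below illustrates two widely used measures, the sum-of-pairs (SP) score and the total-column (TC) score. It is an illustration under simplifying assumptions, not the exact scoring procedure used in the paper: both alignments are assumed to contain the same sequences in the same order, `-` is assumed to be the only gap symbol, and every reference column with at least two residues is scored (real benchmarks often restrict scoring to reliable "core" regions). All function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Set, Tuple

GAP = "-"  # assumption: '-' is the only gap symbol


def residue_indices(row: str) -> List[Optional[int]]:
    """For each column of an aligned row, return the index of the residue in
    the ungapped sequence, or None where the row has a gap."""
    indices: List[Optional[int]] = []
    k = 0
    for ch in row:
        if ch == GAP:
            indices.append(None)
        else:
            indices.append(k)
            k += 1
    return indices


def _check(test: List[str], ref: List[str]) -> None:
    """Both alignments must hold the same sequences, in the same order."""
    if [r.replace(GAP, "") for r in test] != [r.replace(GAP, "") for r in ref]:
        raise ValueError("test and reference must contain the same sequences in the same order")
    for aln in (test, ref):
        if len({len(row) for row in aln}) != 1:
            raise ValueError("all rows of an alignment must have the same length")


def aligned_pairs(aln: List[str]) -> Set[Tuple[int, int, int, int]]:
    """All residue pairs (seq i, residue ri, seq j, residue rj) placed in the same column."""
    idx = [residue_indices(row) for row in aln]
    pairs: Set[Tuple[int, int, int, int]] = set()
    for col in range(len(aln[0])):
        for i, j in combinations(range(len(aln)), 2):
            ri, rj = idx[i][col], idx[j][col]
            if ri is not None and rj is not None:
                pairs.add((i, ri, j, rj))
    return pairs


def aligned_columns(aln: List[str]) -> Set[Tuple[Optional[int], ...]]:
    """Each column as a tuple of residue indices (None marks a gap)."""
    idx = [residue_indices(row) for row in aln]
    return {tuple(r[col] for r in idx) for col in range(len(aln[0]))}


def sp_score(test: List[str], ref: List[str]) -> float:
    """Fraction of reference residue pairs that the test alignment reproduces."""
    _check(test, ref)
    ref_pairs = aligned_pairs(ref)
    if not ref_pairs:
        raise ValueError("reference alignment contains no aligned residue pairs")
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


def tc_score(test: List[str], ref: List[str]) -> float:
    """Fraction of reference columns (with at least two residues) reproduced exactly."""
    _check(test, ref)
    ref_cols = {c for c in aligned_columns(ref) if sum(x is not None for x in c) >= 2}
    if not ref_cols:
        raise ValueError("reference alignment contains no scorable columns")
    return len(ref_cols & aligned_columns(test)) / len(ref_cols)


if __name__ == "__main__":
    reference = ["ACGT", "A-GT", "ACG-"]
    candidate = ["ACGT", "AG-T", "ACG-"]
    print(sp_score(candidate, reference))  # 0.75
    print(tc_score(candidate, reference))  # 0.5
```

With only two sequences the SP and TC scores coincide; with three or more, TC is stricter because a single misplaced residue invalidates the whole column, which is why the example yields 0.75 for SP but only 0.5 for TC.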

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a93/3069049/19ef9191dc47/pone.0018093.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a93/3069049/3ccfd2a929be/pone.0018093.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a93/3069049/1469bcd8bf69/pone.0018093.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a93/3069049/51a11c50dae2/pone.0018093.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a93/3069049/a7f910338925/pone.0018093.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a93/3069049/a3aa34a855ee/pone.0018093.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a93/3069049/da4a6eaab9db/pone.0018093.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a93/3069049/48ab6167695f/pone.0018093.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a93/3069049/d1d19f4b16cd/pone.0018093.g009.jpg

Similar articles

1
A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives.
PLoS One. 2011 Mar 31;6(3):e18093. doi: 10.1371/journal.pone.0018093.
2
BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark.
Proteins. 2005 Oct 1;61(1):127-36. doi: 10.1002/prot.20527.
3
OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy.
BMC Bioinformatics. 2003 Oct 10;4:47. doi: 10.1186/1471-2105-4-47.
4
LEON-BIS: multiple alignment evaluation of sequence neighbours using a Bayesian inference system.
BMC Bioinformatics. 2016 Jul 7;17(1):271. doi: 10.1186/s12859-016-1146-y.
5
Protein multiple sequence alignment benchmarking through secondary structure prediction.
Bioinformatics. 2017 May 1;33(9):1331-1337. doi: 10.1093/bioinformatics/btw840.
6
PROMALS3D: multiple protein sequence alignment enhanced with evolutionary and three-dimensional structural information.
Methods Mol Biol. 2014;1079:263-71. doi: 10.1007/978-1-62703-646-7_17.
7
Towards a reliable objective function for multiple sequence alignments.
J Mol Biol. 2001 Dec 7;314(4):937-51. doi: 10.1006/jmbi.2001.5187.
8
GLProbs: Aligning Multiple Sequences Adaptively.
IEEE/ACM Trans Comput Biol Bioinform. 2015 Jan-Feb;12(1):67-78. doi: 10.1109/TCBB.2014.2316820.
9
DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment.
BMC Bioinformatics. 2015 Oct 6;16:322. doi: 10.1186/s12859-015-0749-z.
10
Rapid detection, classification and accurate alignment of up to a million or more related protein sequences.
Bioinformatics. 2009 Aug 1;25(15):1869-75. doi: 10.1093/bioinformatics/btp342. Epub 2009 Jun 8.

Cited by

1
Protein Language Model Identifies Disordered, Conserved Motifs Implicated in Phase Separation.
bioRxiv. 2025 Jul 23:2024.12.12.628175. doi: 10.1101/2024.12.12.628175.
2
ProtMamba: a homology-aware but alignment-free protein state space model.
Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf348.
3
Variant evolution graph: Can we infer how SARS-CoV-2 variants are evolving?
PLoS One. 2025 Jun 9;20(6):e0323970. doi: 10.1371/journal.pone.0323970. eCollection 2025.
4
Unlocking ADAMTS-5: insights into TMJ proteomics and docking dynamics.
J Orthod Sci. 2025 Mar 25;14:11. doi: 10.4103/jos.jos_89_24. eCollection 2025.
5
Protein A-like Peptide Design Based on Diffusion and ESM2 Models.
Molecules. 2024 Oct 21;29(20):4965. doi: 10.3390/molecules29204965.
6
PRIEST: predicting viral mutations with immune escape capability of SARS-CoV-2 using temporal evolutionary information.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae218.
7
EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment.
Algorithms Mol Biol. 2023 Dec 7;18(1):21. doi: 10.1186/s13015-023-00247-x.
8
CoreDetector: a flexible and efficient program for core-genome alignment of evolutionary diverse genomes.
Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad628.
9
The difficulty of aligning intrinsically disordered protein sequences as assessed by conservation and phylogeny.
PLoS One. 2023 Jul 13;18(7):e0288388. doi: 10.1371/journal.pone.0288388. eCollection 2023.
10
On closing the inopportune gap with consistency transformation and iterative refinement.
PLoS One. 2023 Jul 13;18(7):e0287483. doi: 10.1371/journal.pone.0287483. eCollection 2023.

References

1
More than 1,001 problems with protein domain databases: transmembrane regions, signal peptides and the issue of sequence homology.
PLoS Comput Biol. 2010 Jul 29;6(7):e1000867. doi: 10.1371/journal.pcbi.1000867.
2
Issues in bioinformatics benchmarking: the case study of multiple sequence alignment.
Nucleic Acids Res. 2010 Nov;38(21):7353-63. doi: 10.1093/nar/gkq625. Epub 2010 Jul 17.
3
Multi-Harmony: detecting functional specificity from sequence alignment.
Nucleic Acids Res. 2010 Jul;38(Web Server issue):W35-40. doi: 10.1093/nar/gkq415. Epub 2010 Jun 4.
4
3DM: systematic analysis of heterogeneous superfamily data to discover protein functionalities.
Proteins. 2010 Jul;78(9):2101-13. doi: 10.1002/prot.22725.
5
Phylogenetic assessment of alignments reveals neglected tree signal in gaps.
Genome Biol. 2010;11(4):R37. doi: 10.1186/gb-2010-11-4-r37. Epub 2010 Apr 6.
6
Protein interactions and ligand binding: from protein subfamilies to functional specificity.
Proc Natl Acad Sci U S A. 2010 Feb 2;107(5):1995-2000. doi: 10.1073/pnas.0908044107. Epub 2010 Jan 19.
7
Evaluation of three automated genome annotations for Halorhabdus utahensis.
PLoS One. 2009 Jul 20;4(7):e6291. doi: 10.1371/journal.pone.0006291.
8
Darwinian evolution in the light of genomics.
Nucleic Acids Res. 2009 Mar;37(4):1011-34. doi: 10.1093/nar/gkp089. Epub 2009 Feb 12.
9
Strategies for reliable exploitation of evolutionary concepts in high throughput biology.
Evol Bioinform Online. 2008 May 8;4:121-37. doi: 10.4137/ebo.s597.
10
Jalview Version 2--a multiple sequence alignment editor and analysis workbench.
Bioinformatics. 2009 May 1;25(9):1189-91. doi: 10.1093/bioinformatics/btp033. Epub 2009 Jan 16.