Optimizing substitution matrix choice and gap parameters for sequence alignment.

Publication information

BMC Bioinformatics. 2009 Dec 2;10:396. doi: 10.1186/1471-2105-10-396.

DOI:10.1186/1471-2105-10-396
PMID:19954534
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2791778/
Abstract

BACKGROUND

While substitution matrices can readily be computed from reference alignments, it is challenging to compute optimal or approximately optimal gap penalties. It is also not well understood which substitution matrices are the most effective when alignment accuracy is the goal rather than homolog recognition. Here a new parameter optimization procedure, POP, is described and applied to the problems of optimizing gap penalties and selecting substitution matrices for pair-wise global protein alignments.
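
To make concrete where the substitution matrix and the gap penalties enter a pair-wise global alignment, the following is a minimal sketch of affine-gap dynamic programming (Gotoh-style). It is not the POP procedure and is not taken from the paper: the function names, the toy scoring function standing in for a real matrix, and the default penalty values are all assumptions chosen for illustration.

```python
# Minimal sketch: optimal global alignment score with affine gap penalties.
# toy_score and the default penalties are illustrative assumptions only.

NEG_INF = float("-inf")


def toy_score(a, b):
    """Hypothetical stand-in for a substitution matrix lookup."""
    return 2.0 if a == b else -1.0


def global_affine_score(x, y, score=toy_score, gap_open=3.0, gap_extend=0.5):
    """Return the best global alignment score of sequences x and y.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(x), len(y)
    # M: x[i-1] aligned to y[j-1]
    # X: x[i-1] aligned to a gap
    # Y: y[j-1] aligned to a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score(x[i - 1], y[j - 1])
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

Changing `score`, `gap_open`, or `gap_extend` changes which alignment is optimal, which is why these parameters have to be tuned together against reference alignments rather than chosen independently.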

RESULTS

POP is compared to a recent method due to Kim and Kececioglu and found to achieve from 0.2% to 1.3% higher accuracies on pair-wise benchmarks extracted from BALIBASE. The VTML matrix series is shown to be the most accurate on several global pair-wise alignment benchmarks, with VTML200 giving best or close to the best performance in all tests. BLOSUM matrices are found to be slightly inferior, even with the marginal improvements in the bug-fixed RBLOSUM series. The PAM series is significantly worse, giving accuracies typically 2% less than VTML. Integer rounding is found to cause slight degradations in accuracy. No evidence is found that selecting a matrix based on sequence divergence improves accuracy, suggesting that the use of this heuristic in CLUSTALW may be ineffective. Using VTML200 is found to improve the accuracy of CLUSTALW by 8% on BALIBASE and 5% on PREFAB.
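
The accuracy figures above compare computed alignments against benchmark reference alignments. The abstract does not define the measure, so the sketch below assumes one common choice: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The function names and the gapped-string input format (with `-` as the gap character) are assumptions for illustration.

```python
# Minimal sketch: pair-based accuracy of a test alignment against a reference.
# The measure and the input format are assumptions, not taken from the paper.

def aligned_pairs(row_a, row_b):
    """Return the set of (i, j) residue index pairs aligned to each other."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # positions in the ungapped sequences
    for ca, cb in zip(row_a, row_b):
        if ca != "-" and cb != "-":
            pairs.add((i, j))
        if ca != "-":
            i += 1
        if cb != "-":
            j += 1
    return pairs


def pair_accuracy(test, reference):
    """Fraction of reference-aligned pairs that the test alignment reproduces."""
    ref = aligned_pairs(*reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(*test)) / len(ref)


reference = ("ACDE", "A--E")
test = ("ACDE", "AE--")
print(pair_accuracy(test, reference))
```

Under this measure, a difference of a percentage point or two between matrices corresponds to a small share of residue pairs being placed differently across the benchmark.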

CONCLUSION

The hypothesis that more accurate alignments of distantly related sequences may be achieved using low-identity matrices is shown to be false for commonly used matrix types. Source code and test data are freely available from the author's web site at http://www.drive5.com/pop.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f01e/2791778/df71a27e712f/1471-2105-10-396-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f01e/2791778/0d60ad0d677f/1471-2105-10-396-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f01e/2791778/92d060732271/1471-2105-10-396-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f01e/2791778/27d78120ef10/1471-2105-10-396-4.jpg

Similar articles

1
Optimizing substitution matrix choice and gap parameters for sequence alignment.
BMC Bioinformatics. 2009 Dec 2;10:396. doi: 10.1186/1471-2105-10-396.
2
The ranging of amino acids substitution matrices of various types in accordance with the alignment accuracy criterion.
BMC Bioinformatics. 2020 Sep 14;21(Suppl 11):294. doi: 10.1186/s12859-020-03616-0.
3
RBLOSUM performs better than CorBLOSUM with lesser error per query.
BMC Res Notes. 2018 May 21;11(1):328. doi: 10.1186/s13104-018-3415-5.
4
Addressing inaccuracies in BLOSUM computation improves homology search performance.
BMC Bioinformatics. 2016 Apr 27;17:189. doi: 10.1186/s12859-016-1060-3.
5
OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy.
BMC Bioinformatics. 2003 Oct 10;4:47. doi: 10.1186/1471-2105-4-47.
6
Parameterized BLOSUM Matrices for Protein Alignment.
IEEE/ACM Trans Comput Biol Bioinform. 2015 May-Jun;12(3):686-94. doi: 10.1109/TCBB.2014.2366126.
7
Alignment of helical membrane protein sequences using AlignMe.
PLoS One. 2013;8(3):e57731. doi: 10.1371/journal.pone.0057731. Epub 2013 Mar 4.
8
Optimizing amino acid substitution matrices with a local alignment kernel.
BMC Bioinformatics. 2006 May 5;7:246. doi: 10.1186/1471-2105-7-246.
9
Periodic distributions of hydrophobic amino acids allows the definition of fundamental building blocks to align distantly related proteins.
Proteins. 2007 May 15;67(3):695-708. doi: 10.1002/prot.21319.
10
MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities.
Bioinformatics. 2010 Aug 15;26(16):1958-64. doi: 10.1093/bioinformatics/btq338. Epub 2010 Jun 23.

Cited by

1
Major advances in protein function assignment by remote homolog detection with protein language models - A review.
Curr Opin Struct Biol. 2025 Feb;90:102984. doi: 10.1016/j.sbi.2025.102984. Epub 2025 Jan 27.
2
New alignment method for remote protein sequences by the direct use of pairwise sequence correlations and substitutions.
Front Bioinform. 2023 Oct 12;3:1227193. doi: 10.3389/fbinf.2023.1227193. eCollection 2023.
3
Developing similarity matrices for antibody-protein binding interactions.
PLoS One. 2023 Oct 26;18(10):e0293606. doi: 10.1371/journal.pone.0293606. eCollection 2023.
4
Mutation Space of Spatially Conserved Amino Acid Sites in Proteins.
ACS Omega. 2023 Jun 28;8(27):24302-24310. doi: 10.1021/acsomega.3c01473. eCollection 2023 Jul 11.
5
Genes Polymorphism Depicts Developmental Disruption of Common Sole Eggs.
Open Life Sci. 2019 Dec 31;14:549-563. doi: 10.1515/biol-2019-0061. eCollection 2019 Jan.
6
New amino acid substitution matrix brings sequence alignments into agreement with structure matches.
Proteins. 2021 Jun;89(6):671-682. doi: 10.1002/prot.26050. Epub 2021 Feb 2.
7
A weighted string kernel for protein fold recognition.
BMC Bioinformatics. 2017 Aug 25;18(1):378. doi: 10.1186/s12859-017-1795-5.
8
PFASUM: a substitution matrix from Pfam structural alignments.
BMC Bioinformatics. 2017 Jun 5;18(1):293. doi: 10.1186/s12859-017-1703-z.
9
FAMSA: Fast and accurate multiple sequence alignment of huge protein families.
Sci Rep. 2016 Sep 27;6:33964. doi: 10.1038/srep33964.
10
PR2ALIGN: a stand-alone software program and a web-server for protein sequence alignment using weighted biochemical properties of amino acids.
BMC Res Notes. 2015 May 7;8:187. doi: 10.1186/s13104-015-1152-6.

References

1
Automatic parameter learning for multiple local network alignment.
J Comput Biol. 2009 Aug;16(8):1001-22. doi: 10.1089/cmb.2009.0099.
2
Learning scoring schemes for sequence alignment from partial examples.
IEEE/ACM Trans Comput Biol Bioinform. 2008 Oct-Dec;5(4):546-56. doi: 10.1109/TCBB.2008.57.
3
BLOSUM62 miscalculations improve search performance.
Nat Biotechnol. 2008 Mar;26(3):274-5. doi: 10.1038/nbt0308-274.
4
Clustal W and Clustal X version 2.0.
Bioinformatics. 2007 Nov 1;23(21):2947-8. doi: 10.1093/bioinformatics/btm404. Epub 2007 Sep 10.
5
Multiple alignment of protein sequences with repeats and rearrangements.
Nucleic Acids Res. 2006;34(20):5932-42. doi: 10.1093/nar/gkl511. Epub 2006 Oct 26.
6
MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information.
Nucleic Acids Res. 2006;34(16):4364-74. doi: 10.1093/nar/gkl514. Epub 2006 Aug 26.
7
Analysis and comparison of benchmarks for multiple sequence alignment.
In Silico Biol. 2006;6(4):321-39.
8
CONTRAfold: RNA secondary structure prediction without physics-based models.
Bioinformatics. 2006 Jul 15;22(14):e90-8. doi: 10.1093/bioinformatics/btl246.
9
Parametric alignment of Drosophila genomes.
PLoS Comput Biol. 2006 Jun 23;2(6):e73. doi: 10.1371/journal.pcbi.0020073.
10
Kalign--an accurate and fast multiple sequence alignment algorithm.
BMC Bioinformatics. 2005 Dec 12;6:298. doi: 10.1186/1471-2105-6-298.