CRFalign: A Sequence-Structure Alignment of Proteins Based on a Combination of HMM-HMM Comparison and Conditional Random Fields.

Affiliations

Basic Science Institute, Changwon National University, Changwon 51140, Korea.

Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea.

Publication information

Molecules. 2022 Jun 9;27(12):3711. doi: 10.3390/molecules27123711.

DOI:10.3390/molecules27123711
PMID:35744836
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9231382/
Abstract

Sequence-structure alignment is an important task in template-based modeling of protein 3D structures. Building a reliable sequence-structure alignment is challenging, especially for remote-homologue target proteins. We built a sequence-structure alignment method called CRFalign, which improves upon a base alignment model based on HMM-HMM comparison by employing pairwise conditional random fields in combination with nonlinear scoring functions of structural and sequence features. The nonlinear scoring part is implemented by a set of gradient boosted regression trees. In addition to sequence profile features, various position-dependent structural features are employed, including secondary structures and solvent accessibilities. Training is performed on reference alignments at the superfamily level or in the twilight zone, chosen from the SABmark benchmark set. We found that the CRFalign method produces a relative improvement in average alignment accuracy on the validation sets of the SABmark benchmark. We also tested CRFalign on 51 sequence-structure pairs involving 15 FM target domains of CASP14, where CRFalign improved the average modeling accuracy on these hard targets (TM-CRFalign ≃ 42.94%) compared with HHalign (TM-HHalign ≃ 39.05%) and MRFalign (TM-MRFalign ≃ 36.93%). CRFalign was incorporated into our template search framework, called CRFpred, and was tested on a random set of 300 target proteins consisting of Easy, Medium and Hard sets, showing reasonable template search performance.
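
To make the alignment step more concrete, the following is a minimal sketch of the kind of three-state dynamic programming (match, gap in the template, gap in the target) that pairwise alignment models are commonly decoded with. It is not the authors' implementation: the function name `viterbi_align`, the gap penalties, and the toy identity-based `match_score` are all assumptions made for illustration. In CRFalign the position-pair scores would instead come from the HMM-HMM base model combined with the gradient boosted regression trees described in the abstract.

```python
from typing import Callable, List, Tuple

NEG_INF = float("-inf")
M, X, Y = 0, 1, 2  # match, target residue vs. gap, template residue vs. gap


def viterbi_align(
    n: int,
    m: int,
    match_score: Callable[[int, int], float],
    gap_open: float = -2.0,    # hypothetical penalty, not from the paper
    gap_extend: float = -0.5,  # hypothetical penalty, not from the paper
) -> Tuple[float, List[Tuple[int, int]]]:
    """Best-scoring global alignment of n target and m template positions."""
    # score[s][i][j]: best score for the first i target and first j template
    # positions, with the alignment ending in state s.
    score = [[[NEG_INF] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    back = [[[M] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    score[M][0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # align target i-1 with template j-1
                prev = max(range(3), key=lambda s: score[s][i - 1][j - 1])
                score[M][i][j] = score[prev][i - 1][j - 1] + match_score(i - 1, j - 1)
                back[M][i][j] = prev
            if i > 0:  # target i-1 aligned to a gap
                cands = [
                    score[M][i - 1][j] + gap_open,
                    score[X][i - 1][j] + gap_extend,
                    score[Y][i - 1][j] + gap_open,
                ]
                prev = max(range(3), key=lambda s: cands[s])
                score[X][i][j], back[X][i][j] = cands[prev], prev
            if j > 0:  # template j-1 aligned to a gap
                cands = [
                    score[M][i][j - 1] + gap_open,
                    score[X][i][j - 1] + gap_open,
                    score[Y][i][j - 1] + gap_extend,
                ]
                prev = max(range(3), key=lambda s: cands[s])
                score[Y][i][j], back[Y][i][j] = cands[prev], prev

    # Trace back from the best final state to recover the aligned pairs.
    state = max(range(3), key=lambda s: score[s][n][m])
    best = score[state][n][m]
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        prev = back[state][i][j]
        if state == M:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif state == X:
            i -= 1
        else:
            j -= 1
        state = prev
    pairs.reverse()
    return best, pairs


# Toy usage: +1 for identical residues, -1 otherwise (a stand-in scorer only).
target, template = "ACDEFG", "ACEFG"
toy_score = lambda i, j: 1.0 if target[i] == template[j] else -1.0
best, pairs = viterbi_align(len(target), len(template), toy_score)
print(best, pairs)
```

Passing the scorer as a callable keeps the dynamic programming independent of how position pairs are scored, so a learned, feature-based scoring function could be substituted for the toy one without changing the recursion.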

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/4c3e217a2f4e/molecules-27-03711-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/793e37950a7b/molecules-27-03711-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/3260a6f21839/molecules-27-03711-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/8d47663b2a99/molecules-27-03711-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/d03b1fcf1cf1/molecules-27-03711-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/1e62f794fb8a/molecules-27-03711-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/7a6c9effafd5/molecules-27-03711-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/f9013c842721/molecules-27-03711-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/6b8869132f4d/molecules-27-03711-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/7cc306a73128/molecules-27-03711-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/a6d7ef1bc52a/molecules-27-03711-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/44197d57d0d7/molecules-27-03711-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/cf1562d66ed3/molecules-27-03711-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/8c9b377ef9d0/molecules-27-03711-g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/9af342ad18d7/molecules-27-03711-g015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/b53a2468af2a/molecules-27-03711-g016.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/e3f16ed60a11/molecules-27-03711-g017.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e66/9231382/8826aeed358f/molecules-27-03711-g018.jpg
