Improving protein structure prediction using templates and sequence embedding.

Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.

Toyota Technological Institute at Chicago, Chicago, IL 60637, USA.

Publication information

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac723.

DOI:10.1093/bioinformatics/btac723

PMID:36355462

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9805584/

Abstract

MOTIVATION

Protein structure prediction has been greatly improved by deep learning, but the contribution of different information is yet to be fully understood. This article studies the impacts of two kinds of information for structure prediction: template and multiple sequence alignment (MSA) embedding. Templates have been used by some methods before, such as AlphaFold2, RoseTTAFold and RaptorX. AlphaFold2 and RoseTTAFold only used templates detected by HHsearch, which may not perform very well on some targets. In addition, sequence embedding generated by pre-trained protein language models has not been fully explored for structure prediction. In this article, we study the impact of templates (including the number of templates, the template quality and how the templates are generated) on protein structure prediction accuracy, especially when the templates are detected by methods other than HHsearch. We also study the impact of sequence embedding (generated by MSATransformer and ESM-1b) on structure prediction.

RESULTS

We have implemented a deep learning method for protein structure prediction that can take templates and MSA embedding as extra inputs, and we use it to study how much each contributes to prediction accuracy. Our experimental results show that templates improve structure prediction on 71 of 110 CASP13 (13th Critical Assessment of Structure Prediction) targets and 47 of 91 CASP14 (14th Critical Assessment of Structure Prediction) targets, and are particularly useful for targets with similar templates. MSA embedding improves structure prediction on 63 of 91 CASP14 targets and 87 of 183 Continuous Automated Model Evaluation (CAMEO) targets, and is particularly useful for proteins with shallow MSAs. When both templates and MSA embedding are used, our method predicts correct folds (TMscore > 0.5) for 16 of 23 CASP14 FM targets and 14 of 18 CAMEO targets, outperforming RoseTTAFold by 5% and 7%, respectively.
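
The counts reported above can be turned into percentages with a few lines of Python. This is only an illustrative sketch, not code from the RaptorXFold repository: the counts are copied from the abstract, while the helper names (`percent`, `count_correct_folds`, `count_improved`) are hypothetical. The abstract does not say which per-target score defines "improve", so `count_improved` simply assumes a score where higher is better.

```python
# Counts copied from the abstract: (targets improved or correctly folded, total targets).
REPORTED = {
    "templates, CASP13": (71, 110),
    "templates, CASP14": (47, 91),
    "MSA embedding, CASP14": (63, 91),
    "MSA embedding, CAMEO": (87, 183),
    "correct folds, CASP14 FM": (16, 23),
    "correct folds, CAMEO": (14, 18),
}


def percent(hits, total):
    """Return hits/total as a percentage rounded to one decimal place."""
    return round(100.0 * hits / total, 1)


def count_correct_folds(tm_scores, threshold=0.5):
    """Count models whose TM-score is strictly above the threshold.

    The threshold of 0.5 is the correct-fold criterion stated in the abstract.
    """
    return sum(1 for score in tm_scores if score > threshold)


def count_improved(baseline, with_feature):
    """Count targets whose score rises when an extra input is added.

    Hypothetical helper: assumes paired per-target scores, higher is better.
    """
    return sum(1 for b, w in zip(baseline, with_feature) if w > b)


if __name__ == "__main__":
    for name, (hits, total) in REPORTED.items():
        print(f"{name}: {hits}/{total} = {percent(hits, total)}%")
```

Under these counts, the correct-fold rates work out to roughly 69.6% of the CASP14 FM targets and 77.8% of the CAMEO targets.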

AVAILABILITY AND IMPLEMENTATION

Available at https://github.com/xluo233/RaptorXFold.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
