Improving protein structure prediction using templates and sequence embedding.

Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.

Toyota Technological Institute at Chicago, Chicago, IL 60637, USA.

Publication information

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac723.

Abstract

MOTIVATION

Deep learning has greatly improved protein structure prediction, but the contributions of different types of input information are not yet fully understood. This article studies the impact of two kinds of information on structure prediction: templates and multiple sequence alignment (MSA) embedding. Templates have previously been used by methods such as AlphaFold2, RoseTTAFold and RaptorX. AlphaFold2 and RoseTTAFold use only templates detected by HHsearch, which may perform poorly on some targets. In addition, sequence embeddings generated by pre-trained protein language models have not been fully explored for structure prediction. Here, we study how templates (including the number of templates, template quality and the method used to detect them) affect protein structure prediction accuracy, especially when the templates are detected by methods other than HHsearch. We also study the impact of sequence embeddings generated by MSATransformer and ESM-1b on structure prediction.

RESULTS

We have implemented a deep learning method for protein structure prediction that can optionally take templates and MSA embedding as extra inputs, and we use it to study how each contributes to prediction accuracy. Our experimental results show that templates improve structure prediction on 71 of 110 CASP13 (13th Critical Assessment of Structure Prediction) targets and 47 of 91 CASP14 (14th Critical Assessment of Structure Prediction) targets, and that templates are particularly useful for targets with similar templates. MSA embedding improves structure prediction on 63 of the 91 CASP14 targets and 87 of 183 Continuous Automated Model Evaluation (CAMEO) targets, and is particularly useful for proteins with shallow MSAs. When both templates and MSA embedding are used, our method predicts correct folds (TM-score > 0.5) for 16 of 23 CASP14 free-modeling (FM) targets and 14 of 18 CAMEO targets, outperforming RoseTTAFold by 5% and 7%, respectively.
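The correct-fold criterion used above (TM-score > 0.5) can be illustrated with a minimal sketch of the TM-score formula of Zhang and Skolnick (2004). The sketch assumes the model has already been superposed on the native structure and that the per-residue distances of aligned residue pairs are given; the real TM-score program additionally searches over superpositions to maximize the score. The function names below are hypothetical and are not taken from RaptorXFold.

```python
def tm_d0(length: int) -> float:
    """Length-dependent distance scale d0 (angstroms) used by TM-score."""
    # For very short chains the formula is undefined or too small; clamp to 0.5.
    if length <= 15:
        return 0.5
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)


def tm_score(distances, target_length: int) -> float:
    """TM-score for ONE fixed superposition (hypothetical helper).

    distances: distances in angstroms between aligned residue pairs after
    superposition. Unaligned residues contribute nothing, but the sum is
    still normalized by the full target length.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def correct_fold(score: float, threshold: float = 0.5) -> bool:
    """Correct-fold criterion as stated in the abstract: TM-score > 0.5."""
    return score > threshold


# Usage: a 100-residue model with every aligned residue exactly d0 away
# from its native position scores 0.5, which does NOT count as correct.
score = tm_score([tm_d0(100)] * 100, 100)
print(round(score, 3), correct_fold(score))
```

Because each residue term is bounded by 1 and the sum is divided by the target length, the score lies in (0, 1], and the length-dependent d0 makes scores comparable across proteins of different sizes.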

AVAILABILITY AND IMPLEMENTATION

Available at https://github.com/xluo233/RaptorXFold.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
