Single-sequence protein structure prediction by integrating protein language models.

Affiliations

MoleculeMind Ltd., Beijing 100084, China.

Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.

Publication information

Proc Natl Acad Sci U S A. 2024 Mar 26;121(13):e2308788121. doi: 10.1073/pnas.2308788121. Epub 2024 Mar 20.

DOI:10.1073/pnas.2308788121

PMID:38507445

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10990103/

Abstract

Protein structure prediction has been greatly improved by deep learning in the past few years. However, the most successful methods rely on a multiple sequence alignment (MSA) of the sequence homologs of the protein under prediction. In nature, a protein folds in the absence of its sequence homologs; thus, an MSA-free structure prediction method is desirable. Here, we develop a single-sequence-based protein structure prediction method, RaptorX-Single, by integrating several protein language models and a structure generation module, and then study its advantages over MSA-based methods. Our experimental results indicate that, in addition to running much faster than MSA-based methods such as AlphaFold2, RaptorX-Single outperforms AlphaFold2 and other MSA-free methods in predicting the structures of antibodies (after fine-tuning on antibody data) and of proteins with very few sequence homologs, as well as in predicting single mutation effects. By comparing different protein language models, our results show that not only the scale but also the training data of a protein language model impact performance. RaptorX-Single also compares favorably to MSA-based AlphaFold2 when the protein under prediction has a large number of sequence homologs.
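
The abstract describes the architecture only at a high level: features from several protein language models are integrated and passed to a structure generation module. The following is a minimal sketch of one simple way such integration could work, assuming each model yields a per-residue embedding matrix and that the matrices are concatenated feature-wise and mixed by a linear projection. The function names `concat_embeddings` and `project`, the toy dimensions, and the concatenate-then-project design are hypothetical illustrations, not RaptorX-Single's published implementation; the structure generation module is not shown.

```python
from typing import List, Sequence

# One row per residue, one column per feature.
Matrix = List[List[float]]


def concat_embeddings(per_model: Sequence[Matrix]) -> Matrix:
    """Concatenate per-residue embeddings from several language models.

    Hypothetical fusion step (an assumption, not the paper's method):
    per_model[k] is an L x d_k matrix from model k, and all models must
    cover the same L residues. Returns an L x (d_1 + ... + d_K) matrix.
    """
    if not per_model:
        raise ValueError("need at least one model's embeddings")
    length = len(per_model[0])
    if any(len(m) != length for m in per_model):
        raise ValueError("all models must embed the same number of residues")
    # For residue i, append model 1's features, then model 2's, and so on.
    return [[x for m in per_model for x in m[i]] for i in range(length)]


def project(features: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Apply a linear layer to each residue: out[i] = features[i] @ weight + bias.

    weight is d_in x d_out; in a real network it would be learned
    (assumed here; the toy values below are arbitrary).
    """
    d_in, d_out = len(weight), len(bias)
    out = []
    for row in features:
        if len(row) != d_in:
            raise ValueError("feature width does not match weight rows")
        out.append([
            sum(row[j] * weight[j][c] for j in range(d_in)) + bias[c]
            for c in range(d_out)
        ])
    return out


if __name__ == "__main__":
    # Two toy "models" embedding a 3-residue sequence with widths 2 and 1.
    model_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    model_b = [[2.0], [3.0], [4.0]]
    fused = concat_embeddings([model_a, model_b])  # 3 x 3
    weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 x 2
    print(project(fused, weight, [0.0, 0.0]))
    # prints [[3.0, 2.0], [3.0, 4.0], [5.0, 5.0]]
```

Concatenation keeps each model's features intact and leaves it to the projection to weigh their contributions, which is one way a comparison of different language models (by scale or training data) could be run on a shared downstream module.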

Similar articles

Improved structure-related prediction for insufficient homologous proteins using MSA enhancement and pre-trained language model.

Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad217.

Improving protein structure prediction using templates and sequence embedding.

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac723.

Integrating deep learning, threading alignments, and a multi-MSA strategy for high-quality protein monomer and complex structure prediction in CASP15.

Proteins. 2023 Dec;91(12):1684-1703. doi: 10.1002/prot.26585. Epub 2023 Aug 31.

Analysis of distance-based protein structure prediction by deep learning in CASP13.

Proteins. 2019 Dec;87(12):1069-1081. doi: 10.1002/prot.25810. Epub 2019 Sep 13.

Improved the heterodimer protein complex prediction with protein language models.

Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad221.

Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data.

Nat Methods. 2024 Feb;21(2):279-289. doi: 10.1038/s41592-023-02130-4. Epub 2024 Jan 2.

Pairing interacting protein sequences using masked language modeling.

Proc Natl Acad Sci U S A. 2024 Jul 2;121(27):e2311887121. doi: 10.1073/pnas.2311887121. Epub 2024 Jun 24.

Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.

PLoS Comput Biol. 2017 Jan 5;13(1):e1005324. doi: 10.1371/journal.pcbi.1005324. eCollection 2017 Jan.

Single-sequence protein structure prediction using a language model and deep learning.

Nat Biotechnol. 2022 Nov;40(11):1617-1623. doi: 10.1038/s41587-022-01432-w. Epub 2022 Oct 3.

Cited by

SAGERank: inductive learning of protein-protein interaction from antibody-antigen recognition.

Chem Sci. 2025 Aug 12. doi: 10.1039/d5sc03707g.

Chemosensory Receptors in Vertebrates: Structure and Computational Modeling Insights.

Int J Mol Sci. 2025 Jul 10;26(14):6605. doi: 10.3390/ijms26146605.

designed bright, hyperstable rhodamine binders for fluorescence microscopy.

bioRxiv. 2025 Jun 25:2025.06.24.661379. doi: 10.1101/2025.06.24.661379.

Locality-aware pooling enhances protein language model performance across varied applications.

Bioinformatics. 2025 Jul 1;41(Supplement_1):i217-i226. doi: 10.1093/bioinformatics/btaf178.

De novo design of porphyrin-containing proteins as efficient and stereoselective catalysts.

Science. 2025 May 8;388(6747):665-670. doi: 10.1126/science.adt7268.

Accurate prediction of nucleic acid binding proteins using protein language model.

Bioinform Adv. 2025 Jan 20;5(1):vbaf008. doi: 10.1093/bioadv/vbaf008. eCollection 2025.

Emergence of specific binding and catalysis from a designed generalist binding protein.

bioRxiv. 2025 Mar 19:2025.01.30.635804. doi: 10.1101/2025.01.30.635804.

GDFold2: A fast and parallelizable protein folding environment with freely defined objective functions.

Protein Sci. 2025 Feb;34(2):e70041. doi: 10.1002/pro.70041.

Predicting purification process fit of monoclonal antibodies using machine learning.

MAbs. 2025 Dec;17(1):2439988. doi: 10.1080/19420862.2024.2439988. Epub 2025 Jan 9.

The success rate of processed predicted models in molecular replacement: implications for experimental phasing in the AlphaFold era.

Acta Crystallogr D Struct Biol. 2024 Nov 1;80(Pt 11):766-779. doi: 10.1107/S2059798324009380. Epub 2024 Oct 3.
