MoleculeMind Ltd., Beijing 100084, China.
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.
Proc Natl Acad Sci U S A. 2024 Mar 26;121(13):e2308788121. doi: 10.1073/pnas.2308788121. Epub 2024 Mar 20.
Protein structure prediction has been greatly improved by deep learning in the past few years. However, the most successful methods rely on a multiple sequence alignment (MSA) of the sequence homologs of the protein under prediction. In nature, a protein folds in the absence of its sequence homologs, and thus an MSA-free structure prediction method is desirable. Here, we develop RaptorX-Single, a single-sequence-based protein structure prediction method that integrates several protein language models with a structure generation module, and then study its advantage over MSA-based methods. Our experimental results indicate that, in addition to running much faster than MSA-based methods such as AlphaFold2, RaptorX-Single outperforms AlphaFold2 and other MSA-free methods in predicting the structure of antibodies (after fine-tuning on antibody data), the structure of proteins with very few sequence homologs, and single mutation effects. A comparison of different protein language models shows that not only the scale but also the training data of a protein language model affects performance. RaptorX-Single also compares favorably to MSA-based AlphaFold2 when the protein under prediction has a large number of sequence homologs.