Liu Jian, Neupane Pawan, Cheng Jianlin
bioRxiv. 2025 Apr 12:2025.03.06.641913. doi: 10.1101/2025.03.06.641913.
With AlphaFold achieving high-accuracy tertiary structure prediction for most single-chain proteins (monomers), the next major challenge in protein structure prediction is accurately modeling multi-chain protein complexes (multimers). We developed MULTICOM4, the latest version of the MULTICOM system, to improve protein complex structure prediction by integrating the transformer-based AlphaFold2, the diffusion model-based AlphaFold3, and our in-house techniques. These techniques include protein complex stoichiometry prediction, generation of diverse multiple sequence alignments (MSAs) using both sequence and structure comparison, handling of modeling exceptions, and deep learning-based model quality assessment. MULTICOM4 was blindly evaluated in the 16th community-wide Critical Assessment of Techniques for Protein Structure Prediction (CASP16) in 2024. In Phase 0 of CASP16, where stoichiometry information was unavailable, MULTICOM predictors performed best, with MULTICOM_human achieving an average TM-score of 0.752 and an average DockQ score of 0.584 for top-ranked predictions. In Phase 1 of CASP16, where stoichiometry information was provided, MULTICOM_human remained among the top predictors, attaining an average TM-score of 0.797 and an average DockQ score of 0.558. The CASP16 results demonstrate that integrating the complementary AlphaFold2 and AlphaFold3 with enhanced MSA inputs, comprehensive model ranking, exception handling, and accurate stoichiometry prediction can effectively improve protein complex structure prediction.