Liu Jian, Guo Zhiye, Wu Tianqi, Roy Raj S, Quadir Farhan, Chen Chen, Cheng Jianlin
Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA.
bioRxiv. 2023 May 18:2023.05.16.541055. doi: 10.1101/2023.05.16.541055.
AlphaFold-Multimer has been the state-of-the-art tool for predicting the quaternary structure of protein complexes (assemblies or multimers) since its release in 2021. To further improve AlphaFold-Multimer-based complex structure prediction, we developed a new quaternary structure prediction system (MULTICOM) that improves the input fed to AlphaFold-Multimer and evaluates and refines the outputs it generates. Specifically, MULTICOM samples diverse multiple sequence alignments (MSAs) and templates, using both traditional alignments and new Foldseek-based alignments, for AlphaFold-Multimer to generate structural models; ranks the models with multiple complementary metrics; and refines them with a Foldseek structure alignment-based refinement method. Different implementations of the MULTICOM system were blindly tested in assembly structure prediction in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022, as both server and human predictors. Our server (MULTICOM_qa) ranked 3rd among 26 CASP15 server predictors, and our human predictor (MULTICOM_human) ranked 7th among 87 CASP15 server and human predictors. The average TM-score of the first models predicted by MULTICOM_qa for CASP15 assembly targets is ~0.76, 5.3% higher than the ~0.72 of standard AlphaFold-Multimer. The average TM-score of the best of the top 5 models predicted by MULTICOM_qa is ~0.80, about 8% higher than the ~0.74 of standard AlphaFold-Multimer. Moreover, the novel Foldseek Structure Alignment-based Model Generation (FSAMG) method, which is based on AlphaFold-Multimer, outperforms the widely used sequence alignment-based model generation. The source code of MULTICOM is available at https://github.com/BioinfoMachineLearning/MULTICOM3.
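The abstract says structural models are ranked through multiple complementary metrics but does not name the metrics or say how they are combined. As a rough illustration only, the sketch below assumes one simple combination rule, average rank across metrics, and uses two hypothetical metric names (`confidence` and `consensus`) with made-up scores. It is not MULTICOM's actual ranking procedure.

```python
from statistics import mean


def rank_models(scores: dict[str, dict[str, float]]) -> list[str]:
    """Order model names by their average rank across all metrics.

    `scores` maps model name -> {metric name -> value}. Higher values are
    assumed to be better for every metric (an assumption of this sketch).
    """
    metrics = sorted({m for per_model in scores.values() for m in per_model})
    avg_rank = {}
    for name in scores:
        ranks = []
        for metric in metrics:
            # Rank 1 goes to the model with the highest value for this metric.
            ordered = sorted(scores, key=lambda n: scores[n][metric], reverse=True)
            ranks.append(ordered.index(name) + 1)
        avg_rank[name] = mean(ranks)
    # Lower average rank is better; ties are broken alphabetically by name.
    return sorted(scores, key=lambda n: (avg_rank[n], n))


# Hypothetical scores for three candidate models of one target.
example = {
    "model_a": {"confidence": 0.81, "consensus": 0.70},
    "model_b": {"confidence": 0.78, "consensus": 0.74},
    "model_c": {"confidence": 0.80, "consensus": 0.60},
}
ranking = rank_models(example)
print(ranking)
```

Averaging ranks rather than raw values keeps metrics on different scales from dominating one another, which is one plausible reason to combine complementary metrics this way; other schemes, such as weighted sums of normalized scores, would fit the same description.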