Combining pairwise structural similarity and deep learning interface contact prediction to estimate protein complex model accuracy in CASP15.

Author information

Roy Raj S, Liu Jian, Giri Nabin, Guo Zhiye, Cheng Jianlin

Affiliation

Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA.

Publication information

bioRxiv. 2023 Mar 12:2023.03.08.531814. doi: 10.1101/2023.03.08.531814.

DOI:10.1101/2023.03.08.531814

PMID:36945536

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10028888/

Abstract

Estimating the accuracy of quaternary structural models of protein complexes and assemblies (EMA) is important for predicting quaternary structures and applying them to studying protein function and interaction. The pairwise similarity between structural models has proven useful for estimating the quality of protein tertiary structural models, but it has rarely been applied to predicting the quality of quaternary structural models. Moreover, the pairwise similarity approach often fails when many structural models are of low quality and similar to each other. To address this gap, we developed a hybrid method (MULTICOM_qa) that combines a pairwise similarity score (PSS) with an interface contact probability score (ICPS) based on deep learning inter-chain contact prediction to estimate protein complex model accuracy. It participated blindly in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 and ranked first out of 24 predictors in estimating the global accuracy of assembly models. The average per-target correlation coefficient between the model quality scores predicted by MULTICOM_qa and the true quality scores of the models of CASP15 assembly targets is 0.66. The average per-target ranking loss when using the predicted quality scores to rank the models is 0.14. It was able to select good models for most targets. Moreover, several key factors for EMA (i.e., target difficulty, model sampling difficulty, skewness of model quality, and similarity between good and bad models) are identified and analyzed. The results demonstrate that combining the multi-model method (PSS) with the complementary single-model method (ICPS) is a promising approach to EMA. The source code of MULTICOM_qa is available at https://github.com/BioinfoMachineLearning/MULTICOM_qa .
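The hybrid scoring idea and the two evaluation metrics named in the abstract can be sketched as follows. This is a minimal, hypothetical Python illustration, not the MULTICOM_qa implementation: it assumes a precomputed pairwise similarity matrix between the models of one target (for example TM-scores from a complex structure aligner), a dictionary of predicted inter-chain contact probabilities, and a simple linear blend of PSS and ICPS whose `weight` parameter is an assumption. All function names are illustrative. The ranking loss is taken here, as in CASP-style EMA evaluation, as the true quality of the best model minus the true quality of the model ranked first by the predicted scores.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]  # (residue index in chain A, residue index in chain B)


def pairwise_similarity_score(sim: Sequence[Sequence[float]]) -> List[float]:
    """PSS sketch: average similarity of each model to all other models.

    Assumes sim[i][j] is a symmetric similarity in [0, 1] (e.g. a TM-score
    from a complex structure aligner); the diagonal is ignored.
    """
    n = len(sim)
    if n < 2:
        raise ValueError("need at least two models")
    return [sum(sim[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]


def interface_contact_probability_score(
    interface_pairs: Sequence[Pair], contact_prob: Dict[Pair, float]
) -> float:
    """ICPS sketch: mean predicted contact probability over the inter-chain
    residue pairs that are in contact in the model (the distance cutoff is
    assumed to be applied upstream). Pairs missing from the prediction count as 0."""
    if not interface_pairs:
        return 0.0  # no interface in the model: assign the worst score
    return sum(contact_prob.get(p, 0.0) for p in interface_pairs) / len(interface_pairs)


def hybrid_score(pss: float, icps: float, weight: float = 0.5) -> float:
    """Hypothetical linear blend of PSS and ICPS; the weight is an assumption."""
    return weight * pss + (1.0 - weight) * icps


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Per-target Pearson correlation between predicted and true scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant scores")
    return cov / (sx * sy)


def ranking_loss(pred: Sequence[float], true: Sequence[float]) -> float:
    """True quality of the best model minus that of the top-predicted model."""
    top = max(range(len(pred)), key=lambda i: pred[i])
    return max(true) - true[top]


if __name__ == "__main__":
    # Toy target with three models (all numbers are made up for illustration).
    sim = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.4], [0.3, 0.4, 1.0]]
    pss = pairwise_similarity_score(sim)
    icps = [0.7, 0.5, 0.2]  # hypothetical per-model ICPS values
    pred = [hybrid_score(p, c) for p, c in zip(pss, icps)]
    true = [0.75, 0.70, 0.30]  # hypothetical true quality scores
    print("correlation:", round(pearson(pred, true), 3))
    print("ranking loss:", round(ranking_loss(pred, true), 3))
```

The blend reflects the point the abstract makes: PSS alone can reward a large cluster of similar but poor models, whereas ICPS scores each model independently from the predicted inter-chain contacts, so mixing the two can offset the consensus bias.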
