Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA.
Department of Computer Science, Saint Louis University, St. Louis, Missouri, USA.
Proteins. 2022 Jan;90(1):58-72. doi: 10.1002/prot.26186. Epub 2021 Jul 27.
Substantial progress in protein structure prediction has been made since CASP13 by utilizing deep learning and residue-residue distance prediction. Inspired by these advances, we improved our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning-based protein inter-residue distance predictor to improve template-free (ab initio) tertiary structure prediction, (b) an enhanced template-based tertiary structure prediction method, and (c) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and third out of 136 predictors in inter-domain structure prediction. The results demonstrate that template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and for a majority of hard targets (template-free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, template-free modeling performs better than template-based modeling not only on hard targets but also on targets that have homologous templates. The performance of template-free modeling depends largely on the accuracy of distance prediction, which is closely related to the quality of the multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.