Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan.
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China.
Proteins. 2019 Dec;87(12):1149-1164. doi: 10.1002/prot.25792. Epub 2019 Aug 14.
We report the results of two fully automated structure prediction pipelines, "Zhang-Server" and "QUARK", in CASP13. The pipelines were built upon the C-I-TASSER and C-QUARK programs, which in turn are based on I-TASSER and QUARK but with three new modules: (a) a novel multiple sequence alignment (MSA) generation protocol to construct deep sequence-profiles for contact prediction; (b) an improved meta-method, NeBcon, which combines multiple contact predictors, including ResPRE that predicts contact-maps by coupling precision-matrices with deep residual convolutional neural-networks; and (c) an optimized contact potential to guide structure assembly simulations. For 50 CASP13 FM domains that lacked homologous templates, average TM-scores of the first models produced by C-I-TASSER and C-QUARK were 28% and 56% higher than those constructed by I-TASSER and QUARK, respectively. For the first time, contact-map predictions demonstrated usefulness on TBM domains with close homologous templates, where TM-scores of C-I-TASSER models were significantly higher than those of I-TASSER models with a P-value <.05. Detailed data analyses showed that the success of C-I-TASSER and C-QUARK was mainly due to the increased accuracy of deep-learning-based contact-maps, as well as the careful balance between sequence-based contact restraints, threading templates, and generic knowledge-based potentials. Nevertheless, challenges still remain for predicting quaternary structure of multi-domain proteins, due to the difficulties in domain partitioning and domain reassembly. In addition, contact prediction in terminal regions was often unsatisfactory due to the sparsity of MSAs. Development of new contact-based domain partitioning and assembly methods and training contact models on sparse MSAs may help address these issues.
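The headline results above are reported as TM-scores. As an illustration only (not code from the Zhang-Server or QUARK pipelines), the standard TM-score definition can be sketched as follows for a superposition that has already been fixed. The function names `tm_d0` and `tm_score_fixed_superposition` are hypothetical, and the 0.5 Å floor on `d0` for very short chains is an assumption of this sketch. The full TM-score additionally maximizes this quantity over all superpositions, which is omitted here.

```python
import math
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in Angstroms) used by the TM-score.

    Standard form: d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    Assumption of this sketch: clamp d0 to at least 0.5 so that very short
    chains do not yield a zero, negative, or undefined scale.
    """
    if target_length <= 15:
        return 0.5
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)


def tm_score_fixed_superposition(distances: Sequence[float],
                                 target_length: int) -> float:
    """TM-score term for ONE given superposition.

    `distances` holds the distance (Angstroms) between each aligned residue
    pair of the model and the target after superposition. Residues of the
    target that are not aligned contribute nothing, because the sum is
    normalized by the full target length rather than the aligned length.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than target residues")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

Under this definition a perfect model scores 1.0, and a model in which every aligned residue sits exactly `d0` away from its target position scores 0.5, which is why the score is less sensitive to target length than a raw RMSD.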