Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA.
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China.
Proteins. 2021 Dec;89(12):1734-1751. doi: 10.1002/prot.26193. Epub 2021 Aug 7.
In this article, we report the 3D structure prediction results of two of our best server groups ("Zhang-Server" and "QUARK") in CASP14. These two servers were built on the D-I-TASSER and D-QUARK algorithms, which integrate four newly developed components into the classical protein folding pipelines I-TASSER and QUARK, respectively. The new components are: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which extends the DeepMSA program; (b) a contact-based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network-based method, DeepPotential, to predict multiple spatial restraints from co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations. For the 37 free modeling (FM) targets, the average TM-scores of the first models produced by D-I-TASSER and D-QUARK were 96% and 112% higher than those constructed by I-TASSER and QUARK, respectively. The data analysis indicates noticeable improvements from each of the four new components, especially the newly added spatial restraints from DeepPotential and the well-tuned force field that combines spatial restraints, threading templates, and generic knowledge-based potentials. However, challenges remain in the current pipelines. These include difficulties in modeling multi-domain proteins, owing to low accuracy in inter-domain distance prediction, and in modeling protein domains taken from oligomer complexes, because the co-evolutionary analysis cannot distinguish inter-chain from intra-chain distances. Specifically tuning the deep learning-based predictors for multi-domain targets and protein complexes may help to address these issues.
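The TM-score comparison quoted above can be illustrated with a minimal Python sketch. It assumes the standard TM-score definition, in which each aligned residue pair contributes 1 / (1 + (d_i / d0)^2) and the sum is normalized by the target length, with d0 = 1.24 (L - 15)^(1/3) - 1.8 Å. The full TM-score is the maximum of this quantity over all superpositions; the sketch skips that search and scores one superposition whose per-residue distances are already given. The function names, the 0.5 Å floor on d0 for short chains, and the example numbers are illustrative assumptions, not values or code from the article.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Angstroms) used by the TM-score."""
    # Assumption: floor d0 at 0.5 A for short chains; this also avoids
    # taking the cube root of a negative number when the length is <= 15.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed_superposition(distances, target_length):
    """TM-score term for ONE given superposition.

    distances: distances in Angstroms between aligned residue pairs of the
    model and the native structure, after the two have been superposed.
    The real TM-score maximizes this value over superpositions; that
    optimization is not performed here.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def percent_higher(new_score, old_score):
    """Relative improvement of new_score over old_score, in percent."""
    return (new_score - old_score) / old_score * 100.0


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not data from the article.
    distances = [1.0, 2.0, 4.0, 8.0]
    print(tm_score_fixed_superposition(distances, target_length=100))
    print(percent_higher(0.6, 0.3))
```

Read this way, a statement such as "96% higher" means the new pipeline's average first-model TM-score is nearly double the baseline's, as computed by `percent_higher` for two averages.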