Template-based prediction of protein structure with deep learning.

Authors

Zhang Haicang, Shen Yufeng

Affiliations

Department of Systems Biology, Columbia University, New York, NY, USA.

Department of Biomedical Informatics, Columbia University, New York, NY, USA.

Publication information

BMC Genomics. 2020 Dec 29;21(Suppl 11):878. doi: 10.1186/s12864-020-07249-8.

Abstract

BACKGROUND

Accurate prediction of protein structure is fundamentally important for understanding the biological function of proteins. Template-based modeling, which includes protein threading and homology modeling, is a popular approach to protein tertiary structure prediction. However, accurate template-query alignment and template selection remain very challenging, especially for proteins with only distant homologs available.

RESULTS

We propose a new template-based modeling method, ThreaderAI, to improve protein tertiary structure prediction. ThreaderAI formulates the task of aligning a query sequence with a template as the classical pixel classification problem in computer vision, which makes a deep residual neural network a natural choice of predictor. ThreaderAI first uses deep learning to predict a residue-residue alignment probability matrix by integrating sequence profiles, predicted sequential structural features, and predicted residue-residue contacts, and then builds the template-query alignment by applying a dynamic programming algorithm to the probability matrix. We evaluated the method on both template-query alignment accuracy and protein threading. Experimental results show that ThreaderAI outperforms the currently popular template-based modeling methods HHpred and CNFpred, as well as the latest contact-assisted method, CEthreader, especially on proteins that do not have close homologs with known structures. In particular, in terms of alignment accuracy measured by TM-score, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 56%, 13%, and 11%, respectively, on template-query pairs with fold-level similarity from the SCOPe data. On CASP13's TBM-hard data, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 16%, 9%, and 8% in terms of TM-score, respectively.
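The second stage described above, turning a residue-residue probability matrix into an alignment with dynamic programming, can be illustrated with a minimal sketch. This is not ThreaderAI's implementation: the abstract does not give the scoring scheme, so the function name `align_from_probabilities`, the `offset` subtracted from each probability, and the linear `gap_penalty` are all assumptions made for illustration. The sketch uses a plain Needleman-Wunsch-style global alignment in which a pair (i, j) is worth aligning only if its predicted probability exceeds `offset`.

```python
from typing import List, Tuple


def align_from_probabilities(
    prob: List[List[float]],
    offset: float = 0.5,       # assumed: pairs below this probability are penalized
    gap_penalty: float = 0.0,  # assumed: linear cost for skipping a residue
) -> List[Tuple[int, int]]:
    """Return aligned (template_index, query_index) pairs, in order.

    prob[i][j] is the predicted probability that template residue i
    aligns with query residue j. Hypothetical scoring, not the paper's.
    """
    n = len(prob)
    m = len(prob[0]) if n else 0

    # score[i][j]: best score for the first i template and first j query residues.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]

    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap_penalty
        move[i][0] = "up"
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap_penalty
        move[0][j] = "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (score[i - 1][j - 1] + prob[i - 1][j - 1] - offset, "diag"),
                (score[i - 1][j] - gap_penalty, "up"),    # skip a template residue
                (score[i][j - 1] - gap_penalty, "left"),  # skip a query residue
            ]
            # max with a key returns the first best candidate, so ties are deterministic.
            score[i][j], move[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if move[i][j] == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    toy = [
        [0.9, 0.1, 0.0, 0.0],
        [0.0, 0.1, 0.8, 0.1],
        [0.0, 0.0, 0.1, 0.7],
    ]
    print(align_from_probabilities(toy))  # [(0, 0), (1, 2), (2, 3)]
```

In the toy matrix, query residue 1 has no confident partner, so the alignment skips it and keeps the three high-probability pairs. A real threading pipeline would likely use affine gap penalties and position-specific terms; those details are outside what the abstract states.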

CONCLUSIONS

These results demonstrate that with the help of deep learning, ThreaderAI can significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4941/7771081/8f3aa607868e/12864_2020_7249_Fig1_HTML.jpg
