Wang Sheng, Peng Jian, Ma Jianzhu, Xu Jinbo
Toyota Technological Institute at Chicago, Chicago, IL.
Department of Human Genetics, University of Chicago, Chicago, IL.
Sci Rep. 2016 Jan 11;6:18962. doi: 10.1038/srep18962.
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only sequence (profile) information is used as input features, the best current predictors achieve ~80% Q3 accuracy, a figure that has not improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a deep learning extension of Conditional Neural Fields (CNF), which integrate Conditional Random Fields (CRF) with shallow neural networks. DeepCNF models not only the complex sequence-structure relationship, through a deep hierarchical architecture, but also the interdependency between adjacent SS labels, which makes it considerably more powerful than CNF. Experimental results show that DeepCNF achieves ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy on the CASP and CAMEO test proteins, substantially outperforming currently popular predictors. As a general framework, DeepCNF can also be used to predict other protein structure properties such as contact number, disordered regions, and solvent accessibility.
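The Q3 and Q8 figures quoted above are per-residue accuracies over three-state and eight-state SS labels. The following is a minimal sketch of how such scores might be computed; it is not the authors' evaluation code. The eight-to-three reduction shown (H, G, I to helix; E, B to strand; everything else to coil) is one common convention and is an assumption here, as are the function names and the toy label strings.

```python
# Assumed 8-state to 3-state reduction; other conventions exist.
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helix-like states -> H
    "E": "E", "B": "E",             # strand-like states -> E
    "T": "C", "S": "C", "L": "C",   # turn, bend, loop -> C (coil)
}


def reduce_to_3(ss8: str) -> str:
    """Map an 8-state label string to 3 states; unknown symbols become coil."""
    return "".join(DSSP8_TO_3.get(state, "C") for state in ss8)


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted label equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("label strings must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy labels for illustration only (hypothetical, not from the paper).
observed_ss8 = "LHHHHGGTEEEBSL"
predicted_ss8 = "LHHHHHHTEEEELL"

q8 = per_residue_accuracy(predicted_ss8, observed_ss8)
q3 = per_residue_accuracy(reduce_to_3(predicted_ss8), reduce_to_3(observed_ss8))
print(f"Q8 = {q8:.3f}, Q3 = {q3:.3f}")
```

In this toy case every eight-state error falls within the same three-state class, so Q3 is higher than Q8, which illustrates why reported Q8 accuracy is typically lower than Q3 for the same predictor.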