Jo Taeho, Hou Jie, Eickholt Jesse, Cheng Jianlin
Department of Computer Science, University of Missouri, Columbia, MO 65211, USA.
Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA.
Sci Rep. 2015 Dec 4;5:17573. doi: 10.1038/srep17573.
For accurate recognition of protein folds, a deep learning network method (DN-Fold) was developed to predict whether a given query-template protein pair belongs to the same structural fold. The input features were derived from the protein sequences and from structural features extracted from the protein pair. We evaluated the performance of DN-Fold along with 18 different methods on Lindahl's benchmark dataset and on a large benchmark set extracted from SCOP 1.75 consisting of about one million protein pairs, at three different levels of fold recognition (i.e., protein family, superfamily, and fold) depending on the evolutionary distance between protein sequences. The correct recognition rate of ensemble DN-Fold for Top 1 predictions is 84.5%, 61.5%, and 33.6%, and for Top 5 predictions is 91.2%, 76.5%, and 60.7%, at the family, superfamily, and fold levels, respectively. We also evaluated the performance of single DN-Fold (DN-FoldS), which showed results comparable to ensemble DN-Fold at the family and superfamily levels. Finally, we extended the binary classification problem of fold recognition to a real-valued regression task, which also showed promising performance. DN-Fold is freely available through a web server at http://iris.rnet.missouri.edu/dnfold.
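The Top 1 and Top 5 recognition rates quoted above can be illustrated with a small sketch. This is not the authors' evaluation code. It assumes a common reading of a Top-k rate: for each query, rank the candidate templates by predicted score, and count the query as recognized if at least one of the k highest-scoring templates truly shares its family, superfamily, or fold. The function name `top_k_recognition_rate`, the tuple layout, and the toy scores are all hypothetical, and the sketch assumes every query has at least one true match among its candidates.

```python
from collections import defaultdict


def top_k_recognition_rate(pairs, k):
    """Fraction of queries with a true match among their k best-scored templates.

    pairs: iterable of (query_id, template_id, score, is_match) tuples, where
    score is a predicted same-fold score (higher means more likely) and
    is_match is the ground-truth label. This layout is an assumption made
    for illustration only.
    """
    by_query = defaultdict(list)
    for query_id, _template_id, score, is_match in pairs:
        by_query[query_id].append((score, is_match))

    if not by_query:
        raise ValueError("no query-template pairs given")

    hits = 0
    for candidates in by_query.values():
        # Rank this query's templates from highest to lowest score.
        ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
        if any(is_match for _score, is_match in ranked[:k]):
            hits += 1
    return hits / len(by_query)


# Toy data (made up): two queries, two candidate templates each.
toy_pairs = [
    ("q1", "t1", 0.9, True),
    ("q1", "t2", 0.4, False),
    ("q2", "t1", 0.8, False),
    ("q2", "t3", 0.6, True),
]

top1 = top_k_recognition_rate(toy_pairs, k=1)
top2 = top_k_recognition_rate(toy_pairs, k=2)
```

In the toy data, only `q1` has its true match ranked first, while both queries have a true match within their top two templates, so the rate can only stay the same or rise as k grows. The same pattern appears in the Top 1 versus Top 5 figures reported above.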