Yan Renxiang, Wang Xiaofeng, Huang Lanqing, Yan Feidi, Xue Xiaoyu, Cai Weiwen
Institute of Applied Genomics, School of Biological Sciences and Engineering, Fuzhou University, Fuzhou 350108, China.
College of Mathematics and Computer Sciences, Shanxi Normal University, Linfen, 041004, China.
Sci Rep. 2015 Jun 24;5:11586. doi: 10.1038/srep11586.
Protein three-dimensional (3D) structures provide valuable information in many fields of biology. One-dimensional properties derived from 3D structures, such as secondary structure, residue solvent accessibility, residue depth and backbone torsion angles, are helpful for protein function prediction, fold recognition and ab initio folding. Here, we predict these structural features using neural network learning. On an independent test dataset, protein secondary structure prediction achieves an overall Q3 accuracy of ~80%. Relative solvent accessibility is predicted with the highest mean absolute error, 0.164, and residue depth with the lowest, 0.062. We further improve outer membrane protein identification by including the predicted structural features in the scoring function of a simple profile-to-profile alignment. The results demonstrate that the accuracy of outer membrane protein identification improves by ~3% at a 1% false positive level when structural features are incorporated. Finally, our methods are available as two easy-to-use programs: PSSM-2-Features, which predicts secondary structure, relative solvent accessibility, residue depth and backbone torsion angles, and PPA-OMP, which identifies outer membrane proteins from proteomes.
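The two evaluation measures quoted in the abstract, Q3 accuracy and mean absolute error, can be illustrated with a minimal sketch. The function names and the toy inputs below are hypothetical and chosen only for illustration; they are not taken from PSSM-2-Features, and the three-state alphabet (H, E, C) is the conventional one rather than something the abstract specifies.

```python
from typing import Sequence


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label (H, E or C) matches."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


def mean_absolute_error(predicted: Sequence[float],
                        observed: Sequence[float]) -> float:
    """Average absolute difference between predicted and observed values."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - o) for p, o in zip(predicted, observed))
    return total / len(observed)


# Toy data, invented for illustration (not results from the paper).
pred_ss = "HHHHCCEEEC"
true_ss = "HHHCCCEEEE"
print(q3_accuracy(pred_ss, true_ss))  # 8 of 10 residues match

pred_rsa = [0.10, 0.50, 0.90, 0.30]
true_rsa = [0.20, 0.40, 0.70, 0.30]
print(mean_absolute_error(pred_rsa, true_rsa))
```

Under this definition, a Q3 of ~80% means roughly four of every five residues receive the correct state, and a mean absolute error of 0.164 for relative solvent accessibility means predictions differ from the observed value by 0.164 on average.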