Hayat Maqsood, Khan Asifullah
Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences, Nilore, Islamabad, Pakistan.
J Theor Biol. 2011 Feb 21;271(1):10-7. doi: 10.1016/j.jtbi.2010.11.017. Epub 2010 Nov 24.
Membrane proteins are a vital class of proteins that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of the known membrane protein types provides valuable information for predicting the types of novel membrane proteins. However, classification of membrane protein types can be both time consuming and susceptible to errors due to the inherent similarity of membrane protein types. In this paper, a neural-network-based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence. It comprises seven feature sets: amino acid composition, sequence length, 2-gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained using SVM for the jackknife test. In the independent dataset test, PNN yields the highest accuracy of 95.73%. The classifiers also show improved performance on other measures, such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This performance improvement may largely be credited to the learning capabilities of neural networks and the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor.
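The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only two of the seven CPSR feature sets (amino acid composition and sequence length), omits the principal component analysis step, and uses a simple Gaussian-kernel classifier as a stand-in for a PNN. The function names, the `length_scale` and `sigma` parameters, and the toy sequences in the usage note are all assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_features(seq, length_scale=1000.0):
    """Return 20 amino acid fractions plus a scaled sequence length.

    length_scale is a hypothetical constant that keeps the length
    feature on a scale comparable to the fractions.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    fractions = [seq.count(aa) / n for aa in AMINO_ACIDS]
    return fractions + [n / length_scale]


def pnn_predict(train_x, train_y, x, sigma=0.1):
    """PNN-style rule: pick the class with the largest mean Gaussian kernel."""
    scores, counts = {}, {}
    for xi, yi in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        scores[yi] = scores.get(yi, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    return max(scores, key=lambda c: scores[c] / counts[c])


def jackknife_accuracy(xs, ys, sigma=0.1):
    """Leave each sample out in turn, train on the rest, and score it."""
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        correct += pnn_predict(rest_x, rest_y, xs[i], sigma) == ys[i]
    return correct / len(xs)


def _ratio(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, F-measure, and MCC from binary counts."""
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    f_measure = _ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = _ratio(tp * tn - fp * fn,
                 math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"sensitivity": sensitivity, "specificity": specificity,
            "f_measure": f_measure, "mcc": mcc}
```

As a usage example, mapping a list of toy sequences such as `["AAAAAAAA", "AAAAAAAC", "WWWWWWWW", "WWWWWWWY"]` through `composition_features` and passing the result, together with made-up class labels, to `jackknife_accuracy` gives the fraction of held-out sequences that are classified correctly. For a multi-class problem such as membrane protein typing, the counts passed to `binary_metrics` would typically be computed one class against the rest.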