Genome Diversity Center, Institute of Evolution, University of Haifa, Haifa, Israel.
J Biomol Struct Dyn. 2014;32(10):1575-82. doi: 10.1080/07391102.2013.827133. Epub 2013 Aug 22.
This work presents a dynamic artificial neural network methodology that classifies proteins from their sequences alone into lysosomal membrane protein classes and various other membrane protein classes. A neural network-based system for predicting lysosomal-associated membrane protein types is proposed. Several protein sequence representations are fused to extract the features of a protein sequence, giving seven feature sets: amino acid (AA) composition, sequence length, hydrophobic group, electronic group, sum of hydrophobicity, R-group, and dipeptide composition. To reduce the dimensionality of the large feature vector, we applied principal component analysis. The probabilistic neural network, generalized regression neural network, and Elman regression neural network (RNN) are used as classifiers and compared with the layer recurrent network (LRN), a dynamic network. Dynamic networks have memory, i.e. their output depends not only on the current input but also on previous outputs. The LRN classifier achieves the highest accuracy of all the artificial neural networks tested. The overall accuracy under jackknife cross-validation is 93.2% for the data-set. These results suggest that the method can be effectively applied to discriminate lysosomal-associated membrane proteins from other membrane proteins (Type-I, outer membrane proteins, GPI-anchored) and globular proteins, and they also indicate that the fused protein sequence representation reflects the core features of membrane proteins better than the classical AA composition.
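Three of the steps named in the abstract (AA composition, dipeptide composition, and jackknife cross-validation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the physicochemical feature sets and the PCA step are omitted, and a 1-nearest-neighbour rule stands in for the LRN classifier purely so the leave-one-out loop has something to evaluate. Sequences are assumed to contain at least two residues.

```python
from itertools import product
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """Fraction of each of the 20 standard residues (20 values)."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each overlapping residue pair (400 values)."""
    total = len(seq) - 1
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [counts[d] / total for d in DIPEPTIDES]


def features(seq):
    """Fuse AA composition, dipeptide composition, and raw sequence length."""
    seq = seq.upper()
    return aa_composition(seq) + dipeptide_composition(seq) + [float(len(seq))]


def jackknife_accuracy(vectors, labels):
    """Leave-one-out test: each sample is predicted from all the others.

    Assumption: 1-nearest-neighbour is a stand-in classifier here,
    not the layer recurrent network described in the abstract.
    """
    correct = 0
    for i, v in enumerate(vectors):
        others = [(dist(v, w), labels[j])
                  for j, w in enumerate(vectors) if j != i]
        predicted = min(others)[1]  # label of the closest remaining sample
        correct += predicted == labels[i]
    return correct / len(vectors)
```

In practice the raw length would need scaling before any distance-based comparison, since it is on a very different scale from the composition fractions. The fused vector would then be reduced, for example by principal component analysis, before classification.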