Kumar Ravindra, Kumari Bandana, Kumar Manish
Department of Biophysics, University of Delhi South Campus, New Delhi, India.
Current affiliation: Newe-Ya'ar Research Center, Agricultural Research Organization, Ramat Yishay, Israel.
PeerJ. 2017 Sep 4;5:e3561. doi: 10.7717/peerj.3561. eCollection 2017.
The endoplasmic reticulum plays an important role in many cellular processes, including protein synthesis and the folding and post-translational processing of newly synthesized proteins. It is also the site of quality control for misfolded proteins and the entry point of extracellular proteins into the secretory pathway. Hence, at any given time the endoplasmic reticulum contains two cohorts of proteins: (i) proteins involved in endoplasmic reticulum-specific functions, which reside in its lumen and are called endoplasmic reticulum resident proteins, and (ii) proteins that are in the process of moving to the extracellular space. Endoplasmic reticulum resident proteins must therefore somehow be distinguished from newly synthesized secretory proteins, which pass through the endoplasmic reticulum on their way out of the cell. Only about 50% of the proteins used as training data in this study had an endoplasmic reticulum retention signal, which shows that these signals are not present in all endoplasmic reticulum resident proteins. This strongly suggests that additional factors contribute to retaining endoplasmic reticulum-specific proteins inside the endoplasmic reticulum.
The method is based on support vector machines. Different types of protein features were used as inputs to the support vector machine to develop the prediction models, and cross-validation was used during training. The best performance was obtained with a combination of the amino acid compositions of different parts of the proteins.
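To make the idea of combining amino acid compositions from different parts of a protein concrete, here is a minimal sketch of one possible feature encoding. The abstract does not say how the proteins were partitioned, so the three-way split (N-terminal, middle, C-terminal), the segment length of 25 residues, and the function names `composition` and `split_composition` are all illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: the split scheme and segment length are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the fraction of each standard amino acid in seq.

    Non-standard characters are ignored; an empty segment yields 20 zeros.
    """
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def split_composition(seq, terminal_len=25):
    """Concatenate the compositions of three parts of a protein.

    terminal_len is a hypothetical parameter. For sequences shorter than
    2 * terminal_len the middle part is empty and the two terminal
    segments overlap.
    """
    n_term = seq[:terminal_len]
    middle = seq[terminal_len:-terminal_len]
    c_term = seq[-terminal_len:]
    return composition(n_term) + composition(middle) + composition(c_term)


# Toy sequence (not real data): 25 M, then 10 A, then 25 L.
features = split_composition("M" * 25 + "A" * 10 + "L" * 25)
print(len(features))  # 60 values: 20 per segment
```

A 60-dimensional vector of this kind is the sort of fixed-length input a support vector machine can be trained on, whatever the length of the original protein.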
In this study, we report a novel support vector machine based method, named ERPred, for predicting endoplasmic reticulum resident proteins. During training, we achieved a maximum accuracy of 81.42% under cross-validation. When evaluated on an independent dataset, ERPred achieved a sensitivity of 72.31% and a specificity of 83.69%. We also annotated six different proteomes to predict candidate endoplasmic reticulum resident proteins in them. A webserver, ERPred, was developed to make the method available to the scientific community; it can be accessed at http://proteininformatics.org/mkumar/erpred/index.html.
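For readers less familiar with the reported measures, the sketch below shows how sensitivity, specificity, and accuracy are conventionally computed from a confusion matrix. The counts in the usage example are made up for illustration and are not the study's data; the function name `binary_metrics` is likewise hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary classification measures from confusion counts.

    tp/fn: resident proteins predicted correctly / missed.
    tn/fp: non-resident proteins predicted correctly / wrongly flagged.
    """
    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
sens, spec, acc = binary_metrics(tp=8, fn=2, tn=9, fp=1)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

Sensitivity and specificity are reported separately because accuracy alone can be misleading when the positive and negative classes differ in size.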
We found that only 66 of the 124 proteins in the training dataset had endoplasmic reticulum retention signals, which shows that these signals are not an absolute requirement for endoplasmic reticulum resident proteins to remain inside the endoplasmic reticulum. This observation also strongly suggests that additional factors play a role in retaining proteins inside the endoplasmic reticulum. Our proposed predictor, ERPred, is a signal-independent tool: it is tuned to predict endoplasmic reticulum resident proteins even if the query protein does not contain a specific endoplasmic reticulum retention signal.
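The proportion quoted above, and the kind of signal-based check that a signal-independent predictor is meant to go beyond, can be sketched as follows. The abstract does not state which retention motifs were searched for, so the C-terminal tetrapeptides KDEL and HDEL used here are an assumption based on the commonly cited luminal retention signals. The function name and the toy sequences are also hypothetical.

```python
# Assumed motifs: the abstract does not name the retention signals it used.
ASSUMED_MOTIFS = ("KDEL", "HDEL")


def has_c_terminal_signal(seq, motifs=ASSUMED_MOTIFS):
    """Return True if the sequence ends with one of the given motifs."""
    return seq.upper().endswith(motifs)


# Share of training proteins with a retention signal, from the reported counts.
with_signal, total = 66, 124
print(f"{with_signal / total:.1%}")  # 53.2%

# Toy sequences (not real proteins) showing what a motif-only check misses.
print(has_c_terminal_signal("MAAAGGSKDEL"))  # True
print(has_c_terminal_signal("MAAAGGSKDEV"))  # False
```

A check of this kind would, by construction, miss the 58 training proteins that lack a retention signal, which is the motivation for a composition-based predictor.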