Prediction of endoplasmic reticulum resident proteins using fragmented amino acid composition and support vector machine.

Authors

Kumar Ravindra, Kumari Bandana, Kumar Manish

Affiliations

Department of Biophysics, University of Delhi South Campus, New Delhi, India.

Current affiliation: Newe-Ya'ar Research Center, Agricultural Research Organization, Ramat Yishay, Israel.

Publication information

PeerJ. 2017 Sep 4;5:e3561. doi: 10.7717/peerj.3561. eCollection 2017.

DOI: 10.7717/peerj.3561
PMID: 28890846
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5588793/
Abstract

BACKGROUND

The endoplasmic reticulum (ER) plays an important role in many cellular processes, including protein synthesis and the folding and post-translational processing of newly synthesized proteins. It is also the site of quality control for misfolded proteins and the entry point of extracellular proteins into the secretory pathway. Hence, at any given time the ER contains two different cohorts of proteins: (i) proteins involved in ER-specific functions, which reside in the ER lumen and are called ER resident proteins, and (ii) proteins in the process of moving to the extracellular space. ER resident proteins must therefore somehow be distinguished from newly synthesized secretory proteins, which pass through the ER on their way out of the cell. Only about 50% of the proteins used as training data in this study had an ER retention signal, which shows that these signals are not present in all ER resident proteins. This also strongly suggests that additional factors contribute to retaining ER-specific proteins inside the ER.

METHODS

This is a support vector machine (SVM)-based method in which different forms of protein features were used as SVM inputs to develop the prediction models. Cross-validation was used during training. The best performance was obtained with a combination of the amino acid compositions of different parts of the protein.
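The abstract does not give the fragment boundaries. The following Python sketch therefore only illustrates the general idea of a fragmented amino acid composition. The sequence is split into an N-terminal fragment, a middle region, and a C-terminal fragment. The percentage of each of the 20 standard residues is computed per fragment, and the three results are concatenated into one 60-value feature vector. The fragment lengths (`n_term=25`, `c_term=25`) and the function names are hypothetical, not the published ERPred settings.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(fragment: str) -> List[float]:
    """Percentage of each standard residue in a fragment (20 values)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in fragment:
        if residue in counts:  # non-standard letters (X, B, U, ...) are ignored
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def fragmented_composition(sequence: str, n_term: int = 25, c_term: int = 25) -> List[float]:
    """Concatenate compositions of the N-terminus, middle region, and C-terminus.

    n_term and c_term are hypothetical fragment lengths, not the authors' values.
    For sequences shorter than n_term + c_term the terminal fragments overlap
    and the middle region is empty (all-zero composition).
    """
    seq = sequence.strip().upper()
    n_frag = seq[:n_term]
    c_frag = seq[-c_term:] if c_term > 0 else ""  # seq[-0:] would return the whole sequence
    middle = seq[n_term:len(seq) - c_term] if len(seq) > n_term + c_term else ""
    return (amino_acid_composition(n_frag)
            + amino_acid_composition(middle)
            + amino_acid_composition(c_frag))


# Hypothetical toy sequence, not data from the study.
features = fragmented_composition("MKTLLAVAGKDEL" * 5)
print(len(features))
```

The resulting vectors could be passed to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC`. The abstract does not state which SVM software, kernel, or parameters the authors used.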

RESULTS

In this study, we report ERPred, a novel support vector machine-based method for predicting endoplasmic reticulum resident proteins. During training, it achieved a maximum cross-validation accuracy of 81.42%. On an independent dataset, ERPred achieved a sensitivity of 72.31% and a specificity of 83.69%. We also used ERPred to annotate six different proteomes and predict the candidate endoplasmic reticulum resident proteins in them. To make the method available to the scientific community, we developed the ERPred webserver, which can be accessed at http://proteininformatics.org/mkumar/erpred/index.html.
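Sensitivity, specificity, and accuracy are conventionally computed from the confusion matrix of a binary classifier, here with ER resident proteins as the positive class. The Python sketch below uses those standard definitions. The abstract does not spell out the formulas, so treat this as the usual convention rather than the authors' code. The class and function names and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary classifier; positives are ER resident proteins."""
    tp: int  # ER resident proteins predicted as ER resident
    fn: int  # ER resident proteins predicted as non-resident
    tn: int  # non-resident proteins predicted as non-resident
    fp: int  # non-resident proteins predicted as ER resident


def sensitivity(c: ConfusionCounts) -> float:
    """TP / (TP + FN): share of true ER resident proteins recovered."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """TN / (TN + FP): share of non-resident proteins correctly rejected."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """(TP + TN) / total: share of all proteins classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


# Hypothetical counts for illustration only, not results from the study.
example = ConfusionCounts(tp=8, fn=2, tn=9, fp=1)
print(sensitivity(example), specificity(example), accuracy(example))
```

Accuracy depends on the class balance of the evaluation set. For that reason, sensitivity and specificity are more informative when reported separately, as they are for the independent dataset.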

DISCUSSION

We found that of the 124 proteins in the training dataset, only 66 (about 53%) had endoplasmic reticulum (ER) retention signals. This shows that these signals are not an absolute requirement for ER resident proteins to remain inside the ER. This observation also strongly suggests that additional factors contribute to retaining proteins inside the ER. Our proposed predictor, ERPred, is a signal-independent tool: it is tuned to predict ER resident proteins even when the query protein does not contain a specific ER retention signal.
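One simple way to count how many sequences in a dataset carry a retention signal is to scan their C-termini. The Python sketch below assumes the canonical KDEL/HDEL-type C-terminal tetrapeptide of soluble ER lumenal proteins. The abstract does not say which signal definitions the authors used to arrive at 66 of 124. The pattern is therefore an assumption and would need to be extended (for example, with the dilysine motifs of ER membrane proteins) to approximate their count. The function names and toy sequences are hypothetical.

```python
import re
from typing import Iterable

# Assumption: canonical C-terminal tetrapeptides of soluble ER lumenal
# proteins (KDEL, HDEL). Not necessarily the signal set used by the authors.
ER_RETENTION_PATTERN = re.compile(r"[KH]DEL$")


def has_er_retention_signal(sequence: str) -> bool:
    """Return True if the sequence ends with a KDEL or HDEL tetrapeptide."""
    return ER_RETENTION_PATTERN.search(sequence.strip().upper()) is not None


def fraction_with_signal(sequences: Iterable[str]) -> float:
    """Fraction of sequences that end with the assumed retention signal."""
    seqs = list(sequences)
    if not seqs:
        return 0.0
    return sum(has_er_retention_signal(s) for s in seqs) / len(seqs)


# Hypothetical toy sequences, not data from the study.
toy = ["MKTLLAVAGKDEL", "MSRLLPAAHDEL", "MAGLLKKAAQ"]
print(fraction_with_signal(toy))
```

A sequence-scanning rule like this can only recover proteins that carry the assumed motif. This is the limitation that a signal-independent predictor such as ERPred is meant to address.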

摘要

背景

内质网在许多细胞过程中发挥着重要作用,包括蛋白质合成、新合成蛋白质的折叠和翻译后加工。它也是错误折叠蛋白质质量控制的场所以及细胞外蛋白质进入分泌途径的入口。因此,在任何给定时间点,内质网包含两类不同的蛋白质群体:(i)参与内质网特定功能的蛋白质,它们位于内质网腔中,称为内质网驻留蛋白;(ii)正在向细胞外空间移动的蛋白质。因此,内质网驻留蛋白必须以某种方式与新合成的分泌蛋白区分开来,后者在离开细胞的途中穿过内质网。在本研究中用作训练数据的蛋白质中,大约只有50%具有内质网保留信号,这表明这些信号并非所有内质网驻留蛋白都必需具备。这也强烈表明了其他因素在内质网特异性蛋白保留在内质网中的作用。

方法

这是一种基于支持向量机的方法,我们使用了不同形式的蛋白质特征作为支持向量机的输入来开发预测模型。在训练过程中采用了交叉验证方法。通过蛋白质不同部分的氨基酸组成相结合获得了最佳性能。

结果

在本研究中,我们报告了一种基于支持向量机的预测内质网驻留蛋白的新方法,名为ERPred。在训练过程中,通过交叉验证方法我们获得了81.42%的最高准确率。在独立数据集上进行评估时,ERPred的预测灵敏度为72.31%,特异性为83.69%。我们还注释了六个不同的蛋白质组以预测其中的候选内质网驻留蛋白。开发了一个网络服务器ERPred,以使科学界能够使用该方法,可通过http://proteininformatics.org/mkumar/erpred/index.html访问。

讨论

我们发现训练数据集中的124种蛋白质中,只有66种蛋白质具有内质网保留信号,这表明这些信号对于内质网驻留蛋白保留在内质网中并非绝对必要。这一观察结果也强烈表明了其他因素在蛋白质保留在内质网中的作用。我们提出的预测器ERPred是一种不依赖信号的工具。它经过调整用于预测内质网驻留蛋白,即使查询蛋白不包含特定的内质网保留信号。

相似文献

1
Prediction of endoplasmic reticulum resident proteins using fragmented amino acid composition and support vector machine.利用片段氨基酸组成和支持向量机预测内质网驻留蛋白
PeerJ. 2017 Sep 4;5:e3561. doi: 10.7717/peerj.3561. eCollection 2017.
2
Predicting Endoplasmic Reticulum Resident Proteins Using Auto-Cross Covariance Transformation With a U-Shaped Residue Weight-Transfer Function.使用具有U形残基权重转移函数的自交叉协方差变换预测内质网驻留蛋白
Front Genet. 2019 Dec 20;10:1231. doi: 10.3389/fgene.2019.01231. eCollection 2019.
3
BlaPred: Predicting and classifying β-lactamase using a 3-tier prediction system via Chou's general PseAAC.BlaPred:通过 Chou 的通用 PseAAC 构建 3 级预测系统,预测和分类β-内酰胺酶。
J Theor Biol. 2018 Nov 14;457:29-36. doi: 10.1016/j.jtbi.2018.08.030. Epub 2018 Aug 20.
4
mRNALoc: a novel machine-learning based in-silico tool to predict mRNA subcellular localization.mRNA 定位:一种新的基于机器学习的计算工具,用于预测 mRNA 亚细胞定位。
Nucleic Acids Res. 2020 Jul 2;48(W1):W239-W243. doi: 10.1093/nar/gkaa385.
5
The retention signal for soluble proteins of the endoplasmic reticulum.内质网可溶性蛋白质的滞留信号。
Trends Biochem Sci. 1990 Dec;15(12):483-6. doi: 10.1016/0968-0004(90)90303-s.
6
PVP-SVM: Sequence-Based Prediction of Phage Virion Proteins Using a Support Vector Machine.PVP-SVM:使用支持向量机基于序列预测噬菌体病毒粒子蛋白
Front Microbiol. 2018 Mar 16;9:476. doi: 10.3389/fmicb.2018.00476. eCollection 2018.
7
Proteome-wide prediction and annotation of mitochondrial and sub-mitochondrial proteins by incorporating domain information.通过整合结构域信息,对线粒体和亚线粒体蛋白质进行蛋白质组范围的预测和注释。
Mitochondrion. 2018 Sep;42:11-22. doi: 10.1016/j.mito.2017.10.004. Epub 2017 Oct 12.
8
Prediction of zinc binding sites in proteins using sequence derived information.利用序列衍生信息预测蛋白质中的锌结合位点。
J Biomol Struct Dyn. 2018 Dec;36(16):4413-4423. doi: 10.1080/07391102.2017.1417910. Epub 2018 Jan 15.
9
Systems biology of the endoplasmic reticulum stress response.内质网应激反应的系统生物学
Subcell Biochem. 2007;43:277-98. doi: 10.1007/978-1-4020-5943-8_13.
10
Computational Modeling of C-Terminal Tails to Predict the Calcium-Dependent Secretion of Endoplasmic Reticulum Resident Proteins.用于预测内质网驻留蛋白钙依赖性分泌的C末端尾巴的计算模型
Front Chem. 2021 Jun 29;9:689608. doi: 10.3389/fchem.2021.689608. eCollection 2021.

引用本文的文献

1
Identifying the DNA methylation preference of transcription factors using ProtBERT and SVM.使用ProtBERT和支持向量机识别转录因子的DNA甲基化偏好性。
PLoS Comput Biol. 2025 May 13;21(5):e1012513. doi: 10.1371/journal.pcbi.1012513. eCollection 2025 May.
2
Leaf transcriptomic responses to arbuscular mycorrhizal symbioses exerting growth depressions in tomato.叶片转录组对丛枝菌根共生的响应致使番茄生长受阻
Arch Microbiol. 2025 May 8;207(6):139. doi: 10.1007/s00203-025-04343-x.
3
Structural basis of lipid transfer by a bridge-like lipid-transfer protein.

本文引用的文献

1
Human Protein Subcellular Localization with Integrated Source and Multi-label Ensemble Classifier.基于综合数据源和多标签集成分类器的人类蛋白质亚细胞定位
Sci Rep. 2016 Jun 21;6:28087. doi: 10.1038/srep28087.
2
PredHSP: Sequence Based Proteome-Wide Heat Shock Protein Prediction and Classification Tool to Unlock the Stress Biology.PredHSP:基于序列的全蛋白质组热休克蛋白预测与分类工具,用于揭示应激生物学
PLoS One. 2016 May 19;11(5):e0155872. doi: 10.1371/journal.pone.0155872. eCollection 2016.
3
Predicting Golgi-resident protein types using pseudo amino acid compositions: Approaches with positional specific physicochemical properties.
一种桥状脂质转运蛋白进行脂质转运的结构基础
Nature. 2025 Apr 23. doi: 10.1038/s41586-025-08918-y.
4
In planta ectopic expression of two subtypes of tomato cellulose synthase-like M genes affects cell wall integrity and supports a role in arabinogalactan and/or rhamnogalacturonan-I biosynthesis.番茄纤维素合酶样M基因的两个亚型在植物中的异位表达影响细胞壁完整性,并支持其在阿拉伯半乳聚糖和/或鼠李糖半乳糖醛酸聚糖-I生物合成中的作用。
Plant Cell Physiol. 2025 Jan 29;66(1):101-119. doi: 10.1093/pcp/pcae145.
5
Recent Advancements in Subcellular Proteomics: Growing Impact of Organellar Protein Niches on the Understanding of Cell Biology.亚细胞蛋白质组学的最新进展:细胞器蛋白质龛对细胞生物学理解的影响日益增大。
J Proteome Res. 2024 Aug 2;23(8):2700-2722. doi: 10.1021/acs.jproteome.3c00839. Epub 2024 Mar 7.
6
Over-expression of Anterior Gradient 3 Is Associated With Tumor Progression and Poor Survival in Gastric Cancer.前梯度蛋白 3 的过表达与胃癌的肿瘤进展和不良预后相关。
In Vivo. 2023 Jan-Feb;37(1):483-489. doi: 10.21873/invivo.13103.
7
Functions and mechanisms of protein disulfide isomerase family in cancer emergence.蛋白质二硫键异构酶家族在癌症发生中的功能及机制
Cell Biosci. 2022 Aug 14;12(1):129. doi: 10.1186/s13578-022-00868-6.
8
Aortic Dissection Auxiliary Diagnosis Model and Applied Research Based on Ensemble Learning.基于集成学习的主动脉夹层辅助诊断模型及应用研究
Front Cardiovasc Med. 2021 Dec 23;8:777757. doi: 10.3389/fcvm.2021.777757. eCollection 2021.
9
Computational methods for protein localization prediction.蛋白质定位预测的计算方法。
Comput Struct Biotechnol J. 2021 Oct 19;19:5834-5844. doi: 10.1016/j.csbj.2021.10.023. eCollection 2021.
10
Ensemble of Multiple Classifiers for Multilabel Classification of Plant Protein Subcellular Localization.用于植物蛋白质亚细胞定位多标签分类的多个分类器集成
Life (Basel). 2021 Mar 30;11(4):293. doi: 10.3390/life11040293.
使用伪氨基酸组成预测高尔基体驻留蛋白类型:具有位置特异性物理化学性质的方法。
J Theor Biol. 2016 Feb 21;391:35-42. doi: 10.1016/j.jtbi.2015.11.009. Epub 2015 Dec 15.
4
PaPI: pseudo amino acid composition to score human protein-coding variants.PaPI:用于评估人类蛋白质编码变体的伪氨基酸组成。
BMC Bioinformatics. 2015 Apr 19;16:123. doi: 10.1186/s12859-015-0554-8.
5
Prediction of β-lactamase and its class by Chou's pseudo-amino acid composition and support vector machine.基于周式伪氨基酸组成和支持向量机预测β-内酰胺酶及其类别
J Theor Biol. 2015 Jan 21;365:96-103. doi: 10.1016/j.jtbi.2014.10.008. Epub 2014 Oct 22.
6
NRfamPred: a proteome-scale two level method for prediction of nuclear receptor proteins and their sub-families.NRfamPred:一种用于预测核受体蛋白及其亚家族的蛋白质组规模的两级方法。
Sci Rep. 2014 Oct 29;4:6810. doi: 10.1038/srep06810.
7
iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition.iDNA-Prot|dis:通过将氨基酸距离对和简化字母表概况纳入通用伪氨基酸组成来鉴定DNA结合蛋白。
PLoS One. 2014 Sep 3;9(9):e106691. doi: 10.1371/journal.pone.0106691. eCollection 2014.
8
Synthesis, Processing, and Function of N-glycans in N-glycoproteins.N-糖蛋白中N-聚糖的合成、加工与功能
Adv Neurobiol. 2014;9:47-70. doi: 10.1007/978-1-4939-1154-7_3.
9
Protein sub-nuclear localization prediction using SVM and Pfam domain information.利用支持向量机和Pfam结构域信息进行蛋白质亚核定位预测。
PLoS One. 2014 Jun 4;9(6):e98345. doi: 10.1371/journal.pone.0098345. eCollection 2014.
10
Retention mechanisms for ER and Golgi membrane proteins.内质网和高尔基体膜蛋白的滞留机制。
Trends Plant Sci. 2014 Aug;19(8):508-15. doi: 10.1016/j.tplants.2014.04.004. Epub 2014 Apr 30.