Ranking Gene Ontology terms for predicting non-classical secretory proteins in eukaryotes and prokaryotes.

Affiliation

Department of Management Information System, Asia Pacific Institute of Creativity, No. 110 XueFu Rd., Tou Fen, Miaoli, Taiwan, ROC.

Publication information

J Theor Biol. 2012 Nov 7;312:105-13. doi: 10.1016/j.jtbi.2012.07.027. Epub 2012 Aug 8.

DOI:10.1016/j.jtbi.2012.07.027
PMID:22967952
Abstract

Protein secretion is an important biological process in both eukaryotes and prokaryotes. Several sequence-based methods rely mainly on various types of complementary features to design accurate classifiers for predicting non-classical secretory proteins. Gene Ontology (GO) terms are increasingly informative in predicting protein functions. However, the number of GO terms used is often very large. For example, the prediction method Euk-mPLoc 2.0 for subcellular localization uses 60,020 GO terms. This study proposes a novel approach that identifies a small set of m top-ranked GO terms, which serve as the only type of input feature for a support vector machine (SVM) based method, Sec-GO, to predict non-classical secretory proteins in both eukaryotes and prokaryotes. To evaluate Sec-GO, two existing methods and their datasets are adopted for performance comparison. Sec-GO with m=436 GO terms yields an independent test accuracy of 96.7% on mammalian proteins, much better than the existing method SPRED (82.2%), which uses frequencies of tri-peptides and short peptides, secondary structure, and physicochemical properties as input features of a random forest classifier. Furthermore, when applied to Gram-positive bacterial proteins, Sec-GO with m=158 GO terms has a test accuracy of 94.5%, superior to NClassG+ (90.0%), which uses an SVM with several feature types: amino acid composition, di-peptides, physicochemical properties, and the position-specific weighting matrix. Analysis of the distribution of secretory proteins in a GO database indicates that the percentage of non-classical secretory proteins annotated by GO is larger than that of classical secretory proteins in both eukaryotes and prokaryotes. Of the m top-ranked GO features, the top four GO terms are all annotated with subcellular locations such as GO:0005576 (Extracellular region). Additionally, Sec-GO is easily implemented, and its web prediction tool is available at iclab.life.nctu.edu.tw/secgo.
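
The feature-construction idea in the abstract (rank GO terms, keep the top m, and encode each protein by the presence or absence of those terms) can be sketched roughly as follows. This is a minimal illustration, not the Sec-GO implementation: the abstract does not state the ranking criterion, so the score used here (absolute difference in annotation frequency between the two classes) is an assumption, as are the function names `rank_go_terms` and `to_feature_vector` and the toy annotations. The resulting binary vectors are the kind of input one could pass to an SVM library.

```python
from collections import Counter


def rank_go_terms(annotations, labels):
    """Rank GO terms by an illustrative score (an assumption, not the paper's):
    |fraction of positives annotated - fraction of negatives annotated|."""
    pos, neg = Counter(), Counter()
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    for terms, y in zip(annotations, labels):
        # Count each term at most once per protein.
        (pos if y == 1 else neg).update(set(terms))

    def score(term):
        return abs(pos[term] / max(n_pos, 1) - neg[term] / max(n_neg, 1))

    # Highest score first; ties broken by GO identifier for determinism.
    return sorted(set(pos) | set(neg), key=lambda t: (-score(t), t))


def to_feature_vector(terms, selected):
    """Encode one protein as a 0/1 vector over the selected GO terms."""
    present = set(terms)
    return [1 if t in present else 0 for t in selected]


# Toy data, invented for illustration only (1 = secretory, 0 = non-secretory).
annotations = [
    ["GO:0005576", "GO:0005515"],
    ["GO:0005576"],
    ["GO:0005634", "GO:0005515"],
    ["GO:0005634"],
]
labels = [1, 1, 0, 0]

m = 2  # number of top-ranked terms to keep
top_m = rank_go_terms(annotations, labels)[:m]
features = [to_feature_vector(a, top_m) for a in annotations]
print(top_m)
print(features)
```

With real data, m would be chosen by validation (the abstract reports m=436 for mammalian proteins and m=158 for Gram-positive bacterial proteins), and the ranking step would use training proteins only so that the independent test set stays unseen.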

Similar articles

1
Ranking Gene Ontology terms for predicting non-classical secretory proteins in eukaryotes and prokaryotes.
J Theor Biol. 2012 Nov 7;312:105-13. doi: 10.1016/j.jtbi.2012.07.027. Epub 2012 Aug 8.
2
Predicting protein subnuclear localization using GO-amino-acid composition features.
Biosystems. 2009 Nov;98(2):73-9. doi: 10.1016/j.biosystems.2009.06.007. Epub 2009 Jul 5.
3
ProLoc-GO: utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization.
BMC Bioinformatics. 2008 Feb 1;9:80. doi: 10.1186/1471-2105-9-80.
4
Prediction of protein subcellular localization.
Proteins. 2006 Aug 15;64(3):643-51. doi: 10.1002/prot.21018.
5
GOASVM: a subcellular location predictor by incorporating term-frequency gene ontology into the general form of Chou's pseudo-amino acid composition.
J Theor Biol. 2013 Apr 21;323:40-8. doi: 10.1016/j.jtbi.2013.01.012. Epub 2013 Jan 29.
6
Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites.
J Proteome Res. 2007 May;6(5):1728-34. doi: 10.1021/pr060635i. Epub 2007 Mar 31.
7
A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search.
In Silico Biol. 2008;8(2):129-40.
8
Prediction of protein subcellular location using a combined feature of sequence.
FEBS Lett. 2005 Jun 20;579(16):3444-8. doi: 10.1016/j.febslet.2005.05.021.
9
Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization.
Biochem Biophys Res Commun. 2006 Aug 18;347(1):150-7. doi: 10.1016/j.bbrc.2006.06.059. Epub 2006 Jun 21.
10
SecretP: a new method for predicting mammalian secreted proteins.
Peptides. 2010 Apr;31(4):574-8. doi: 10.1016/j.peptides.2009.12.026. Epub 2010 Jan 4.

Cited by

1
Transcriptomic Profiling Reveals Altered Expression of Genes Involved in Metabolic and Immune Processes in NDV-Infected Chicken Embryos.
Metabolites. 2024 Dec 2;14(12):669. doi: 10.3390/metabo14120669.
2
Evolution, role in inflammation, and redox control of leaderless secretory proteins.
J Biol Chem. 2020 May 29;295(22):7799-7811. doi: 10.1074/jbc.REV119.008907. Epub 2020 Apr 24.
3
Designing novel construction for cell surface display of protein E on Escherichia coli using non-classical pathway based on Lpp-OmpA.
AMB Express. 2017 Dec;7(1):53. doi: 10.1186/s13568-017-0350-0. Epub 2017 Feb 28.
4
Better Than Nothing? Limitations of the Prediction Tool SecretomeP in the Search for Leaderless Secretory Proteins (LSPs) in Plants.
Front Plant Sci. 2016 Sep 27;7:1451. doi: 10.3389/fpls.2016.01451. eCollection 2016.
5
Elucidation of the CHO Super-Ome (CHO-SO) by Proteoinformatics.
J Proteome Res. 2015 Nov 6;14(11):4687-703. doi: 10.1021/acs.jproteome.5b00588. Epub 2015 Oct 13.
6
Rule-based knowledge acquisition method for promoter prediction in human and Drosophila species.
ScientificWorldJournal. 2014;2014:327306. doi: 10.1155/2014/327306. Epub 2014 Jan 29.
7
Chromatinized protein kinase C-θ directly regulates inducible genes in epithelial to mesenchymal transition and breast cancer stem cells.
Mol Cell Biol. 2014 Aug;34(16):2961-80. doi: 10.1128/MCB.01693-13. Epub 2014 Jun 2.