Sparse regressions for predicting and interpreting subcellular localization of multi-label proteins.

Authors

Wan Shibiao, Mak Man-Wai, Kung Sun-Yuan

Affiliations

Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong, SAR, China.

Department of Electrical Engineering, Princeton University, New Jersey, USA.

Publication information

BMC Bioinformatics. 2016 Feb 24;17:97. doi: 10.1186/s12859-016-0940-x.

DOI: 10.1186/s12859-016-0940-x
PMID: 26911432
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4765148/

Abstract

BACKGROUND

Predicting protein subcellular localization is indispensable for inferring protein functions. Recent studies have been focusing on predicting not only single-location proteins, but also multi-location proteins. Almost all of the high performing predictors proposed recently use gene ontology (GO) terms to construct feature vectors for classification. Despite their high performance, their prediction decisions are difficult to interpret because of the large number of GO terms involved.

RESULTS

This paper proposes using sparse regressions to exploit GO information for both predicting and interpreting subcellular localization of single- and multi-location proteins. Specifically, we compared two multi-label sparse regression algorithms, namely multi-label LASSO (mLASSO) and multi-label elastic net (mEN), for large-scale predictions of protein subcellular localization. Both algorithms can yield sparse and interpretable solutions. Using the one-vs-rest strategy, mLASSO and mEN selected 87 and 429 GO terms, respectively, out of more than 8,000; these selected terms play essential roles in determining subcellular localization. More interestingly, many of the GO terms selected by mEN are from the biological process and molecular function categories, suggesting that GO terms of these categories also play vital roles in the prediction. With these essential GO terms, one can determine not only where a protein localizes but also why it resides there.
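The one-vs-rest sparse regression idea described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the coordinate-descent solver, the toy presence/absence matrix, the placeholder term names, the penalty values (`lam`, `alpha`), and the 0.5 decision threshold are all assumptions chosen for illustration. Setting `alpha=1.0` gives a LASSO penalty, while `0 < alpha < 1` gives an elastic-net penalty.

```python
# Toy sketch of one-vs-rest sparse regression for multi-label prediction.
# The data, term names, penalty values, and 0.5 threshold are illustrative
# assumptions, not the settings or features used in the paper.

def soft_threshold(z, gamma):
    """Soft-thresholding operator that produces exact zeros under an L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def sparse_regression(X, y, lam, alpha=1.0, n_iter=100):
    """Coordinate descent (no intercept) for
    (1/2n)||y - Xw||^2 + lam * (alpha*||w||_1 + (1 - alpha)/2 * ||w||_2^2)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a term annotated to no protein keeps weight 0
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w

def fit_one_vs_rest(X, label_sets, locations, lam, alpha):
    """Train one sparse regressor per location (target 1 if labelled, else 0)."""
    models = {}
    for loc in locations:
        y = [1.0 if loc in labels else 0.0 for labels in label_sets]
        models[loc] = sparse_regression(X, y, lam, alpha)
    return models

def predict(models, x, threshold=0.5):
    """Return every location whose regression score exceeds the threshold."""
    return {loc for loc, w in models.items()
            if sum(wj * xj for wj, xj in zip(w, x)) > threshold}

def selected_terms(models, terms):
    """List the terms with non-zero weight for each location."""
    return {loc: [t for t, wj in zip(terms, w) if wj != 0.0]
            for loc, w in models.items()}

# Hypothetical placeholder names standing in for GO terms.
TERMS = ["term_nuc", "term_cyt", "term_x", "term_y", "term_unused"]
# Rows are proteins; columns mark presence (1) or absence (0) of each term.
X = [
    [1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
]
# The last two proteins are multi-location.
LABELS = [{"nucleus"}, {"nucleus"}, {"cytoplasm"}, {"cytoplasm"},
          {"nucleus", "cytoplasm"}, {"nucleus", "cytoplasm"}]
LOCATIONS = ["nucleus", "cytoplasm"]

lasso = fit_one_vs_rest(X, LABELS, LOCATIONS, lam=0.1, alpha=1.0)
enet = fit_one_vs_rest(X, LABELS, LOCATIONS, lam=0.1, alpha=0.5)

print("LASSO-selected terms:", selected_terms(lasso, TERMS))
print("LASSO predictions:", [sorted(predict(lasso, x)) for x in X])
```

In this sketch, interpretation amounts to reading off the non-zero weights per location: the terms that survive the penalty are the ones driving each decision. A protein can receive several labels because each location is scored independently.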

CONCLUSIONS

Experimental results show that the outputs of both mEN and mLASSO are interpretable and that both perform significantly better than existing state-of-the-art predictors. Moreover, mEN selects more features and performs better than mLASSO on a stringent human benchmark dataset. For readers' convenience, an online server called SpaPredictor for both mLASSO and mEN is available at http://bioinfo.eie.polyu.edu.hk/SpaPredictorServer/.


Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/339f/4765148/4e8aebacdf8e/12859_2016_940_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/339f/4765148/ed30e1673170/12859_2016_940_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/339f/4765148/cbc01000bf33/12859_2016_940_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/339f/4765148/228829fa9fea/12859_2016_940_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/339f/4765148/1e079f953f8c/12859_2016_940_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/339f/4765148/167aafa8200f/12859_2016_940_Fig6_HTML.jpg
Fig. 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/339f/4765148/cbf60242f936/12859_2016_940_Fig7_HTML.jpg
