R3P-Loc: a compact multi-label predictor using ridge regression and random projection for protein subcellular localization.

Authors

Wan Shibiao, Mak Man-Wai, Kung Sun-Yuan

Affiliations

Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China.

Department of Electrical Engineering, Princeton University, NJ, USA.

Publication information

J Theor Biol. 2014 Nov 7;360:34-45. doi: 10.1016/j.jtbi.2014.06.031. Epub 2014 Jul 2.

DOI:10.1016/j.jtbi.2014.06.031
PMID:24997236
Abstract

Locating proteins within cellular contexts is of paramount significance in elucidating their biological functions. Computational methods based on knowledge databases (such as the gene ontology annotation (GOA) database) are known to be more efficient than sequence-based methods. However, knowledge-based methods typically face three conditions: (1) knowledge databases are enormous and are growing exponentially, (2) knowledge databases contain redundant information, and (3) the number of features extracted from knowledge databases is much larger than the number of data samples with ground-truth labels. These properties render the extracted features liable to redundant or irrelevant information, causing the prediction systems to suffer from overfitting. To address these problems, this paper proposes an efficient multi-label predictor, namely R3P-Loc, which uses two compact databases for feature extraction and applies random projection (RP) to reduce the feature dimensions of an ensemble ridge regression (RR) classifier. Two new compact databases are created from the Swiss-Prot and GOA databases. These databases possess almost the same amount of information as their full-size counterparts but are much smaller. Experimental results on two recent datasets (eukaryote and plant) suggest that R3P-Loc can reduce the dimensions seven-fold and significantly outperforms state-of-the-art predictors. This paper also demonstrates that the compact databases reduce the memory consumption by 39 times without causing degradation in prediction accuracy. For readers' convenience, the R3P-Loc server is available online at http://bioinfo.eie.polyu.edu.hk/R3PLocServer/.
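
The combination described in the abstract (random projection to shrink the feature dimension, followed by an ensemble of ridge regression classifiers with a multi-label decision) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the Gaussian projection matrix, the score averaging, the zero threshold, the "always keep the top-scoring label" rule, and the synthetic data are all assumptions made for illustration. The 350-to-50 projection is chosen only to mirror the seven-fold reduction mentioned in the abstract.

```python
import numpy as np

def fit_rp_ridge_ensemble(X, Y, n_proj=10, k=50, lam=1.0, seed=0):
    """Fit one ridge regressor per random projection (hypothetical helper).

    X: (n, d) feature matrix; Y: (n, m) label matrix coded as +1 / -1.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    models = []
    for _ in range(n_proj):
        # Gaussian random projection from d to k dimensions (assumed form).
        R = rng.standard_normal((d, k)) / np.sqrt(k)
        Z = X @ R
        # Closed-form ridge solution: W = (Z^T Z + lam * I)^{-1} Z^T Y
        W = np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ Y)
        models.append((R, W))
    return models

def predict_scores(models, X):
    """Average the per-label scores over all ensemble members."""
    return np.mean([X @ R @ W for R, W in models], axis=0)

def predict_labels(scores, threshold=0.0):
    """Multi-label decision: every label scoring above the threshold,
    plus the top-scoring label so each sample gets at least one."""
    pred = scores > threshold
    pred[np.arange(len(scores)), np.argmax(scores, axis=1)] = True
    return pred

# Synthetic demo data (illustrative only): 200 samples, 350 features, 3 labels.
rng = np.random.default_rng(1)
latent = rng.standard_normal((200, 5))
X = latent @ rng.standard_normal((5, 350)) + 0.1 * rng.standard_normal((200, 350))
Y = np.where(latent @ rng.standard_normal((5, 3)) > 0, 1.0, -1.0)

models = fit_rp_ridge_ensemble(X, Y, n_proj=10, k=50)
pred = predict_labels(predict_scores(models, X))
print("fraction of label entries matched on training data:", np.mean(pred == (Y > 0)))
```

Each ensemble member solves a k-by-k linear system instead of a d-by-d one, which is where the reduction in computation comes from in this sketch; averaging over several projections is one simple way to reduce the variance introduced by any single random matrix.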


Similar articles

1
R3P-Loc: a compact multi-label predictor using ridge regression and random projection for protein subcellular localization.
J Theor Biol. 2014 Nov 7;360:34-45. doi: 10.1016/j.jtbi.2014.06.031. Epub 2014 Jul 2.
2
mPLR-Loc: an adaptive decision multi-label classifier based on penalized logistic regression for protein subcellular localization prediction.
Anal Biochem. 2015 Mar 15;473:14-27. doi: 10.1016/j.ab.2014.10.014. Epub 2014 Oct 31.
3
mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines.
BMC Bioinformatics. 2012 Nov 6;13:290. doi: 10.1186/1471-2105-13-290.
4
HybridGO-Loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins.
PLoS One. 2014 Mar 19;9(3):e89545. doi: 10.1371/journal.pone.0089545. eCollection 2014.
5
mLASSO-Hum: A LASSO-based interpretable human-protein subcellular localization predictor.
J Theor Biol. 2015 Oct 7;382:223-34. doi: 10.1016/j.jtbi.2015.06.042. Epub 2015 Jul 9.
6
Mem-ADSVM: A two-layer multi-label predictor for identifying multi-functional types of membrane proteins.
J Theor Biol. 2016 Jun 7;398:32-42. doi: 10.1016/j.jtbi.2016.03.013. Epub 2016 Mar 19.
7
Ensemble Linear Neighborhood Propagation for Predicting Subchloroplast Localization of Multi-Location Proteins.
J Proteome Res. 2016 Dec 2;15(12):4755-4762. doi: 10.1021/acs.jproteome.6b00686. Epub 2016 Nov 3.
8
Transductive Learning for Multi-Label Protein Subchloroplast Localization Prediction.
IEEE/ACM Trans Comput Biol Bioinform. 2017 Jan-Feb;14(1):212-224. doi: 10.1109/TCBB.2016.2527657. Epub 2016 Feb 8.
9
Sparse regressions for predicting and interpreting subcellular localization of multi-label proteins.
BMC Bioinformatics. 2016 Feb 24;17:97. doi: 10.1186/s12859-016-0940-x.
10
Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble.
BMC Bioinformatics. 2015;16 Suppl 12(Suppl 12):S1. doi: 10.1186/1471-2105-16-S12-S1. Epub 2015 Aug 25.

Cited by

1
Integrating transcriptomics and hybrid machine learning enables high-accuracy diagnostic modeling for nasopharyngeal carcinoma.
Discov Oncol. 2025 Jun 12;16(1):1067. doi: 10.1007/s12672-025-02932-2.
2
A Comprehensive Review on RNA Subcellular Localization Prediction.
ArXiv. 2025 Apr 24:arXiv:2504.17162v1.
3
Protein subcellular localization prediction tools.
Comput Struct Biotechnol J. 2024 Apr 15;23:1796-1807. doi: 10.1016/j.csbj.2024.04.032. eCollection 2024 Dec.
4
Protein sequence information extraction and subcellular localization prediction with gapped k-Mer method.
BMC Bioinformatics. 2019 Dec 30;20(Suppl 22):719. doi: 10.1186/s12859-019-3232-4.
5
Identification of self-interacting proteins by integrating random projection classifier and finite impulse response filter.
BMC Genomics. 2019 Dec 27;20(Suppl 13):928. doi: 10.1186/s12864-019-6301-1.
6
Comparison and development of machine learning tools in the prediction of chronic kidney disease progression.
J Transl Med. 2019 Apr 11;17(1):119. doi: 10.1186/s12967-019-1860-0.
7
Using Baidu index to nowcast hand-foot-mouth disease in China: a meta learning approach.
BMC Infect Dis. 2018 Aug 13;18(1):398. doi: 10.1186/s12879-018-3285-4.
8
Computational Approaches to Prioritize Cancer Driver Missense Mutations.
Int J Mol Sci. 2018 Jul 20;19(7):2113. doi: 10.3390/ijms19072113.
9
Prediction of subcellular location of apoptosis proteins by incorporating PsePSSM and DCCA coefficient based on LFDA dimensionality reduction.
BMC Genomics. 2018 Jun 19;19(1):478. doi: 10.1186/s12864-018-4849-9.
10
Benchmark data for identifying multi-functional types of membrane proteins.
Data Brief. 2016 May 21;8:105-7. doi: 10.1016/j.dib.2016.05.024. eCollection 2016 Sep.