PLPD: reliable protein localization prediction from imbalanced and overlapped datasets.

Author information

Lee KiYoung, Kim Dae-Won, Na DoKyun, Lee Kwang H, Lee Doheon

Affiliation

Department of BioSystems, KAIST, Daejeon City, Republic of Korea.

Publication information

Nucleic Acids Res. 2006;34(17):4655-66. doi: 10.1093/nar/gkl638. Epub 2006 Sep 11.

DOI: 10.1093/nar/gkl638
PMID: 16966337
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1636404/

Abstract

Subcellular localization is one of the key functional characteristics of proteins. Large-scale genome analysis calls for an automatic and efficient method of predicting protein subcellular localization. From a machine learning point of view, a protein localization dataset has several characteristics: it has many classes (there are more than 10 localizations in a cell), it is multi-label (a protein may occur in several different subcellular locations), and it is highly imbalanced (the number of proteins in each localization differs remarkably). Although many previous works have addressed the prediction of protein subcellular localization, none of them effectively tackles these characteristics at the same time. Thus, a new computational method for protein localization is needed for more reliable outcomes. To address the issue, we present a protein localization predictor based on D-SVDD (PLPD), which can find the likelihood of a specific localization of a protein more easily and more correctly. Moreover, we introduce three measurements for the more precise evaluation of a protein localization predictor. Results on various datasets derived from the experiments of Huh et al. (2003) indicate that the proposed PLPD method represents a different approach that might play a complementary role to existing methods, such as the Nearest Neighbor method and the discriminate covariant method. Finally, after finding a good boundary for each localization using the 5184 classified proteins as training data, we predicted the localizations of 138 proteins whose subcellular localizations could not be clearly observed in the experiments of Huh et al. (2003).
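
The idea of "finding a good boundary for each localization" can be illustrated with a minimal sketch. It is not the D-SVDD method of the paper: as a deliberately crude stand-in, each localization is described by a hypersphere whose centre is the class mean and whose radius covers a fixed fraction of the class members. Because every class is described on its own, a large class cannot swamp a small one, and a multi-label protein simply contributes to each of its classes. The function names (fit_spheres, score, predict), the quantile parameter, and the toy feature vectors are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_spheres(features, labels, quantile=0.9):
    """Fit one (centre, radius) hypersphere per localization.

    features: list of equal-length float vectors, one per protein.
    labels:   list of sets of localization names (multi-label allowed).
    """
    members = defaultdict(list)
    for x, locs in zip(features, labels):
        for loc in locs:  # a multi-label protein joins every class it has
            members[loc].append(x)

    spheres = {}
    for loc, xs in members.items():
        dim = len(xs[0])
        centre = [sum(x[i] for x in xs) / len(xs) for i in range(dim)]
        dists = sorted(math.dist(x, centre) for x in xs)
        # The radius covers a fixed fraction of the class members.
        k = max(0, math.ceil(quantile * len(dists)) - 1)
        spheres[loc] = (centre, dists[k])
    return spheres


def score(spheres, x, eps=1e-9):
    """Return {localization: score}; a score >= 0 means x lies inside the sphere."""
    return {
        loc: 1.0 - math.dist(x, centre) / (radius + eps)
        for loc, (centre, radius) in spheres.items()
    }


def predict(spheres, x, top_k=2):
    """Rank localizations by score and return the top_k names."""
    s = score(spheres, x)
    return sorted(s, key=s.get, reverse=True)[:top_k]


# Toy, invented data: four nuclear proteins, two mitochondrial ones
# (the minority class), and one protein annotated with both locations.
features = [
    [0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0],
    [10.0, 10.0], [10.0, 12.0],
    [6.0, 6.0],
]
labels = [
    {"nucleus"}, {"nucleus"}, {"nucleus"}, {"nucleus"},
    {"mitochondrion"}, {"mitochondrion"},
    {"nucleus", "mitochondrion"},
]
spheres = fit_spheres(features, labels)
print(predict(spheres, [1.0, 1.0], top_k=1))
print(predict(spheres, [10.0, 11.0], top_k=1))
```

Scoring each localization separately, rather than forcing a single winner, is what lets a sketch like this return several locations for one protein; a real data description method would replace the mean-and-quantile sphere with a learned boundary.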

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc40/1636404/0b91cd6ef6e8/gkl638f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc40/1636404/9ea0b5dc7f57/gkl638f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc40/1636404/9ef20493e1ff/gkl638f2.jpg
