Ensemble of Multiple Classifiers for Multilabel Classification of Plant Protein Subcellular Localization.

Authors

Wattanapornprom Warin, Thammarongtham Chinae, Hongsthong Apiradee, Lertampaiporn Supatcha

Affiliations

Applied Computer Science Program, Department of Mathematics, Faculty of Science, King Mongkut's University of Technology Thonburi, Bangkok 10140, Thailand.

Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut's University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand.

Publication information

Life (Basel). 2021 Mar 30;11(4):293. doi: 10.3390/life11040293.

DOI: 10.3390/life11040293
PMID: 33808227
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8066735/
Abstract

The accurate prediction of protein localization is a critical step in any functional genome annotation process. This paper proposes an improved strategy for predicting protein subcellular localization in plants, based on multiple classifiers, to improve both the accuracy and the reliability of the predictions. Predicting plant protein subcellular localization is challenging because the underlying problem is not only multiclass but also multilabel. Plant proteins can generally be found in 10-14 locations/compartments, and the number of proteins in some compartments (nucleus, cytoplasm, and mitochondria) is generally much greater than in others (vacuole, peroxisome, Golgi, and cell wall), so the data are usually imbalanced. We therefore propose an ensemble machine learning method based on average voting among heterogeneous classifiers. We first extracted various types of features suited to each type of protein localization, forming a total of 479 feature spaces. Feature selection methods were then used to reduce these to smaller, informative feature subsets, which were used to train three different individual models. The outputs of the three classifiers were combined by average voting to return the final probability prediction. Based on the voting probability, the method can predict both single- and multilabel localizations. On the testing dataset, the proposed ensemble method achieved an overall accuracy of 84.58% for 11 compartments.
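The average-voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the compartment names, the three sets of model scores, the 0.5 decision threshold, and the fallback to the top-scoring compartment are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence

# Hypothetical compartment names and threshold; neither comes from the paper.
COMPARTMENTS = ["nucleus", "cytoplasm", "mitochondrion", "vacuole"]
THRESHOLD = 0.5


def average_vote(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-compartment probabilities across models (soft voting)."""
    n_models = len(per_model_probs)
    n_labels = len(per_model_probs[0])
    if any(len(p) != n_labels for p in per_model_probs):
        raise ValueError("all models must score the same compartments")
    return [
        sum(p[j] for p in per_model_probs) / n_models
        for j in range(n_labels)
    ]


def predict_labels(
    avg_probs: Sequence[float],
    names: Sequence[str],
    threshold: float = THRESHOLD,
) -> List[str]:
    """Return every compartment whose averaged probability reaches the threshold.

    If none does, fall back to the single top-scoring compartment so that
    each protein receives at least one label (an assumed design choice).
    """
    chosen = [n for n, p in zip(names, avg_probs) if p >= threshold]
    if not chosen:
        best = max(range(len(avg_probs)), key=lambda j: avg_probs[j])
        chosen = [names[best]]
    return chosen


# Made-up probability outputs of three heterogeneous classifiers for one protein.
model_outputs = [
    [0.9, 0.6, 0.1, 0.0],
    [0.8, 0.4, 0.2, 0.1],
    [0.7, 0.8, 0.0, 0.2],
]

avg = average_vote(model_outputs)
labels = predict_labels(avg, COMPARTMENTS)
print(labels)
```

In this toy case two compartments clear the threshold, so the protein is reported as multilabel; a protein with only one compartment above the threshold would be reported as single-label.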

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/21c6/8066735/16c8f1ea67a2/life-11-00293-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/21c6/8066735/542173cd49ab/life-11-00293-g002.jpg
