Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains.

Authors

Bulashevska Alla, Eils Roland

Affiliation

Theoretical Bioinformatics Department, German Cancer Research Center, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany.

Publication

BMC Bioinformatics. 2006 Jun 14;7:298. doi: 10.1186/1471-2105-7-298.

DOI:10.1186/1471-2105-7-298
PMID:16774677
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1525000/

Abstract

BACKGROUND

The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method that predicts the subcellular location of a given protein when only its amino acid sequence is known. Although many efforts have been made to predict subcellular location from sequence information alone, further research is needed to improve prediction accuracy.

RESULTS

A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm that constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six different datasets, including a Gram-negative bacteria dataset, a dataset for discriminating outer membrane proteins, and an apoptosis protein dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve prediction accuracy for classes with few training sequences, and it is therefore useful for datasets with an imbalanced class distribution.

CONCLUSION

This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and competitive with the previously reported approaches in terms of prediction accuracies as empirical results indicate. The code for the software is available upon request.
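
The base classifier named in the abstract, a Bayesian classifier built on Markov chain models, can be illustrated with a small sketch. The abstract does not state the chain order, the smoothing scheme, or how the hierarchy is constructed, so everything below is an assumption made for illustration: a first-order chain per class, Laplace smoothing (`alpha`), and the hypothetical class name `MarkovChainBayesClassifier`. It does not implement HensBC's recursive ensemble, only the kind of single classifier such an ensemble could combine.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


class MarkovChainBayesClassifier:
    """Illustrative sketch: one first-order Markov chain per class,
    combined with class priors via Bayes' rule. Not the authors' code."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha      # Laplace smoothing constant (assumed)
        self.log_prior = {}     # class -> log P(class)
        self.log_init = {}      # class -> residue -> log P(first residue)
        self.log_trans = {}     # class -> prev -> cur -> log P(cur | prev)

    @staticmethod
    def _clean(seq):
        # Skip symbols outside the 20-letter alphabet (e.g. X, B, Z).
        return [a for a in seq if a in AMINO_ACIDS]

    def fit(self, sequences, labels):
        n = len(AMINO_ACIDS)
        class_counts = Counter(labels)
        init = defaultdict(Counter)
        trans = defaultdict(lambda: defaultdict(Counter))
        for seq, label in zip(sequences, labels):
            residues = self._clean(seq)
            if not residues:
                continue
            init[label][residues[0]] += 1
            for prev, cur in zip(residues, residues[1:]):
                trans[label][prev][cur] += 1
        total = sum(class_counts.values())
        for c, count in class_counts.items():
            self.log_prior[c] = math.log(count / total)
            init_total = sum(init[c].values()) + self.alpha * n
            self.log_init[c] = {
                a: math.log((init[c][a] + self.alpha) / init_total)
                for a in AMINO_ACIDS
            }
            self.log_trans[c] = {}
            for prev in AMINO_ACIDS:
                row = trans[c][prev]
                row_total = sum(row.values()) + self.alpha * n
                self.log_trans[c][prev] = {
                    cur: math.log((row[cur] + self.alpha) / row_total)
                    for cur in AMINO_ACIDS
                }
        return self

    def log_scores(self, seq):
        # Unnormalised log posterior: log P(class) + log P(seq | class).
        residues = self._clean(seq)
        scores = {}
        for c, prior in self.log_prior.items():
            score = prior
            if residues:
                score += self.log_init[c][residues[0]]
                for prev, cur in zip(residues, residues[1:]):
                    score += self.log_trans[c][prev][cur]
            scores[c] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


# Toy usage with made-up sequences and labels (not data from the paper).
train_seqs = ["AAAAAA", "AAAAGA", "LLLLLL", "LLKLLL"]
train_labels = ["cytoplasmic", "cytoplasmic", "outer_membrane", "outer_membrane"]
model = MarkovChainBayesClassifier().fit(train_seqs, train_labels)
predicted = model.predict("AAAA")
```

Working in log space avoids numerical underflow on long protein sequences, and the smoothing keeps unseen residue transitions from forcing a class probability to zero, which matters most for classes with few training sequences.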

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b6a/1525000/68a447aae039/1471-2105-7-298-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b6a/1525000/145a546afbb0/1471-2105-7-298-1.jpg
