
Semi-supervised prediction of protein subcellular localization using abstraction augmented Markov models.

Affiliation

Artificial Intelligence Research Laboratory, Department of Computer Science, Iowa State University, Ames, IA 50010, USA.

Publication information

BMC Bioinformatics. 2010 Oct 26;11 Suppl 8(Suppl 8):S6. doi: 10.1186/1471-2105-11-S8-S6.

DOI: 10.1186/1471-2105-11-S8-S6
PMID: 21034431
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2966293/
Abstract

BACKGROUND

Determination of protein subcellular localization plays an important role in understanding protein function. Knowledge of the subcellular localization is also essential for genome annotation and drug discovery. Supervised machine learning methods for predicting the localization of a protein in a cell rely on the availability of large amounts of labeled data. However, because of the high cost and effort involved in labeling the data, the amount of labeled data is quite small compared to the amount of unlabeled data. Hence, there is a growing interest in developing semi-supervised methods for predicting protein subcellular localization from large amounts of unlabeled data together with small amounts of labeled data.

RESULTS

In this paper, we present an Abstraction Augmented Markov Model (AAMM) based approach to the semi-supervised protein subcellular localization prediction problem. We investigate the effectiveness of AAMMs in exploiting unlabeled data. We compare semi-supervised AAMMs with: (i) Markov models (MMs), which do not take advantage of unlabeled data; (ii) expectation maximization (EM) based semi-supervised training of MMs; and (iii) co-training based semi-supervised training of MMs. Approaches (ii) and (iii) both make use of unlabeled data.
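To make the baselines in this comparison concrete, the following is a minimal sketch of a first-order Markov model sequence classifier, followed by a hard-assignment self-training loop that stands in for the EM-style use of unlabeled data. It is an illustration only and not the authors' implementation: the class name `MarkovModelClassifier`, the function `self_train`, the add-one smoothing, the model order, and the toy sequences and labels are all assumptions made for this example. The abstraction step that defines AAMMs is not shown.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class MarkovModelClassifier:
    """One first-order Markov model per class, with add-one smoothing (assumed)."""

    def __init__(self, alphabet=AMINO_ACIDS):
        self.alphabet = alphabet
        self.counts = {}        # label -> {(prev, cur): transition count}
        self.class_weight = {}  # label -> number of training sequences

    def fit(self, sequences, labels):
        self.counts, self.class_weight = {}, {}
        for seq, label in zip(sequences, labels):
            table = self.counts.setdefault(label, defaultdict(float))
            self.class_weight[label] = self.class_weight.get(label, 0.0) + 1.0
            for prev, cur in zip(seq, seq[1:]):
                table[(prev, cur)] += 1.0
        return self

    def log_scores(self, seq):
        """Return log P(label) + sum of log transition probabilities per label.

        The probability of the first residue is ignored for simplicity.
        """
        total = sum(self.class_weight.values())
        scores = {}
        for label, table in self.counts.items():
            score = math.log(self.class_weight[label] / total)
            for prev, cur in zip(seq, seq[1:]):
                row_total = sum(table.get((prev, a), 0.0) for a in self.alphabet)
                numerator = table.get((prev, cur), 0.0) + 1.0
                score += math.log(numerator / (row_total + len(self.alphabet)))
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


def self_train(labeled, unlabeled, rounds=5):
    """Hard-assignment simplification of EM: label, refit, repeat."""
    seqs, labels = zip(*labeled)
    model = MarkovModelClassifier().fit(seqs, labels)
    for _ in range(rounds):
        guessed = [model.predict(s) for s in unlabeled]
        model = MarkovModelClassifier().fit(
            list(seqs) + list(unlabeled), list(labels) + guessed
        )
    return model


# Toy usage with made-up sequences and hypothetical location labels.
labeled = [("AAAAAA", "cytoplasm"), ("CCCCCC", "nucleus")]
unlabeled = ["AAAC", "CCCA"]
supervised = MarkovModelClassifier().fit(*zip(*labeled))
semi_supervised = self_train(labeled, unlabeled)
print(supervised.predict("AAAA"), semi_supervised.predict("CCCC"))
```

The supervised model corresponds to baseline (i), which sees only labeled sequences. The self-training loop is a rough stand-in for baseline (ii): a full EM procedure would weight each unlabeled sequence by its posterior class probabilities instead of committing to a single predicted label.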

CONCLUSIONS

The results of our experiments on three protein subcellular localization data sets show that semi-supervised AAMMs: (i) can effectively exploit unlabeled data; (ii) are more accurate than both the MMs and the EM based semi-supervised MMs; and (iii) are comparable in performance to, and in some cases outperform, the co-training based semi-supervised MMs.




