Multitask learning for protein subcellular location prediction.

Author information

Bioengineering Program, Hong Kong University of Science and Technology, Clearwater Bay, Kowloon, Hong Kong.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2011 May-Jun;8(3):748-59. doi: 10.1109/TCBB.2010.22.

DOI:10.1109/TCBB.2010.22

PMID:20421687

Abstract

Protein subcellular localization is concerned with predicting the location of a protein within a cell using computational methods. The location information can indicate key functionalities of proteins. Thus, accurate prediction of subcellular localizations of proteins can help the prediction of protein functions and genome annotations, as well as the identification of drug targets. Machine learning methods such as Support Vector Machines (SVMs) have been used in the past for the problem of protein subcellular localization, but have been shown to suffer from a lack of annotated training data in each species under study. To overcome this data sparsity problem, we observe that because some of the organisms may be related to each other, there may be some commonalities across different organisms that can be discovered and used to help boost the data in each localization task. In this paper, we formulate the protein subcellular localization problem as one of multitask learning across different organisms. We adapt and compare two specializations of the multitask learning algorithms on 20 different organisms. Our experimental results show that multitask learning performs much better than the traditional single-task methods. Among the different multitask learning methods, we found that the multitask kernels and supertype kernels under multitask learning that share parameters perform slightly better than multitask learning by sharing latent features. The most significant improvement in terms of localization accuracy is about 25 percent. We find that if the organisms are very different or are only remotely related from a biological point of view, then jointly training the multiple models cannot lead to significant improvement. However, if they are closely related biologically, multitask learning can do much better than individual learning.
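The abstract does not spell out the kernels it uses, so the following is only a minimal sketch of the general idea behind a multitask kernel, under stated assumptions. It uses one commonly seen form, K((x, s), (z, t)) = (1/mu + [s = t]) * k(x, z), where s and t are organism labels and k is an ordinary base kernel. The coupling parameter `mu`, the RBF base kernel, the function names, and the toy feature vectors are all hypothetical choices for illustration and are not taken from the paper.

```python
import math


def rbf(x, z, gamma=0.5):
    """Base kernel on feature vectors (Gaussian RBF); an assumed choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def multitask_kernel(x, task_x, z, task_z, mu=1.0, base=rbf):
    """K((x, s), (z, t)) = (1/mu + [s == t]) * base(x, z).

    mu is a hypothetical coupling parameter: a small mu makes the shared
    term dominate (tasks borrow heavily from each other), while a large mu
    shrinks the cross-task term toward zero (nearly independent models).
    """
    same_task = 1.0 if task_x == task_z else 0.0
    return (1.0 / mu + same_task) * base(x, z)


def gram_matrix(samples, mu=1.0):
    """Gram matrix over (features, organism) pairs pooled from all tasks."""
    return [
        [multitask_kernel(xi, si, xj, sj, mu) for xj, sj in samples]
        for xi, si in samples
    ]


# Toy data: made-up 2-D feature vectors, each tagged with an organism label.
samples = [
    ([0.1, 0.9], "human"),
    ([0.2, 0.8], "mouse"),
    ([0.9, 0.1], "yeast"),
]
K = gram_matrix(samples, mu=2.0)
```

A Gram matrix built this way could in principle be handed to any SVM implementation that accepts precomputed kernels, so that proteins from all organisms are trained jointly while same-organism pairs still receive extra weight.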

Similar articles

Multilabel learning for protein subcellular location prediction.

IEEE Trans Nanobioscience. 2012 Sep;11(3):237-43. doi: 10.1109/TNB.2012.2212249.

Semi-supervised protein subcellular localization.

BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S47. doi: 10.1186/1471-2105-10-S1-S47.

SubCellProt: predicting protein subcellular localization using machine learning approaches.

In Silico Biol. 2009;9(1-2):35-44.

Prediction of protein subcellular localization.

Proteins. 2006 Aug 15;64(3):643-51. doi: 10.1002/prot.21018.

Significantly improved prediction of subcellular localization by integrating text and protein sequence data.

Pac Symp Biocomput. 2006:16-27.

Protein subcellular localization prediction using artificial intelligence technology.

Methods Mol Biol. 2008;484:435-63. doi: 10.1007/978-1-59745-398-1_27.

Enhancing membrane protein subcellular localization prediction by parallel fusion of multi-view features.

IEEE Trans Nanobioscience. 2012 Dec;11(4):375-85. doi: 10.1109/TNB.2012.2208473. Epub 2012 Aug 3.

Going from where to why--interpretable prediction of protein subcellular localization.

Bioinformatics. 2010 May 1;26(9):1232-8. doi: 10.1093/bioinformatics/btq115. Epub 2010 Mar 17.

Protein subcellular localization prediction using multiple kernel learning based support vector machine.

Mol Biosyst. 2017 Mar 28;13(4):785-795. doi: 10.1039/c6mb00860g.

Cited by

A primer on the use of machine learning to distil knowledge from data in biological psychiatry.

Mol Psychiatry. 2024 Feb;29(2):387-401. doi: 10.1038/s41380-023-02334-2. Epub 2024 Jan 4.

Advancing translational research in neuroscience through multi-task learning.

Front Psychiatry. 2022 Nov 17;13:993289. doi: 10.3389/fpsyt.2022.993289. eCollection 2022.

Pan-cancer classification by regularized multi-task learning.

Sci Rep. 2021 Dec 20;11(1):24252. doi: 10.1038/s41598-021-03554-8.

Comparative Evaluation of Machine Learning Strategies for Analyzing Big Data in Psychiatry.

Int J Mol Sci. 2018 Oct 29;19(11):3387. doi: 10.3390/ijms19113387.

Knowledge-transfer learning for prediction of matrix metalloprotease substrate-cleavage sites.

Sci Rep. 2017 Jul 18;7(1):5755. doi: 10.1038/s41598-017-06219-7.

An ensemble method approach to investigate kinase-specific phosphorylation sites.

Int J Nanomedicine. 2014 May 10;9:2225-39. doi: 10.2147/IJN.S57526. eCollection 2014.

Mining Proteins with Non-Experimental Annotations Based on an Active Sample Selection Strategy for Predicting Protein Subcellular Localization.

PLoS One. 2013 Jun 26;8(6):e67343. doi: 10.1371/journal.pone.0067343. Print 2013.

Automated protein subcellular localization based on local invariant features.

Protein J. 2013 Mar;32(3):230-7. doi: 10.1007/s10930-013-9478-1.

Using multitask classification methods to investigate the kinase-specific phosphorylation sites.

Proteome Sci. 2012 Jun 21;10 Suppl 1(Suppl 1):S7. doi: 10.1186/1477-5956-10-S1-S7.

An ensemble classifier for eukaryotic protein subcellular location prediction using gene ontology categories and amino acid hydrophobicity.

PLoS One. 2012;7(1):e31057. doi: 10.1371/journal.pone.0031057. Epub 2012 Jan 30.
