Multitask learning for protein subcellular location prediction.

Affiliation

Bioengineering Program, Hong Kong University of Science and Technology, Clearwater Bay, Kowloon, Hong Kong.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2011 May-Jun;8(3):748-59. doi: 10.1109/TCBB.2010.22.

Abstract

Protein subcellular localization is the task of predicting where in a cell a protein resides using computational methods. Location information can indicate key protein functions, so accurate prediction of subcellular localization can support protein function prediction, genome annotation, and drug target identification. Machine learning methods such as Support Vector Machines (SVMs) have been applied to this problem, but they have been shown to suffer from a lack of annotated training data in each species under study. To overcome this data sparsity problem, we observe that because some organisms are related to each other, there may be commonalities across organisms that can be discovered and used to compensate for the limited data in each localization task. In this paper, we formulate the protein subcellular localization problem as one of multitask learning across different organisms. We adapt and compare two specializations of multitask learning algorithms on 20 different organisms. Our experimental results show that multitask learning performs much better than traditional single-task methods. Among the multitask learning methods, we found that the parameter-sharing approaches (multitask kernels and supertype kernels) perform slightly better than multitask learning that shares latent features. The largest improvement in localization accuracy is about 25 percent. We find that if the organisms are very different or only remotely related from a biological point of view, jointly training the multiple models does not lead to significant improvement. However, if they are closely related biologically, multitask learning can do much better than individual learning.
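
To make the idea of a multitask kernel concrete, the following is a minimal sketch of one common construction from the regularized multitask learning literature, in which each sample carries a task label (here, an organism) and the kernel value is scaled up when two samples come from the same task. The abstract does not give the exact kernel used in the paper, so the formula, the RBF base kernel, the parameter `mu`, and all function names below are illustrative assumptions rather than the authors' method.

```python
import math

def rbf(x, y, gamma=0.5):
    """Base RBF kernel on feature vectors (assumed choice of base kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def multitask_kernel(x, task_x, y, task_y, mu=1.0):
    """Assumed form: K((x, s), (y, t)) = (1/mu + [s == t]) * k(x, y).

    Larger mu weakens the coupling between tasks; as mu grows, samples
    from different organisms contribute less to each other's models.
    """
    same_task = 1.0 if task_x == task_y else 0.0
    return (1.0 / mu + same_task) * rbf(x, y)

def gram_matrix(samples, mu=1.0):
    """Gram matrix over (features, organism) pairs, usable as a
    precomputed kernel by any kernel classifier such as an SVM."""
    return [
        [multitask_kernel(xi, si, xj, sj, mu) for xj, sj in samples]
        for xi, si in samples
    ]

# Hypothetical toy data: two feature vectors per organism.
samples = [
    ([0.0, 0.0], "organism_a"),
    ([1.0, 0.0], "organism_a"),
    ([0.0, 0.0], "organism_b"),
    ([0.0, 1.0], "organism_b"),
]
K = gram_matrix(samples, mu=1.0)
```

A matrix like `K` could be passed to an SVM implementation that accepts precomputed kernels, which trains all organisms jointly while still letting each organism's own samples carry more weight for its predictions.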
