Multi label learning for prediction of human protein subcellular localizations.

Affiliation

Institute of Image Processing & Pattern Recognition, Shanghai Jiaotong University, 800 Dongchuan Road, 200240 Shanghai, China.

Publication details

Protein J. 2009 Dec;28(9-10):384-90. doi: 10.1007/s10930-009-9205-0.

Abstract

Predicting protein subcellular locations has attracted much attention in the past decade. However, one of the most challenging problems is that many proteins have been found to exist simultaneously in, or move between, two or more different cellular components of a eukaryotic cell. Few previous predictors were able to deal with such multiplex proteins, even though their specific subcellular targeting has extremely important implications for future drug discovery. Approximately 20% of the human proteome consists of such multiplex proteins, which carry multiple location labels. To handle these multiplex human proteins efficiently, we have developed a novel multi-label (ML) learning and prediction framework called ML-PLoc, which decomposes the multi-label prediction problem into multiple independent binary classification problems. ML-PLoc is built on a support vector machine (SVM) and sequence evolutionary information. Experimental results show that ML-PLoc achieves an overall accuracy of 64.6% and a recall of 67.2% on a benchmark dataset covering 14 human subcellular locations, and that it is very effective at handling multiplex proteins. The approach represents a new strategy for dealing with multi-label biological problems. ML-PLoc software is freely available for academic use at: http://www.csbio.sjtu.edu.cn/bioinf/ML-PLoc .
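The decomposition described above, one independent binary problem per location (commonly called binary relevance), can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: a plain perceptron stands in for the SVM, the 2-D feature vectors are hypothetical toy values rather than the evolutionary features ML-PLoc uses, the "at least one location" fallback in `predict` is an illustrative choice, and the accuracy and recall functions follow one common example-based multi-label definition that may differ from the one used for the reported 64.6% and 67.2%. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]
LinearModel = Tuple[List[float], float]  # (weights, bias)


def train_perceptron(xs: List[Vector], ys: List[int], epochs: int = 20) -> LinearModel:
    """Train one linear binary classifier; ys are +1 (in location) or -1 (not).

    Assumption: a perceptron is used here as a simple stand-in for an SVM.
    """
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary: update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b


def train_binary_relevance(
    xs: List[Vector], label_sets: List[Set[str]], labels: List[str]
) -> Dict[str, LinearModel]:
    """Decompose the multi-label task into one independent binary task per label."""
    return {
        label: train_perceptron(xs, [1 if label in s else -1 for s in label_sets])
        for label in labels
    }


def predict(models: Dict[str, LinearModel], x: Vector) -> Set[str]:
    """Return every label whose classifier scores x positively.

    Assumption: if no classifier fires, fall back to the single best-scoring
    label so that every protein receives at least one location.
    """
    scores = {
        label: sum(wi * xi for wi, xi in zip(w, x)) + b
        for label, (w, b) in models.items()
    }
    predicted = {label for label, s in scores.items() if s > 0}
    return predicted or {max(scores, key=scores.get)}


def example_accuracy(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """Mean of |Y & Z| / |Y | Z| over samples (assumes every Y is non-empty)."""
    return sum(len(y & z) / len(y | z) for y, z in zip(true, pred)) / len(true)


def example_recall(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """Mean of |Y & Z| / |Y| over samples (assumes every Y is non-empty)."""
    return sum(len(y & z) / len(y) for y, z in zip(true, pred)) / len(true)


# Hypothetical toy data: one multiplex protein located in both compartments.
xs = [(2.0, -1.0), (-1.0, 2.0), (2.0, 2.0)]
label_sets = [{"nucleus"}, {"cytoplasm"}, {"nucleus", "cytoplasm"}]
models = train_binary_relevance(xs, label_sets, ["nucleus", "cytoplasm"])
print(sorted(predict(models, (3.0, 3.0))))  # ['cytoplasm', 'nucleus']
```

Because each location gets its own classifier, a protein can be assigned any subset of locations; the trade-off of this design is that correlations between locations are ignored during training.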
