Key Laboratory of Embedded System and Service Computing, Ministry of Education, Department of Control Science and Engineering, Tongji University, Shanghai 201804, China.
IEEE/ACM Trans Comput Biol Bioinform. 2013 Mar-Apr;10(2):436-46. doi: 10.1109/TCBB.2013.21.
Prediction of protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more different subcellular location sites. Most existing protein subcellular localization methods handle only single-location proteins. In the past few years, only a few methods have been proposed to tackle proteins with multiple locations. However, they adopt only a simple strategy, namely transforming each multilocation protein into multiple single-location proteins, which does not take correlations among different subcellular locations into account. In this paper, we propose a novel method, multilabel learning via random label selection (RALS), which extends the simple binary relevance (BR) method to learn from multilocation proteins effectively and efficiently. RALS does not explicitly find the correlations among labels; instead, it implicitly attempts to learn them from data by augmenting the original feature space with randomly selected labels as additional input features. Through fivefold cross-validation on a benchmark data set, we demonstrate that our proposed method, which considers label correlations, clearly outperforms the baseline BR method, which does not, indicating that correlations among different subcellular locations do exist and contribute to improved prediction performance. Experimental results on two benchmark data sets also show that our proposed methods achieve significantly higher performance than several other state-of-the-art methods in predicting the subcellular multilocations of proteins. The prediction web server is publicly available at http://levis.tongji.edu.cn:8080/bioinfo/MLPred-Euk/.
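The augmentation idea described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the base classifier, how many labels are selected per label, or how the unknown label features are filled in at prediction time. Here we assume a tiny hand-rolled logistic regression as the base learner, a hypothetical parameter `k` for the number of randomly selected labels, and plain BR predictions substituted for the unknown labels at test time. All function names (`fit_br`, `fit_rals`, `predict_rals`, etc.) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit a tiny logistic-regression classifier by batch gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_logreg(model, x):
    w, b = model
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0


def fit_br(X, Y):
    """Binary relevance: one independent classifier per label."""
    return [train_logreg(X, [row[j] for row in Y]) for j in range(len(Y[0]))]


def predict_br(models, x):
    return [predict_logreg(m, x) for m in models]


def fit_rals(X, Y, k=1, seed=0):
    """RALS-style training (sketch): for each label j, append k randomly
    chosen *other* labels to the features and train a classifier for j."""
    rng = random.Random(seed)
    n_labels = len(Y[0])
    base = fit_br(X, Y)  # assumption: used to fill label features at test time
    heads = []
    for j in range(n_labels):
        others = [l for l in range(n_labels) if l != j]
        sel = rng.sample(others, min(k, len(others)))
        X_aug = [x + [row[l] for l in sel] for x, row in zip(X, Y)]
        heads.append((sel, train_logreg(X_aug, [row[j] for row in Y])))
    return base, heads


def predict_rals(model, x):
    """True labels are unknown at test time, so (by assumption) the plain
    BR predictions stand in for the selected label features."""
    base, heads = model
    y_hat = predict_br(base, x)
    return [predict_logreg(m, x + [y_hat[l] for l in sel]) for sel, m in heads]
```

On a toy data set where the third label always equals the first (a perfectly correlated pair), `fit_rals(X, Y)` followed by `predict_rals(model, x)` returns one 0/1 prediction per location. Using BR outputs as stand-ins mirrors stacking-style approaches; other ways of filling the label features are equally plausible under the abstract's description.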