Wan Shixiang, Duan Yucong, Zou Quan
School of Computer Science and Technology, Tianjin University, Tianjin, P. R. China.
State Key Laboratory of Marine Resource Utilization in the South China Sea, College of Information and Technology, Hainan University, Haikou, Hainan, P. R. China.
Proteomics. 2017 Sep;17(17-18). doi: 10.1002/pmic.201700262.
Predicting the subcellular localization of proteins is an important and challenging problem. Traditional experimental approaches are often expensive and time-consuming. Consequently, a growing number of research efforts employ machine learning approaches to predict the subcellular location of proteins. State-of-the-art prediction methods face two main challenges. First, most existing techniques are designed for multi-class rather than multi-label classification, and therefore ignore the connections between multiple labels. In reality, a protein that occupies multiple locations has vital and unique biological significance that deserves special attention and cannot be ignored. Second, techniques for handling imbalanced data in multi-label classification problems are necessary but have not been employed. To address these two issues, we have developed an ensemble multi-label classifier called HPSLPred, which can be applied to multi-label classification with an imbalanced protein source. For convenience, a user-friendly webserver has been established at http://server.malab.cn/HPSLPred.
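The distinction between multi-class and multi-label data, and the label imbalance mentioned above, can be illustrated with a minimal sketch. This is not the HPSLPred implementation: the protein identifiers, the location names, and the per-label imbalance ratio (largest label count divided by the label's own count) are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical toy annotations: each protein maps to a SET of locations,
# so one protein may carry several labels (multi-label, not multi-class).
proteins = {
    "P1": {"nucleus"},
    "P2": {"nucleus", "cytoplasm"},
    "P3": {"cytoplasm"},
    "P4": {"nucleus", "mitochondrion"},
    "P5": {"nucleus"},
}

# Fixed label order so every indicator vector has the same layout.
labels = sorted({loc for locs in proteins.values() for loc in locs})


def to_indicator(locs, labels):
    """Encode a set of locations as a 0/1 vector over the label list."""
    return [1 if label in locs else 0 for label in labels]


# Binary indicator matrix, one row per protein.
Y = {pid: to_indicator(locs, labels) for pid, locs in proteins.items()}

# Count how often each location occurs across all proteins.
counts = Counter(loc for locs in proteins.values() for loc in locs)

# Assumed imbalance measure: count of the most frequent label divided by
# the count of each label; 1.0 marks the majority label.
max_count = max(counts.values())
imbalance_ratio = {label: max_count / counts[label] for label in labels}

if __name__ == "__main__":
    for pid, row in Y.items():
        print(pid, row)
    print(imbalance_ratio)
```

In this toy set, a multi-class encoding would force P2 and P4 into a single location each and lose their second annotation, while the indicator vectors keep both. The ratios show how a rare location such as the hypothetical "mitochondrion" label is underrepresented relative to the most frequent one.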