School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Yaguan Road, Jinnan District, Tianjin, PR China.
School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Yaguan Road, Jinnan District, Tianjin, PR China; School of Computational Science and Engineering, University of South Carolina, Columbia, USA.
J Theor Biol. 2019 Feb 7;462:230-239. doi: 10.1016/j.jtbi.2018.11.012. Epub 2018 Nov 16.
Identifying the location of proteins in a cell is important for understanding their functions and supports applications such as drug design, therapeutic target discovery and biological research. However, traditional subcellular localization experiments are time-consuming, laborious and small in scale. With the development of next-generation sequencing technology, the number of known protein sequences has grown exponentially, which lays the foundation for computational methods of identifying protein subcellular localization. Although many methods for predicting protein subcellular localization have been proposed, most of them are limited to single-location proteins. In this paper, we propose a multi-kernel support vector machine (SVM) to predict the subcellular localization of both multi-location and single-location proteins. First, we exploit evolutionary information extracted from the position-specific scoring matrix (PSSM), together with physicochemical properties of proteins, using Chou's general PseAAC and other efficient feature-extraction functions. Then, we build a multi-kernel SVM model to identify multi-label protein subcellular localization. Our method performs well in predicting protein subcellular localization, achieving average precisions of 0.7065 and 0.6889 on two human datasets, respectively. These results are higher than those achieved by other existing methods. Therefore, we provide an efficient system that offers a novel perspective for studying protein subcellular localization.
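The abstract names the model and the evaluation metric but not their internals. As a rough sketch only: a multi-kernel SVM typically replaces a single kernel with a weighted combination of base kernels (for example, one per feature set such as PSSM-derived and PseAAC features), and multi-label results are often summarized with ranking-based average precision. The pure-Python code below assumes Gaussian (RBF) base kernels, fixed convex weights, and the standard ranking-based definition of average precision from multi-label learning. The helper names `rbf_kernel`, `combine_kernels` and `average_precision` are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence, Set

Matrix = List[List[float]]


def rbf_kernel(X: Sequence[Sequence[float]], gamma: float) -> Matrix:
    """Gram matrix of the Gaussian (RBF) kernel exp(-gamma * ||x - y||^2).

    Assumption: RBF is an illustrative base kernel, one per feature set.
    """
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * d2)
    return K


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Convex combination K = sum_m w_m * K_m of base Gram matrices.

    Assumption (hypothetical): weights are fixed, non-negative and sum to 1;
    the paper may choose or learn them differently.
    """
    if len(kernels) != len(weights):
        raise ValueError("need one weight per kernel")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


def average_precision(Y: Sequence[Set[int]], scores: Sequence[Sequence[float]]) -> float:
    """Ranking-based multi-label average precision.

    Y[i] is the set of true location indices of protein i and scores[i][loc]
    is the predicted score of location loc. Ties are ranked pessimistically.
    """
    total = 0.0
    for relevant, s in zip(Y, scores):
        if not relevant:
            raise ValueError("every protein needs at least one true location")
        prec = 0.0
        for loc in relevant:
            # Rank of loc among all locations (1 = highest score).
            rank = sum(1 for v in s if v >= s[loc])
            # True locations ranked at or above loc.
            hits = sum(1 for k in relevant if s[k] >= s[loc])
            prec += hits / rank
        total += prec / len(relevant)
    return total / len(Y)
```

The combined Gram matrix could then be handed to any SVM that accepts precomputed kernels (for example, scikit-learn's `SVC(kernel="precomputed")`), training one binary classifier per location in a binary-relevance scheme. Whether the paper uses this decomposition, these base kernels, or learned weights is not stated in the abstract.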