College of Mathematics and Physics, Qingdao University of Science and Technology, China.
School of Mathematics and Statistics, Central South University, China.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab012.
Multi-label proteins, which reside in more than one subcellular location, participate in carrier transport, enzyme catalysis, hormone regulation and other life activities. They also play a key role in biopharmaceuticals and in gene and cell therapy. This article proposes a method called Mps-mvRBRL for predicting the subcellular localization (SCL) of multi-label proteins. First, five feature extraction algorithms (pseudo position-specific scoring matrix, dipeptide composition, position-specific scoring matrix-transition probability composition, gene ontology and pseudo amino acid composition) are used to obtain numerical information from different views. Differential evolution is used for the first time to learn the weight of each individual feature set according to its contribution, and the original features are then fused across views by weighted combination. Second, a weighted linear discriminant analysis framework based on a binary weight form is applied to the fused high-dimensional features to eliminate irrelevant information. Finally, the best feature vector is input into the joint ranking support vector machine and binary relevance with robust low-rank learning classifier to predict the SCL. Under leave-one-out cross-validation, the overall actual accuracy (OAA) and overall location accuracy (OLA) of Mps-mvRBRL on the Gram-positive bacteria training set are both 99.81%. On the plant, virus and Gram-negative bacteria test sets, the OAA values are 97.24%, 98.55% and 98.20%, respectively, and the OLA values are 97.16%, 97.62% and 98.28%, respectively. These results show that the model achieves good performance in predicting the SCL of multi-label proteins.
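The abstract reports OAA and OLA without defining them. The sketch below shows how the two metrics are commonly computed in the multi-label SCL literature: OAA counts a protein as correct only when its predicted location set matches the true set exactly, while OLA counts correctly predicted locations over all true locations. These definitions are an assumption here, since the abstract does not state them, and the function names and toy labels are hypothetical.

```python
from typing import List, Set


def overall_actual_accuracy(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Fraction of proteins whose predicted location set equals the true set exactly.

    Assumed definition (not given in the abstract).
    """
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("inputs must be non-empty and of equal length")
    exact = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return exact / len(true_sets)


def overall_location_accuracy(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Correctly predicted locations divided by the total number of true locations.

    Assumed definition (not given in the abstract).
    """
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("inputs must be non-empty and of equal length")
    total_true = sum(len(t) for t in true_sets)
    if total_true == 0:
        raise ValueError("at least one true location is required")
    hits = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    return hits / total_true


if __name__ == "__main__":
    # Toy data: three proteins, the second one has two true locations.
    true_labels = [{"cytoplasm"}, {"membrane", "cell wall"}, {"extracellular"}]
    predicted = [{"cytoplasm"}, {"membrane"}, {"extracellular"}]
    print(overall_actual_accuracy(true_labels, predicted))
    print(overall_location_accuracy(true_labels, predicted))
```

Under these definitions OAA is the stricter metric: in the toy data the second protein contributes nothing to OAA because one of its two locations is missed, but it still contributes one correct location to OLA.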