Fan Rui, Suo Bing, Ding Yijie
Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.
Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.
Front Genet. 2022 Jul 13;13:960388. doi: 10.3389/fgene.2022.960388. eCollection 2022.
The prediction of protein function is a common topic in bioinformatics. In recent years, advances in machine learning have inspired a growing number of algorithms for this task. Many of them rely on large numbers of parameters and fairly complex neural networks to improve prediction performance, which makes them time-consuming and costly. In this study, we use traditional features and machine learning classifiers to improve vesicle transport protein identification and to speed up prediction. We adopt the pseudo position-specific scoring matrix (PsePSSM) feature and propose a new classifier, hypergraph regularized k-local hyperplane distance nearest neighbour (HG-HKNN), to classify vesicular transport proteins. We address dataset imbalance with random undersampling. On the benchmark dataset, our strategy achieves an area under the receiver operating characteristic curve (AUC) of 0.870 and a Matthews correlation coefficient (MCC) of 0.53, outperforming all state-of-the-art methods on the same dataset; its other metrics are comparable to those of existing methods.
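As a rough illustration of two ingredients named in the abstract, the sketch below shows random undersampling of the majority class and the Matthews correlation coefficient for binary labels. It is not the authors' code: the function names `random_undersample` and `mcc`, the 0/1 label encoding, and the fixed seed are assumptions made for this example, and the PsePSSM features and the HG-HKNN classifier are not reproduced here.

```python
import numpy as np


def random_undersample(X, y, seed=0):
    """Randomly drop samples until every class has as many as the smallest one.

    Hypothetical helper for illustration; X is (n_samples, n_features),
    y is a 1-D array of class labels.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    # Pick n_min indices per class without replacement.
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original sample order
    return X[keep], y[keep]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (assumed 0/1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy imbalanced data: 2 positives, 8 negatives.
    X = np.arange(20).reshape(10, 2)
    y = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
    X_bal, y_bal = random_undersample(X, y)
    print(np.bincount(y_bal))            # class counts after balancing
    print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

Undersampling is typically applied to the training split only, so that evaluation metrics such as AUC and MCC still reflect the original class distribution; whether the study follows that protocol is not stated in the abstract.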