Chen Yu, Li Sai, Guo Jifeng
College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.
Front Genet. 2022 Aug 15;13:963349. doi: 10.3389/fgene.2022.963349. eCollection 2022.
Moonlighting proteins have at least two independent functions and are widely found in animals, plants, and microorganisms. They play important roles in signal transduction, cell growth and movement, tumor inhibition, DNA synthesis and repair, and the metabolism of biological macromolecules. Because moonlighting proteins are difficult to find through biological experiments, many researchers identify them with bioinformatics methods, but the accuracy of these methods is relatively low. We therefore propose a new method. In this study, we select SVMProt-188D as the feature input, apply a model that combines linear discriminant analysis with basic machine learning classifiers to study moonlighting proteins, and perform a bagging ensemble on the best-performing classifier, a support vector machine. This approach identifies moonlighting proteins accurately and efficiently. The model achieves an accuracy of 93.26% and an F-score of 0.946 on the MPFit dataset, which is better than the existing MEL-MP model. It also achieves good results on two other moonlighting protein datasets.
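The pipeline described above can be illustrated with a minimal scikit-learn sketch. It is an assumption-laden illustration, not the authors' implementation: the abstract does not say how linear discriminant analysis is combined with the classifiers, so the sketch assumes it is used as a supervised projection ahead of the support vector machine. Synthetic 188-column data stands in for real SVMProt-188D features, and the function names `build_model` and `run_demo`, the RBF kernel, the scaling step, and the number of bagged estimators are all hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_model(n_estimators=10, random_state=0):
    """Scale -> LDA projection -> bagged SVM (all settings are assumptions)."""
    return make_pipeline(
        StandardScaler(),
        # For a binary problem LDA can project onto at most one component.
        LinearDiscriminantAnalysis(n_components=1),
        # The base estimator is passed positionally so the call works across
        # scikit-learn versions that name this parameter differently.
        BaggingClassifier(
            SVC(kernel="rbf"),
            n_estimators=n_estimators,
            random_state=random_state,
        ),
    )


def run_demo(random_state=0):
    """Train and score the sketch on synthetic stand-in data."""
    # Synthetic placeholder: 188 columns mimic the width of a 188D feature
    # vector, but the values carry no biological meaning.
    X, y = make_classification(
        n_samples=400,
        n_features=188,
        n_informative=20,
        random_state=random_state,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    model = build_model(random_state=random_state)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return accuracy_score(y_test, y_pred), f1_score(y_test, y_pred)


if __name__ == "__main__":
    acc, f1 = run_demo()
    print(f"accuracy={acc:.3f}  F-score={f1:.3f}")
```

To apply the sketch to real data, the synthetic matrix would be replaced by 188D feature vectors extracted from protein sequences, with labels marking moonlighting and non-moonlighting proteins. Scores obtained on the synthetic data say nothing about the results reported above.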