Zhang Yanqin, Li Zhiyuan
School of Finance, Xuzhou University of Technology, Xuzhou, China.
School of Artificial Intelligence and Software College, Jiangsu Normal University Kewen College, Xuzhou, China.
Front Genet. 2023 Feb 8;13:1103783. doi: 10.3389/fgene.2022.1103783. eCollection 2022.
Phages play essential roles in biological processes, and the virion proteins encoded by the phage genome are critical components of the assembled phage particle. This study uses machine learning methods to classify phage virion proteins. We propose a novel approach, RF_phage virion, for the effective classification of virion and non-virion proteins. The model uses four protein sequence encoding methods as features and employs the random forest algorithm as the classifier. We evaluated the RF_phage virion model by comparing its performance with that of classical machine learning methods. The proposed method achieved a specificity (Sp) of 93.37%, sensitivity (Sn) of 90.30%, accuracy (Acc) of 91.84%, Matthews correlation coefficient (MCC) of 0.8371, and an F1 score of 0.9196.
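For readers unfamiliar with the five metrics reported in the abstract, the following minimal sketch shows how Sp, Sn, Acc, MCC, and F1 are conventionally computed from binary confusion-matrix counts. The function name `binary_metrics` and the counts in the usage example are hypothetical and chosen only for illustration; they are not the study's data, and the abstract does not say how the authors computed their figures.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: virion proteins predicted correctly/incorrectly (positive class).
    tn/fp: non-virion proteins predicted correctly/incorrectly (negative class).
    Assumes every denominator below is non-zero.
    """
    sn = tp / (tp + fn)                    # sensitivity (recall)
    sp = tn / (tn + fp)                    # specificity
    acc = (tp + tn) / (tp + fn + tn + fp)  # accuracy
    precision = tp / (tp + fp)
    f1 = 2 * precision * sn / (precision + sn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc, "F1": f1}


# Hypothetical counts for illustration only (not taken from the paper).
metrics = binary_metrics(tp=90, fn=10, tn=95, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because MCC uses all four cells of the confusion matrix, it is often read alongside accuracy when the positive and negative classes may differ in size.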