Prediction of viral oncoproteins through the combination of generative adversarial networks and machine learning techniques.

Affiliations

Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile.

Departamento de Ciencias Básicas, Facultad de Ciencias, Universidad Santo Tomas, Temuco, Chile.

Publication information

Sci Rep. 2024 Nov 7;14(1):27108. doi: 10.1038/s41598-024-77028-y.

DOI:10.1038/s41598-024-77028-y

PMID:39511292

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11543823/

Abstract

Viral oncoproteins play crucial roles in transforming normal cells into cancer cells, representing a significant factor in the etiology of various cancers. Traditionally, identifying these oncoproteins is both time-consuming and costly. With advancements in computational biology, bioinformatics tools based on machine learning have emerged as effective methods for predicting biological activities. Here, for the first time, we propose an innovative approach that combines Generative Adversarial Networks (GANs) with supervised learning methods to enhance the accuracy and generalizability of viral oncoprotein prediction. Our methodology evaluated multiple machine learning models, including Random Forest, Multilayer Perceptron, Light Gradient Boosting Machine, eXtreme Gradient Boosting, and Support Vector Machine. In ten-fold cross-validation on our training dataset, the GAN-enhanced Random Forest model demonstrated superior performance metrics: 0.976 accuracy, 0.976 F1 score, 0.977 precision, 0.976 sensitivity, and 1.0 AUC. During independent testing, this model achieved 0.982 accuracy, 0.982 F1 score, 0.982 precision, 0.982 sensitivity, and 1.0 AUC. These results establish our new tool, VirOncoTarget, accessible via a web application. We anticipate that VirOncoTarget will be a valuable resource for researchers, enabling rapid and reliable viral oncoprotein prediction and advancing our understanding of their role in cancer biology.
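To make the evaluation protocol concrete, the following is a minimal sketch of stratified ten-fold cross-validation with the four threshold-based metrics named in the abstract (accuracy, F1 score, precision, sensitivity). It is not the authors' pipeline: the one-feature toy data, the `fit_threshold` stand-in classifier, and all function names are assumptions made for illustration. The GAN-based augmentation and the Random Forest model are not reproduced here, and the abstract does not state which averaging convention its metrics use, so plain positive-class definitions are assumed.

```python
import random
from typing import Callable, Dict, List, Sequence


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, sensitivity (recall) and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


def cross_validate(
    X: Sequence[float],
    y: Sequence[int],
    fit: Callable[[List[float], List[int]], Callable[[float], int]],
    k: int = 10,
) -> Dict[str, float]:
    """Train on k-1 folds, score on the held-out fold, and average over the k folds."""
    scores = []
    for held_out in stratified_folds(y, k):
        held = set(held_out)
        train_idx = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [model(X[i]) for i in held_out]
        scores.append(binary_metrics([y[i] for i in held_out], preds))
    return {name: sum(s[name] for s in scores) / k for name in scores[0]}


def fit_threshold(X_train: List[float], y_train: List[int]) -> Callable[[float], int]:
    """Hypothetical stand-in classifier: cut at the midpoint of the two class means."""
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > cut else 0


# Toy data (assumption): one numeric feature per sequence, 50 samples per class.
rng = random.Random(42)
X = [rng.gauss(0.0, 1.0) for _ in range(50)] + [rng.gauss(3.0, 1.0) for _ in range(50)]
y = [0] * 50 + [1] * 50

results = cross_validate(X, y, fit_threshold, k=10)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

In a real setting, `fit_threshold` would be replaced by a trained classifier such as a Random Forest, and any synthetic samples would be added only to the training folds so that the held-out fold contains real data alone. AUC is omitted from the sketch because it requires ranked scores rather than hard class labels.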
