Zárate Alida, Díaz-González Lorena, Taboada Blanca
Doctorado en Ciencias, Instituto de Investigación en Ciencias Básicas Aplicadas (IICBA), Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos 62210, México.
Centro de Investigación en Ciencias, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos 62210, México.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf001.
This study addresses the challenging task of identifying viruses in metagenomic data drawn from a broad array of biological samples, including animal reservoirs, environmental sources, and the human body. Traditional methods for virus identification are often limited by the diversity and rapid evolution of viral genomes. In response, recent efforts have applied artificial intelligence (AI) techniques to improve the accuracy and efficiency of virus detection. However, existing AI-based approaches are primarily binary classifiers: they lack specificity in identifying viral types and rely on nucleotide sequences. To address these limitations, this study introduces VirDetect-AI, a novel tool designed specifically to identify eukaryotic viruses in metagenomic datasets. The VirDetect-AI model combines convolutional neural networks and residual neural networks to extract hierarchical features and detailed patterns from complex amino acid sequence data. The model performed strongly on all reported metrics, with a sensitivity of 0.97, a precision of 0.98, and an F1-score of 0.98. VirDetect-AI accurately classifies metagenomic sequences into 980 viral protein classes, which enables the identification of new viruses and improves our understanding of viral ecology. These classes encompass an extensive array of viral genera and families, as well as protein functions and hosts.
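For readers less familiar with the reported metrics, the following is a minimal sketch of how sensitivity, precision, and F1-score can be computed per class and then macro-averaged for a multi-class classifier. It is not the authors' evaluation code: the abstract does not state how the metrics were averaged across the 980 classes, so macro-averaging is an assumption here, and the function names and class labels (`capsid`, `polymerase`, `protease`) are hypothetical.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, sensitivity, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1          # correct prediction for this class
        else:
            fp[pred] += 1           # predicted class gains a false positive
            fn[truth] += 1          # true class gains a false negative

    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        sensitivity = tp[label] / actual if actual else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        metrics[label] = (precision, sensitivity, f1)
    return metrics


def macro_average(metrics):
    """Unweighted mean of (precision, sensitivity, f1) over all classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))


# Hypothetical labels for illustration only; not data from the study.
y_true = ["capsid", "capsid", "polymerase", "polymerase", "protease", "protease"]
y_pred = ["capsid", "polymerase", "polymerase", "polymerase", "protease", "protease"]

metrics = per_class_metrics(y_true, y_pred)
macro = macro_average(metrics)
for label, (p, s, f1) in metrics.items():
    print(f"{label}: precision={p:.2f} sensitivity={s:.2f} f1={f1:.2f}")
print("macro:", tuple(round(v, 2) for v in macro))
```

With many classes, macro-averaging weights rare and common classes equally, whereas a weighted or micro average would be dominated by the largest classes; which of these the reported figures use is not specified in the abstract.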