Computer Science program, School of Information Technology and Computer Science (ITCS), Nile University, Sheikh Zayed City, Egypt.
Biomedical Engineering Department, Faculty of Engineering, Helwan University, Cairo, Egypt.
Sci Rep. 2024 Jul 5;14(1):15463. doi: 10.1038/s41598-024-64209-y.
Hepatitis C virus (HCV) is a major global health concern, affecting millions of individuals worldwide. While existing literature predominantly focuses on disease classification using clinical data, there is a critical research gap concerning HCV genotyping based on genomic sequences. Accurate HCV genotyping is essential for patient management and treatment decisions. Although neural models excel at capturing complex patterns, they still face challenges such as data scarcity, which is common in computational genomics. To overcome these challenges, this paper introduces a deep learning approach for HCV genotyping, based on a graphical representation of nucleotide sequences, that outperforms classical approaches. Notably, it is effective for both partial and complete HCV genomes and addresses challenges associated with imbalanced datasets. Ten HCV genotypes were used in the analysis: 1a, 1b, 2a, 2b, 2c, 3a, 3b, 4, 5, and 6. The study uses Chaos Game Representation (CGR) to map genomic sequences to 2D images and self-supervised learning with a convolutional autoencoder for deep feature extraction. The proposed approach is compared with various machine learning and deep learning models, which serve as a baseline against which its performance and that of other models can be evaluated. The experimental results show a classification accuracy of over 99%, outperforming traditional deep learning models. This performance demonstrates that the proposed model can accurately identify HCV genotypes in both partial and complete sequences and can cope with data scarcity for certain genotypes. The results of the proposed model are also compared with those of the NCBI genotyping tool.
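To make the Chaos Game Representation step more concrete, the following is a minimal sketch of how a nucleotide sequence can be mapped to a 2D grid of point counts. It is an illustration under stated assumptions, not the authors' implementation: the corner assignment, the 64 x 64 resolution, the handling of ambiguous symbols, and the function names `cgr_counts` and `normalize` are all hypothetical choices made here.

```python
# Corner assignment is a common CGR convention, assumed here for illustration;
# the abstract does not state which assignment the paper uses.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_counts(sequence, size=64):
    """Map a nucleotide sequence to a size x size grid of CGR point counts."""
    grid = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5  # start at the centre of the unit square
    # Treat RNA-style U as T so both notations give the same image.
    for base in sequence.upper().replace("U", "T"):
        corner = CORNERS.get(base)
        if corner is None:
            continue  # skip ambiguous symbols such as N or gaps (an assumption)
        # Move halfway from the current point toward the corner of this base.
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


def normalize(grid):
    """Scale counts to frequencies so sequences of different lengths are comparable."""
    total = sum(sum(row) for row in grid)
    if total == 0:
        return [[0.0] * len(row) for row in grid]
    return [[count / total for count in row] for row in grid]


if __name__ == "__main__":
    image = normalize(cgr_counts("ACGTTGCAACGTNNACGU", size=8))
    print(len(image), len(image[0]))
```

A grid like this could then be treated as a single-channel image and passed to a convolutional autoencoder for feature extraction. Normalizing by the total count is one plausible way to keep partial and complete genomes on a comparable scale; whether the paper does this is not stated in the abstract.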