Department of Epidemiology and Biostatistics, College of Public Health, University of Georgia, Athens, GA, 30602, United States.
Comput Biol Med. 2024 Dec;183:109218. doi: 10.1016/j.compbiomed.2024.109218. Epub 2024 Oct 5.
Traditionally, the classification of HIV-1 M group subtypes has depended on statistical methods constrained by sample size. Here, HIV-1-M-SPBEnv is proposed as the first deep learning-based method for classifying HIV-1 M group subtypes from env gene sequences. The approach overcomes the sample size limitation by using artificial molecular evolution techniques to generate a synthetic dataset suitable for machine learning. Employing a convolutional autoencoder embedded with two residual blocks and two transpose residual blocks, followed by a fully connected neural network block, HIV-1-M-SPBEnv reduces complex, high-dimensional DNA sequence data to concise, information-rich, low-dimensional representations and achieves exceptional classification accuracy. In validation on an independent dataset, the precision, accuracy, recall and F1 score of the HIV-1-M-SPBEnv model predictions were all 100%, confirming its capability to accurately identify all 12 subtypes of the HIV-1 M group. Deployed through a web server, it provides seamless HIV-1 M group subtype prediction for researchers and clinicians. The HIV-1-M-SPBEnv web server is accessible at http://www.hivsubclass.com and all the code is available at https://github.com/pengsihua2023/HIV-1-M-SPBEnv.
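To make the four reported metrics concrete, the following minimal Python sketch shows one common way to compute accuracy, precision, recall and F1 score for a multi-class subtype classifier. It is an illustration only, not the authors' evaluation code: the function name `classification_metrics`, the use of macro averaging across classes, and the toy labels are all assumptions, since the abstract does not state how the scores were aggregated.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall and F1.

    Hypothetical helper for illustration; macro averaging is an assumption,
    not something stated in the abstract.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct call for this subtype
        else:
            fp[pred] += 1    # predicted subtype was wrong
            fn[truth] += 1   # true subtype was missed

    def safe_div(num, den):
        # Define a ratio with an empty denominator as 0.0.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = safe_div(tp[label], tp[label] + fp[label])
        r = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1s.append(safe_div(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels, not data from the study.
perfect = classification_metrics(["A", "B", "C"], ["A", "B", "C"])
one_error = classification_metrics(["A", "A", "B", "B"], ["A", "B", "B", "B"])
```

Under this definition, all four scores equal 1.0 (100%) only when every predicted subtype matches the true subtype, as in `perfect`; a single misclassification, as in `one_error`, lowers accuracy and at least one per-class precision or recall.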