State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Pokfulam, Hong Kong SAR, China.
Joint Institute of Virology (Shantou University and The University of Hong Kong), Guangdong-Hongkong Joint Laboratory of Emerging Infectious Diseases, Shantou University, Shantou, Guangdong, 515063, China.
Microbiome. 2024 May 9;12(1):84. doi: 10.1186/s40168-024-01805-0.
The emergence of antibiotic resistance in bacteria is an important threat to global health. Antibiotic resistance genes (ARGs) are key determinants of bacterial resistance and of its spread across different environments. Identifying ARGs, particularly from high-throughput sequencing data of specimens, is the state-of-the-art method for comprehensively monitoring their spread and evolution. Current computational methods for identifying ARGs rely mainly on alignment-based sequence similarity to known ARGs. Such approaches are limited by the choice of reference database and may miss novel ARGs. The similarity thresholds are usually simple and cannot accommodate variation across different gene families and regions. These methods are also difficult to scale as the volume of sequence data grows.
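The limitation of a single similarity threshold can be made concrete with a minimal sketch. It is not taken from any of the tools discussed here: the function names, the pre-aligned input format, and the 80% cutoff are assumptions chosen only for illustration.

```python
def percent_identity(aligned_query, aligned_ref):
    """Percent identity over gap-free columns of two pre-aligned sequences.

    Both inputs are assumed to be equal-length strings from a pairwise
    alignment, with '-' marking gaps (an illustrative input format).
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for q, r in zip(aligned_query, aligned_ref):
        if q == "-" or r == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if q == r:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def is_arg_hit(aligned_query, aligned_ref, cutoff=80.0):
    """Call a hit when identity reaches one global cutoff.

    The 80% default is a hypothetical value, not a recommendation.
    """
    return percent_identity(aligned_query, aligned_ref) >= cutoff
```

Because the same cutoff is applied to every reference gene, a divergent but genuine member of a fast-evolving family can fall below it while a non-resistance homolog of a conserved family passes, which is the weakness described above.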
In this study, we developed ARGNet, an alignment-free deep neural network that combines an unsupervised autoencoder model to identify ARGs with a multiclass convolutional neural network to classify them. This approach enables more efficient discovery of both known and novel ARGs. ARGNet accepts amino acid and nucleotide sequences of variable length, from partial sequences (30-50 aa; 100-150 nt) to full-length proteins or genes, so it can be applied to both targeted sequencing and metagenomic sequencing. Our performance evaluation showed that ARGNet outperformed other deep learning models, including DeepARG and HMD-ARG, in most application scenarios, especially in the quasi-negative test and in the analysis of prediction consistency with the phylogenetic tree. ARGNet also reduced inference runtime by up to 57% relative to DeepARG.
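Neural networks of this kind need a numeric, fixed-size input even when the sequences vary in length. The sketch below shows one common way to do that, one-hot encoding with zero padding. It is an assumption for illustration only and is not confirmed to be ARGNet's actual preprocessing; the names `AMINO_ACIDS`, `one_hot_encode`, and `max_len` are hypothetical.

```python
# The 20 standard amino acids; any other symbol is treated as unknown.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(seq, max_len):
    """Encode a protein sequence as a max_len x 20 one-hot matrix.

    Sequences longer than max_len are truncated; shorter ones are
    padded with all-zero rows. Unknown residues also map to zero rows.
    """
    seq = seq.upper()[:max_len]
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq):
        idx = INDEX.get(aa)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix


if __name__ == "__main__":
    fragment = "MKTAYIAKQR"  # hypothetical 10-residue fragment
    encoded = one_hot_encode(fragment, max_len=50)
    # Print the matrix shape and the number of encoded residues.
    print(len(encoded), len(encoded[0]), sum(map(sum, encoded)))
```

With this layout, a partial read of 30-50 aa and a full-length protein share the same input shape, differing only in how many rows are padding.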
ARGNet is flexible, efficient, and accurate at predicting a broad range of ARGs from sequencing data. ARGNet is freely available at https://github.com/id-bioinfo/ARGNet, with an online service provided at https://ARGNet.hku.hk.