Environmental Biotechnology & Genomics Division, CSIR-National Environmental Engineering Research Institute (NEERI), Nehru Marg, Nagpur 440020, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India.
J Microbiol Methods. 2024 Aug;223:106953. doi: 10.1016/j.mimet.2024.106953. Epub 2024 May 14.
Microbial composition and stress molecules are the main drivers of the development and spread of antibiotic-resistant bacteria (ARBs) and antibiotic resistance genes (ARGs) in the environment. Developing a reliable and rapid method for identifying associations between microbiome composition and the resistome remains challenging. In the present study, secondary metagenome data from sewage and hospital wastewaters were assessed for differential taxonomic and ARG profiles. Random Forest (RF)-based machine learning (ML) models were then used to predict ARG profiles from taxonomic composition, and the models were validated on hospital wastewaters. Total ARG abundance was significantly higher in hospital wastewaters (15 ppm) than in sewage (5 ppm), and resistance to methicillin, carbapenems, and fluoroquinolones was predominant. Although Pseudomonas constituted the major fraction, Streptomyces, Enterobacter, and Klebsiella were characteristic of hospital wastewaters. Prediction modeling showed that the relative abundances of the pathogenic genera Escherichia, Vibrio, and Pseudomonas contributed most to the variation in total ARG count. Moreover, the model identified host-specific patterns linking contributing taxa to related ARGs, predicting ARG subtype abundance with >90% accuracy. More than 80% accuracy was obtained for hospital wastewaters, demonstrating that the model can be validly extrapolated to different types of wastewater systems. The findings show that the ML approach can identify ARG profiles from bacterial composition, including 16S rDNA amplicon data, and can serve as a viable alternative to metagenomic binning for identifying potential hosts of ARGs. Overall, this study demonstrates the promising application of ML techniques for predicting the spread of ARGs and provides guidance for early warning of ARB emergence.
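The RF step described above can be illustrated with a minimal, dependency-free sketch of Random Forest regression: bootstrap-sampled regression trees, a random subset of features tried at each split, averaged predictions, and impurity-decrease feature importance. It assumes each sample is a row of genus relative abundances and the target is a single ARG measure (for example, total ARG abundance in ppm). The abstract does not state the software, hyperparameters, importance measure, or accuracy metric used, so the function names (`fit_forest`, `predict`, `r_squared`), the tree depth, tree count, per-split feature subset size, the SSE-decrease importance, and the use of R² for validation are all hypothetical choices for illustration, not the authors' method.

```python
# Hypothetical sketch of Random Forest regression for ARG prediction from taxa;
# NOT the authors' implementation. All hyperparameters are illustrative.
import random


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def _sse(ys):
    """Sum of squared errors around the mean (regression-tree impurity)."""
    m = _mean(ys)
    return sum((y - m) ** 2 for y in ys)


def _best_split(X, y, features):
    """Return (gain, feature, threshold) with the largest SSE reduction, or None."""
    parent = _sse(y)
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint keeps both children non-empty
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            gain = parent - _sse(left) - _sse(right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def _grow(X, y, depth, max_depth, n_feat, rng, importance):
    """Grow one tree: leaves are floats, internal nodes are (f, t, left, right)."""
    if depth >= max_depth or len(y) < 2:
        return _mean(y)
    # Random feature subset at each split (the "random" in random forest).
    features = rng.sample(range(len(X[0])), n_feat)
    split = _best_split(X, y, features)
    if split is None or split[0] <= 0:
        return _mean(y)
    gain, f, t = split
    importance[f] += gain  # accumulate impurity decrease credited to taxon f
    left = [(r, v) for r, v in zip(X, y) if r[f] <= t]
    right = [(r, v) for r, v in zip(X, y) if r[f] > t]
    return (
        f,
        t,
        _grow([r for r, _ in left], [v for _, v in left],
              depth + 1, max_depth, n_feat, rng, importance),
        _grow([r for r, _ in right], [v for _, v in right],
              depth + 1, max_depth, n_feat, rng, importance),
    )


def _predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=30, max_depth=4, max_features=2, seed=0):
    """Fit bagged regression trees; return (trees, normalised importances)."""
    rng = random.Random(seed)
    n_feat = min(max_features, len(X[0]))
    importance = [0.0] * len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        trees.append(_grow(Xb, yb, 0, max_depth, n_feat, rng, importance))
    total = sum(importance) or 1.0
    return trees, [v / total for v in importance]


def predict(trees, row):
    """Forest prediction: average of the individual tree predictions."""
    return _mean(_predict_tree(t, row) for t in trees)


def r_squared(y_true, y_pred):
    """Coefficient of determination on held-out (validation) samples."""
    m = _mean(y_true)
    ss_tot = sum((v - m) ** 2 for v in y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot
```

In this sketch, one forest would be fitted per target (total ARG count, or each ARG subtype) on training samples such as sewage metagenomes; the normalised importances would then be ranked to find the taxa contributing most, and `r_squared` would be computed on held-out samples such as hospital wastewaters. A real analysis would more likely use an established implementation such as scikit-learn's `RandomForestRegressor`, which exposes comparable `n_estimators`, `max_depth`, `max_features`, and `feature_importances_` settings.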