The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA, 24061, USA.
Department of Computer Science, Virginia Tech, Blacksburg, VA, 24061, USA.
Microbiome. 2019 Aug 29;7(1):123. doi: 10.1186/s40168-019-0735-1.
The interconnectivities of built and natural environments can serve as conduits for the proliferation and dissemination of antibiotic resistance genes (ARGs). Several studies have compared the broad spectrum of ARGs (i.e., "resistomes") in various environmental compartments, but there is a need to identify unique ARG occurrence patterns (i.e., "discriminatory ARGs"), characteristic of each environment. Such an approach will help to identify factors influencing ARG proliferation, facilitate development of relative comparisons of the ARGs distinguishing various environments, and help pave the way towards ranking environments based on their likelihood of contributing to the spread of clinically relevant antibiotic resistance. Here we formulate and demonstrate an approach using an extremely randomized tree (ERT) algorithm combined with a Bayesian optimization technique to capture ARG variability in environmental samples and identify the discriminatory ARGs. The potential of ERT for identifying discriminatory ARGs was first evaluated using in silico metagenomic datasets (simulated metagenomic Illumina sequencing data) with known variability. The application of ERT was then demonstrated through analyses using publicly available and in-house metagenomic datasets associated with (1) different aquatic habitats (e.g., river, wastewater influent, hospital effluent, and dairy farm effluent) to compare resistomes between distinct environments and (2) different river samples (i.e., Amazon, Kalamas, and Cam Rivers) to compare resistome characteristics of similar environments.
The approach was found to readily identify discriminatory ARGs in the in silico datasets. Also, it was not found to be biased towards ARGs with high relative abundance, which is a common limitation of feature projection methods, and instead only captured those ARGs that elicited significant profiles. Analyses of publicly available metagenomic datasets further demonstrated that the ERT approach can effectively differentiate real-world environmental samples and identify discriminatory ARGs based on pre-defined categorizing schemes.
Here a new methodology was formulated to characterize and compare variances in ARG profiles between metagenomic datasets derived from similar or dissimilar environments. Specifically, discriminatory ARGs among samples representing various environments can be identified based on factors of interest. The methodology could prove to be a particularly useful tool for ARG surveillance and for assessing the effectiveness of strategies for mitigating the spread of antibiotic resistance. The Python package is hosted in the Git repository: https://github.com/gaarangoa/ExtrARG.
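To make the idea of ranking discriminatory ARGs with extremely randomized trees concrete, the following is a minimal, standard-library sketch, not the ExtrARG implementation. It grows trees whose cut-points are drawn at random and scores each ARG by its mean Gini impurity decrease. The function names (`gini`, `grow`, `rank_args`), the ARG labels (`arg_A`, `arg_B`, `arg_C`), and the abundance values are all hypothetical toy choices, and the Bayesian optimization of hyperparameters described in the abstract is omitted. For simplicity every feature is considered at every node.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow(X, y, rows, k, rng, importance):
    """Grow one extremely randomized tree over the samples in `rows`,
    adding each split's weighted impurity decrease to `importance`."""
    labels = [y[i] for i in rows]
    if len(set(labels)) < 2:
        return  # pure node: nothing left to separate
    parent = gini(labels)
    best = None
    for f in rng.sample(range(len(X[0])), k):
        values = [X[i][f] for i in rows]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # feature is constant at this node
        cut = rng.uniform(lo, hi)  # cut-point drawn at random, not optimized
        left = [i for i in rows if X[i][f] < cut]
        right = [i for i in rows if X[i][f] >= cut]
        if not left or not right:
            continue
        child = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(rows)
        gain = parent - child
        if best is None or gain > best[0]:
            best = (gain, f, left, right)
    if best is None:
        return
    gain, f, left, right = best
    importance[f] += gain * len(rows) / len(X)
    grow(X, y, left, k, rng, importance)
    grow(X, y, right, k, rng, importance)


def rank_args(abundance, labels, arg_names, n_trees=200, seed=0):
    """Return (ARG name, mean impurity decrease) pairs, most discriminatory first."""
    rng = random.Random(seed)
    importance = [0.0] * len(arg_names)
    rows = list(range(len(abundance)))
    for _ in range(n_trees):
        # Every tree sees the full sample set (no bootstrap resampling).
        grow(abundance, labels, rows, len(arg_names), rng, importance)
    scores = [s / n_trees for s in importance]
    return sorted(zip(arg_names, scores), key=lambda p: p[1], reverse=True)


# Toy abundance table (hypothetical values, not real data).
# Columns: arg_A (abundant, same in both environments),
#          arg_B (rare, present only in wastewater),
#          arg_C (moderate, same in both environments).
arg_names = ["arg_A", "arg_B", "arg_C"]
abundance = [
    [900.0, 0.0, 10.0], [950.0, 0.0, 20.0], [1000.0, 0.0, 30.0], [1050.0, 0.0, 40.0],
    [900.0, 5.0, 10.0], [950.0, 5.0, 20.0], [1000.0, 5.0, 30.0], [1050.0, 5.0, 40.0],
]
labels = ["river"] * 4 + ["wastewater"] * 4

ranking = rank_args(abundance, labels, arg_names)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

In this toy table the low-abundance `arg_B` ranks first while the far more abundant `arg_A` receives no importance, which mirrors the abstract's point that the ranking is driven by how well an ARG separates the categories rather than by its relative abundance.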