Legall Noah, Salvador Liliana C M
Interdisciplinary Disease Ecology Across Scales Research Traineeship Program, University of Georgia, Athens, GA, United States.
Institute of Bioinformatics, University of Georgia, Athens, GA, United States.
Front Microbiol. 2022 Sep 7;13:787856. doi: 10.3389/fmicb.2022.787856. eCollection 2022.
Mycobacterium bovis, a bacterial zoonotic pathogen responsible for the economically and agriculturally important livestock disease bovine tuberculosis (bTB), infects a broad range of mammalian hosts worldwide. This characteristic has led to bidirectional transmission events between livestock and wildlife species, as well as the formation of wildlife reservoirs, which undermine the success of bTB control measures. Next Generation Sequencing (NGS) has transformed our ability to understand disease transmission events by tracking variant sites. However, the genomic signatures related to host adaptation following spillover, and the role of other genomic factors in the transmission process, remain understudied. We analyzed publicly available datasets collected from 700 hosts across three countries with bTB endemic regions (United Kingdom, United States, and New Zealand) to investigate whether genomic regions with high SNP density and/or selective sweep sites play a role in adaptation to new environments (e.g., at the host-species, geographical, and/or sub-population levels). A simulated alignment was created to generate null distributions for defining genomic regions with high SNP counts and regions with evidence of selective sweeps. Random Forest (RF) models were used to investigate evolutionary metrics within the genomic regions of interest and to determine which genomic processes were best for classifying isolates across ecological scales. In the M. bovis genomes, we identified 14 high SNP density regions and 132 selective sweep regions. Selective sweep regions were ranked as the most important for classification across the different scales in all RF models. SNP-dense regions had high importance in the badger- and cattle-specific RF models for distinguishing badger-derived isolates from livestock-derived ones.
Additionally, the genes detected within these genomic regions are associated with various pathogenic functions, including virulence and immunogenicity, membrane structure, host survival, and mycobactin production. The results of this study demonstrate that comparative genomics combined with machine learning approaches can be used to further investigate the nature of host-pathogen interactions.
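As an illustration of how a null distribution can be used to flag regions with high SNP counts, the following is a minimal sketch and not the authors' pipeline. It assumes non-overlapping fixed-size windows, a null model in which the same number of SNPs is placed uniformly at random along the genome (standing in for the simulated alignment described above), and a 99th-percentile cutoff. The function names, window size, number of simulations, and quantile are all hypothetical choices made for this example.

```python
import random


def window_counts(snp_positions, genome_length, window):
    """Count SNPs in non-overlapping windows of `window` bases (0-based positions)."""
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for pos in snp_positions:
        counts[pos // window] += 1
    return counts


def null_threshold(n_snps, genome_length, window, n_sim=200, quantile=0.99, seed=0):
    """Per-window SNP count at the given quantile of a simulated null.

    Assumption: the null places n_snps uniformly at random on the genome.
    This is a simplification chosen for illustration only.
    """
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_sim):
        simulated = [rng.randrange(genome_length) for _ in range(n_snps)]
        pooled.extend(window_counts(simulated, genome_length, window))
    pooled.sort()
    index = min(len(pooled) - 1, int(quantile * len(pooled)))
    return pooled[index]


def dense_windows(snp_positions, genome_length, window, **null_kwargs):
    """Return (start, end, count) for windows whose count exceeds the null threshold."""
    threshold = null_threshold(len(snp_positions), genome_length, window, **null_kwargs)
    counts = window_counts(snp_positions, genome_length, window)
    return [
        (i * window, min((i + 1) * window, genome_length), count)
        for i, count in enumerate(counts)
        if count > threshold
    ]
```

A real analysis would more plausibly draw the null from an alignment simulated under an evolutionary model, as the abstract indicates, rather than from uniform placement. The per-window counts or related evolutionary metrics for the flagged regions could then serve as features for a classifier such as a Random Forest.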