Ioerger Thomas R, Shatby Anthony
Department of Computer Science and Engineering, Texas A&M University.
bioRxiv. 2025 May 20:2025.05.07.652684. doi: 10.1101/2025.05.07.652684.
Multiple studies have reported genes in the Mycobacterium tuberculosis (Mtb) genome that are under diversifying selection, based on genetic variants among Mtb clinical isolates. These might reflect adaptations to selection pressures associated with modern clinical treatment of TB. Many, but not all, of these genes under selection are related to drug resistance. Most of these studies have evaluated selection at the gene level. However, positive selection can be evaluated at different scales, including individual sites (codons) and local regions within an ORF. In this paper, we use GenomegaMap, a Bayesian method for calculating selection, to evaluate selection of genes in the Mtb genome at all three levels (gene, window of codons, and individual site). We present evidence that the intermediate analysis (windows of codons) provides the most credible list of candidate genes under selection. A further advantage of this approach is that it identifies specific regions within proteins that are under selective pressure, which is useful for structural and functional interpretation. In an analysis of two separate collections of Mtb clinical isolates (one from Moldova and one globally representative set), we observed 53 and 178 significant genes under selection, respectively, with 36% overlap. The lists of genes under selection include many drug-resistance genes, as well as other genes that have previously been reported to be under selection. The specific regions under selection identified within drug-resistance genes are shown to correspond to protein structural features known to be involved in resistance, supporting the accuracy of the method. Positive selection in several ESX-1-related genes was also observed, suggesting adaptation to immune pressure.
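As a rough illustration of why the scale of analysis matters, the sketch below summarizes one gene at the three levels named in the abstract: whole gene, single codon, and window of codons. It is not GenomegaMap and does not reproduce its Bayesian model. The per-codon scores, the function name `summarize_scales`, and the window width of 5 are all hypothetical choices made for this example.

```python
from statistics import mean


def summarize_scales(site_scores, window=5):
    """Summarize hypothetical per-codon selection scores at three scales.

    site_scores: one number per codon (an assumed, illustrative score;
                 not an actual GenomegaMap output).
    window:      number of consecutive codons per window (assumed value).
    """
    n = len(site_scores)
    if not 1 <= window <= n:
        raise ValueError("window must be between 1 and the number of codons")

    # Gene level: a single average over the whole ORF.
    gene_level = mean(site_scores)

    # Site level: the single highest-scoring codon (0-based index).
    best_site = max(range(n), key=site_scores.__getitem__)

    # Window level: average score of every run of `window` adjacent codons.
    window_means = [mean(site_scores[i:i + window]) for i in range(n - window + 1)]
    best_start = max(range(len(window_means)), key=window_means.__getitem__)

    return {
        "gene": gene_level,
        "site": (best_site, site_scores[best_site]),
        # Half-open codon range [start, end) of the best window and its mean.
        "window": (best_start, best_start + window, window_means[best_start]),
    }


# Toy gene of 45 codons: a short high-scoring region inside a low background.
scores = [0.1] * 20 + [0.8] * 5 + [0.1] * 20
result = summarize_scales(scores, window=5)
```

In this toy input the gene-level average is diluted by the low-scoring background, while the window-level summary localizes the elevated region to specific codons. That is the kind of regional information the abstract describes as useful for structural and functional interpretation.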