Zhou Yang, Anthony Richard, Wang Shengfen, Xia Hui, Ou Xichao, Zhao Bing, Song Yuanyuan, Zheng Yang, He Ping, Liu Dongxin, Zhao Yanlin, van Soolingen Dick
National Center for Tuberculosis Control and Prevention, Chinese Center for Disease Control and Prevention, Changping District, Beijing, China.
Radboudumc Research Institute, Radboud University, Houtlaan XZ, Nijmegen, The Netherlands.
PLoS One. 2025 May 19;20(5):e0324152. doi: 10.1371/journal.pone.0324152. eCollection 2025.
Tuberculosis is a major public health threat that claims more than one million lives every year. The many challenges in defeating this deadly infectious disease underscore the importance of a thorough understanding of the biology of its causative agent, Mycobacterium tuberculosis (MTB). We generated a non-redundant pangenome of 420 epidemic MTB strains from China, comprising 344 Lineage 2 strains, 69 Lineage 4 strains, six Lineage 3 strains, and one Lineage 1 strain. We estimate that MTB strains have a pangenome of 4,278 genes encoding 4,183 proteins, of which 3,438 are core genes. However, because of 99,694 interruptions in 2,447 coding genes, we can confidently confirm that only 1,651 of these genes are translated in all samples. Of these interruptions, 67,315 (67.52%) could be classified according to genetic variations detected by currently available tools, and more than half of those are due to structural variations, mostly small indels. Even assuming that a proportion of these interruptions are artifacts, the number of active core genes would still be much lower than 3,438. We further describe the differential evolutionary patterns of genes under the influence of selective pressure, population structure and purifying selection. While selective pressure is ubiquitous among these coding genes, evolutionary adaptations are concentrated in 1,310 genes. Genes involved in cell wall biogenesis are under the strongest selective pressure, while the biological process of disruption of host organelles indicates the direction of the most intensive positive selection. This study provides a comprehensive view of the genetic diversity and evolutionary patterns of coding genes in MTB, which may deepen our understanding of its epidemiology and pathogenicity.