Balamurugan Muthukumar, Banerjee Ruma, Kasibhatla Sunitha Manjari, Achalere Archana, Joshi Rajendra
HPC-Medical and Bioinformatics Applications Group, Centre for Development of Advanced Computing, Innovation Park, Pune, India.
Front Genet. 2022 Apr 13;13:800083. doi: 10.3389/fgene.2022.800083. eCollection 2022.
Two lineages of Mycobacterium tuberculosis var. africanum (Maf), L5 and L6, which are members of the Mycobacterium tuberculosis complex (MTBC), are responsible for causing tuberculosis in West Africa. Regions of difference (RDs) are usually used for delineation of MTBC. With increased data availability, single nucleotide polymorphisms (SNPs) promise to provide better resolution. A total of 380 publicly available Maf samples were analyzed to identify cluster-specific SNPs, while an additional 270 samples were used for validation. RD-based methods were used for lineage assignment, wherein 31 samples remained unidentified. The genetic diversity of Maf was estimated from genome-wide SNPs using phylogeny and population genomics approaches. Lineage-based clustering (L5 and L6) was observed in the whole-genome phylogeny, with distinct sub-clusters. Population stratification using both model-based and non-model-based approaches supported the same observations. L6 was further delineated into three sub-lineages (L6.1-L6.3), whereas L5 was grouped as L5.1 and L5.2 based on the occurrence of RD711. L5.1 and L5.2 were further divided into two (L5.1.1 and L5.1.2) and four (L5.2.1-L5.2.4) sub-clusters, respectively. Unassigned samples could be assigned to definite lineages/sub-lineages based on the clustering observed in the phylogeny, along with high-confidence posterior membership scores obtained during population stratification. Based on the (sub)-clusters delineated, cluster-specific SNPs were derived. Synonymous SNPs (137 in L5 and 128 in L6) were identified as biomarkers and used for validation. A few of the cluster-specific missense variants in L5 and L6 belong to the central carbohydrate metabolism pathway, including His6Tyr (Rv0946c), Glu255Ala (Rv1131), Ala309Gly (Rv2454c), Val425Ala and Ser112Ala (Rv1127c), Gly198Ala (Rv3293) and Ile137Val (Rv0363c), Thr421Ala (Rv0896), Arg442His (Rv1248c), Thr218Ile (Rv1122), and Ser381Leu (Rv1449c), hinting at differential growth attenuation. Some genes harbor multiple (sub)-lineage-specific SNPs, such as Lys117Asn, Val447Met, and Ala455Val (Rv0066c; icd2), which are present in L6, L6.1, and L5, respectively, hinting at an association of these SNPs with selective advantage or host adaptation. Cluster-specific SNPs serve as additional markers, along with RDs, for Maf delineation. The identified SNPs have the potential to provide insights into genotype-phenotype correlation and clues to the endemicity of Maf in the African population.