Alland David, Lacher David W, Hazbón Manzour Hernando, Motiwala Alifiya S, Qi Weihong, Fleischmann Robert D, Whittam Thomas S
Division of Infectious Disease, Department of Medicine, University of Medicine and Dentistry of New Jersey, Newark, NJ 07103, USA.
J Clin Microbiol. 2007 Jan;45(1):39-46. doi: 10.1128/JCM.02483-05. Epub 2006 Nov 1.
Mycobacterium tuberculosis strains contain different genomic insertions or deletions called large sequence polymorphisms (LSPs). Distinguishing LSPs that occur once from those that occur repeatedly in a genomic region may provide insights into the biological roles of LSPs and identify useful phylogenetic markers. We analyzed 163 clinical M. tuberculosis isolates for 17 LSPs identified in a genomic comparison of M. tuberculosis strains H37Rv and CDC1551. LSPs were mapped onto a single-nucleotide polymorphism (SNP)-based phylogenetic tree created using nine novel SNP markers that were found to reproduce a 212-SNP-based phylogeny. Four LSPs (group A) mapped to a single SNP tree segment. Two LSPs (group B) and 11 LSPs (group C) were inferred to have arisen independently in the same genomic region twice or more than twice, respectively. None of the group A LSPs were flanked by IS6110 sequences in the reference strains, whereas one group B LSP and five group C LSPs were. Genes encoding members of the proline-glutamic acid or proline-proline-glutamic acid protein families were present only in group B or C LSPs. SNP- versus LSP-based phylogenies were also compared. We classified each isolate into one of 58 LSP types by using a separate LSP-based phylogenetic analysis and mapped the LSP types onto the SNP tree. LSPs often assigned isolates to the correct phylogenetic lineage; however, significant mistakes occurred for 6/58 (10%) of the LSP types. In conclusion, most LSPs occur in genomic regions that are prone to repeated insertion/deletion events and were responsible for an unexpectedly high degree of genomic variation in clinical M. tuberculosis. Group B and C LSPs may represent polymorphisms that occur due to selective pressure and affect the phenotype of the organism, while group A LSPs are preferable phylogenetic markers.
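The grouping rule described in the abstract (one inferred event gives group A, two give group B, more than two give group C) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function name `classify_lsp` and the per-LSP event counts are hypothetical placeholders, chosen only so that the group sizes match the 4, 2, and 11 LSPs reported above.

```python
from collections import Counter


def classify_lsp(independent_events: int) -> str:
    """Assign an LSP to group A, B, or C from the number of independent
    events inferred for its genomic region on the SNP tree."""
    if independent_events < 1:
        raise ValueError("an observed LSP needs at least one inferred event")
    if independent_events == 1:
        return "A"  # arose once: maps to a single SNP tree segment
    if independent_events == 2:
        return "B"  # arose independently twice
    return "C"      # arose independently more than twice


# Hypothetical event counts (NOT the study's data): 4 LSPs with one event,
# 2 with two events, and 11 with a placeholder value of three events.
example_counts = [1] * 4 + [2] * 2 + [3] * 11
group_sizes = Counter(classify_lsp(n) for n in example_counts)

# Share of LSP types that conflicted with the SNP tree, as reported: 6 of 58.
discordant_percent = round(6 / 58 * 100)
```

Under these placeholder inputs, `group_sizes` holds the 4/2/11 split from the abstract and `discordant_percent` reproduces the reported 10%.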