Vitol Inna, Driscoll Jeffrey, Kreiswirth Barry, Kurepina Natalia, Bennett Kristin P
Computer Science Department, Rensselaer Polytechnic Institute, 110 8th St, Troy, NY 12180, USA.
Infect Genet Evol. 2006 Nov;6(6):491-504. doi: 10.1016/j.meegid.2006.03.003. Epub 2006 May 2.
We present a novel approach for the analysis of Mycobacterium tuberculosis complex (MTC) strain genotyping data. This work is the first step in an ongoing project dedicated to developing decision support tools for tuberculosis (TB) epidemiologists that exploit both genotyping and epidemiological data. We focus on spacer oligonucleotide typing (spoligotyping), a genotyping method based on analysis of the direct repeat (DR) locus. We use mixture models to identify MTC strain families based on their spoligotyping patterns. Our algorithm, SPOTCLUST, incorporates biological information on spoligotype evolution without attempting to derive the full phylogeny of MTC. We applied our algorithm to 535 distinct spoligotype patterns identified among 7166 MTC strains isolated between 1996 and 2004 from New York State TB patients. Two models were employed and validated: a 36-component model based on the global spoligotype database SpolDB3, and a randomly initialized model (RIM) containing 48 components. Our analysis both confirmed previously expert-defined families of MTC strains and suggested certain new families. SPOTCLUST, which is available online, can be further improved by incorporating data from additional strain genetic markers and epidemiological information. Using New York City (NYC) patient data, we demonstrate how the resulting models could form the basis of TB control tools that use genotyping.
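The abstract says only that SPOTCLUST fits mixture models to spoligotype patterns; it does not give the model's exact form. As a generic illustration, and not a reproduction of SPOTCLUST or of the evolutionary constraints it adds, a standard 43-spacer spoligotype can be treated as a binary vector of spacer presence/absence and clustered with a plain Bernoulli mixture fitted by expectation-maximization (EM). The function name `fit_bernoulli_mixture`, the farthest-first initialization, and the Beta pseudo-count `alpha` are hypothetical choices made for this sketch.

```python
import math


def hamming(a, b):
    """Number of spacer positions at which two binary patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def fit_bernoulli_mixture(patterns, k, n_iter=50, alpha=1.0):
    """Hypothetical sketch (not SPOTCLUST): EM for a K-component Bernoulli mixture.

    patterns: equal-length 0/1 lists (one entry per spacer, 43 for standard spoligotypes).
    alpha: Beta(alpha, alpha) pseudo-counts keep every probability strictly inside (0, 1).
    Returns (weights, thetas, labels); labels come from the final E-step.
    """
    n, d = len(patterns), len(patterns[0])
    # Farthest-first initialisation: start from pattern 0, then repeatedly add
    # the pattern farthest (in Hamming distance) from all centres chosen so far.
    centres = [patterns[0]]
    while len(centres) < k:
        centres.append(max(patterns, key=lambda p: min(hamming(p, c) for c in centres)))
    thetas = [[0.75 if x else 0.25 for x in c] for c in centres]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each component generated each pattern,
        # computed in log space and normalised with the log-sum-exp trick.
        resp = []
        for x in patterns:
            logp = [math.log(weights[j])
                    + sum(math.log(t) if xd else math.log(1.0 - t)
                          for xd, t in zip(x, thetas[j]))
                    for j in range(k)]
            m = max(logp)
            z = [math.exp(v - m) for v in logp]
            s = sum(z)
            resp.append([v / s for v in z])
        # M-step: re-estimate mixing weights and smoothed per-spacer presence probabilities.
        for j in range(k):
            nk = sum(r[j] for r in resp)
            weights[j] = max(nk / n, 1e-12)  # floor avoids log(0) for an emptied component
            thetas[j] = [(sum(r[j] * x[dd] for r, x in zip(resp, patterns)) + alpha)
                         / (nk + 2.0 * alpha) for dd in range(d)]
        total = sum(weights)
        weights = [w / total for w in weights]
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, thetas, labels


# Toy data (invented for illustration): two complementary 43-spacer "families",
# each variant having lost one spacer relative to its family's base pattern.
base_a = [1] * 21 + [0] * 22
base_b = [0] * 21 + [1] * 22


def drop(p, i):
    q = list(p)
    q[i] = 0
    return q


data = [drop(base_a, i) for i in range(5)] + [drop(base_b, 21 + i) for i in range(5)]
w, th, labels = fit_bernoulli_mixture(data, k=2)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

A plain Bernoulli mixture treats spacers as independent within a family. It ignores the ordered, loss-only way spoligotypes are thought to evolve, which is the kind of biological information the abstract says SPOTCLUST incorporates.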