Institute for Molecular Bacteriology, TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School (MHH) and the Helmholtz Centre for Infection Research (HZI), Hannover, Germany.
Cluster of Excellence RESIST (EXC 2155), Hannover Medical School (MHH), Hannover, Germany.
Microb Genom. 2023 Nov;9(11). doi: 10.1099/mgen.0.001129.
The wide adoption of bacterial genome sequencing, together with the encoding of both core and accessory genome variation using k-mers, has allowed bacterial genome-wide association studies (GWAS) to identify genetic variants associated with relevant phenotypes such as those linked to infection. Significant limitations remain, both because k-mers are duplicated across gene clusters and because association results are difficult to interpret, which hinders the wider adoption of GWAS methods on microbial data sets. We have developed a simple computational method (panfeed) that explicitly links each k-mer to its gene cluster at base-resolution level, which allows us to avoid biases introduced by a global de Bruijn graph and to map and annotate associated variants more easily. We tested panfeed on two independent data sets and correctly identified previously characterized causal variants, which demonstrates the precision of the method as well as its scalable performance. panfeed is a command line tool written in the Python programming language and is available at https://github.com/microbial-pangenomes-lab/panfeed.
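The idea of tying each k-mer to its gene cluster, rather than pooling k-mers across the whole pangenome, can be illustrated with a minimal sketch. This is not panfeed's implementation or interface: the function names (`cluster_kmers`, `presence_patterns`), the input layout, and the toy sequences are all hypothetical and chosen only to show why the same k-mer is tracked separately in each cluster, with its start position recorded.

```python
from collections import defaultdict


def cluster_kmers(cluster_seqs, k):
    """Index k-mers per gene cluster with their start positions.

    cluster_seqs: {cluster_id: {sample_id: gene sequence}}
    (hypothetical input layout, assumed for illustration only).
    Returns {cluster_id: {kmer: {sample_id: [start, ...]}}}.
    """
    index = {}
    for cluster, per_sample in cluster_seqs.items():
        kmers = defaultdict(lambda: defaultdict(list))
        for sample, seq in per_sample.items():
            seq = seq.upper()
            # Slide a window of length k and record where each k-mer starts.
            for start in range(len(seq) - k + 1):
                kmers[seq[start:start + k]][sample].append(start)
        index[cluster] = kmers
    return index


def presence_patterns(index, samples):
    """Build one presence/absence vector per (cluster, k-mer) pair.

    Keying on the pair keeps a k-mer shared by two clusters as two
    separate variables instead of merging them into one.
    """
    return {
        (cluster, kmer): tuple(int(s in hits) for s in samples)
        for cluster, kmers in index.items()
        for kmer, hits in kmers.items()
    }


# Toy data (made up): "ATG" occurs in both clusters.
toy = {
    "geneA": {"s1": "ATGCA", "s2": "ATGGA"},
    "geneB": {"s1": "TTATG", "s2": "TTATG"},
}
index = cluster_kmers(toy, k=3)
patterns = presence_patterns(index, ["s1", "s2"])
```

In this toy case `ATG` yields two entries, one for `geneA` and one for `geneB`, each with its own positions, so a downstream association test would see them as distinct variables that can be mapped back to a specific gene and coordinate.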