Yang Zuyu, Guarracino Andrea, Biggs Patrick J, Black Michael A, Ismail Nuzla, Wold Jana Renee, Merriman Tony R, Prins Pjotr, Garrison Erik, de Ligt Joep
Institute of Environmental Science and Research, Porirua, New Zealand.
Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, United States.
Front Genet. 2023 Aug 10;14:1225248. doi: 10.3389/fgene.2023.1225248. eCollection 2023.
Whole genome sequencing has revolutionized infectious disease surveillance for tracking and monitoring the spread and evolution of pathogens. However, using a linear reference genome for genomic analyses may introduce biases, especially when studies are conducted on highly variable bacterial genomes of the same species. Pangenome graphs provide an efficient model for representing and analyzing multiple genomes and their variants as a graph structure that includes all types of variation. In this study, we present a practical bioinformatics pipeline that employs the PanGenome Graph Builder (PGGB) and the Variation Graph toolkit (vg) to build pangenomes from assembled genomes, align whole genome sequencing data, and call variants against a graph reference. The pangenome graph enables the simultaneous identification of structural variants, rearrangements, and small variants (e.g., single nucleotide polymorphisms and insertions/deletions). We demonstrate that using a pangenome graph, instead of a single linear reference genome, improves mapping rates and variant calling for both simulated and real datasets of the pathogen . Overall, pangenome graphs offer a promising approach for comparative genomics and comprehensive genetic variation analysis in infectious disease. Moreover, this pipeline, by leveraging pangenome graphs, can bridge variant analysis, genome assembly, population genetics, and evolutionary biology, expanding the reach of genomic understanding and applications.
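The central idea of the graph model, that each assembled genome is a path through shared sequence nodes and that variants appear where paths diverge, can be illustrated with a toy example. This is a hypothetical, minimal Python sketch: the names `ToyPangenomeGraph`, `spell`, `variants`, and `classify` are invented here and do not reflect the internal data model or API of the PanGenome Graph Builder or the Variation Graph toolkit. It assumes an acyclic graph in which shared nodes occur in the same order on both paths, so rearrangements (which real graph tools do represent) are rejected rather than reported.

```python
"""Toy variation graph: genomes are walks through shared sequence nodes.

Hypothetical illustration only; not the PGGB or vg data model or API.
"""
from dataclasses import dataclass, field


def classify(ref_allele: str, alt_allele: str) -> str:
    """Label a divergent stretch by the lengths of its two alleles."""
    if len(ref_allele) == len(alt_allele) == 1:
        return "SNP"
    if not ref_allele:
        return "insertion"
    if not alt_allele:
        return "deletion"
    return "complex"


@dataclass
class ToyPangenomeGraph:
    nodes: dict[int, str] = field(default_factory=dict)        # node id -> DNA sequence
    paths: dict[str, list[int]] = field(default_factory=dict)  # genome name -> node walk

    def add_node(self, node_id: int, seq: str) -> None:
        self.nodes[node_id] = seq

    def add_path(self, name: str, walk: list[int]) -> None:
        missing = [n for n in walk if n not in self.nodes]
        if missing:
            raise KeyError(f"path {name} uses unknown nodes {missing}")
        self.paths[name] = walk

    def spell(self, name: str) -> str:
        """Reconstruct a genome's linear sequence by concatenating its nodes."""
        return "".join(self.nodes[n] for n in self.paths[name])

    def variants(self, ref: str, alt: str) -> list[tuple[str, str, str]]:
        """Report (kind, ref_allele, alt_allele) wherever two paths diverge.

        Assumes shared nodes appear in the same order on both walks.
        """
        ref_walk, alt_walk = self.paths[ref], self.paths[alt]
        shared = set(ref_walk) & set(alt_walk)
        calls = []
        i = j = 0
        while i < len(ref_walk) or j < len(alt_walk):
            # Collect each path's private nodes up to the next shared anchor node.
            ref_seg, alt_seg = [], []
            while i < len(ref_walk) and ref_walk[i] not in shared:
                ref_seg.append(ref_walk[i])
                i += 1
            while j < len(alt_walk) and alt_walk[j] not in shared:
                alt_seg.append(alt_walk[j])
                j += 1
            if ref_seg or alt_seg:
                r = "".join(self.nodes[n] for n in ref_seg)
                a = "".join(self.nodes[n] for n in alt_seg)
                calls.append((classify(r, a), r, a))
            # Both walks should now sit on the same shared node; step past it.
            if i < len(ref_walk) and j < len(alt_walk):
                if ref_walk[i] != alt_walk[j]:
                    raise ValueError("shared nodes out of order (rearrangement not handled)")
                i += 1
                j += 1
            elif i < len(ref_walk) or j < len(alt_walk):
                raise ValueError("walks end inconsistently (not handled by this toy)")
        return calls


g = ToyPangenomeGraph()
for node_id, seq in {1: "ATG", 2: "C", 3: "T", 4: "GGA", 5: "TTAC", 6: "CA", 7: "GG"}.items():
    g.add_node(node_id, seq)
g.add_path("ref", [1, 2, 4, 5, 6])           # ATG C GGA TTAC CA
g.add_path("isolate_A", [1, 3, 4, 6])        # C->T substitution, TTAC deleted
g.add_path("isolate_B", [1, 2, 4, 7, 5, 6])  # GG inserted before TTAC

print(g.spell("isolate_A"))             # ATGTGGACA
print(g.variants("ref", "isolate_A"))   # [('SNP', 'C', 'T'), ('deletion', 'TTAC', '')]
print(g.variants("ref", "isolate_B"))   # [('insertion', '', 'GG')]
```

Because every genome is stored as a walk over the same nodes, a SNP and a deletion in one isolate are recovered from the same structure without re-aligning to a single linear sequence; this is the property the abstract relies on when it reports small and structural variants together.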