Secomandi Simona, Gallo Guido Roberto, Rossi Riccardo, Rodríguez Fernandes Carlos, Jarvis Erich D, Bonisoli-Alquati Andrea, Gianfranceschi Luca, Formenti Giulio
Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA.
Department of Biosciences, University of Milan, Milan, Italy.
Nat Genet. 2025 Jan;57(1):13-26. doi: 10.1038/s41588-024-02029-6. Epub 2025 Jan 8.
Complete datasets of genetic variants are key to biodiversity genomic studies. Long-read sequencing technologies allow the routine assembly of highly contiguous, haplotype-resolved reference genomes. However, even when complete, reference genomes from a single individual may bias downstream analyses and fail to adequately represent genetic diversity within a population or species. Pangenome graphs assembled from aligned collections of high-quality genomes can overcome representation bias by integrating sequence information from multiple genomes from the same population, species or genus into a single reference. Here, we review the available tools and data structures to build, visualize and manipulate pangenome graphs while providing practical examples and discussing their applications in biodiversity and conservation genomics across the tree of life.
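As a rough illustration of the data structure the abstract describes, a pangenome graph can be sketched as a set of sequence nodes plus one walk per genome through those nodes. The example below is a minimal toy under stated assumptions, not a method from the review: the node sequences, the haplotype names (`hapA`, `hapB`, `hapC`), and the helper functions are all hypothetical, and real tools work on alignment-derived graphs with far richer models.

```python
from collections import Counter

# Hypothetical toy graph: node id -> sequence carried by that node.
segments = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}

# Hypothetical genomes, each stored as a walk (ordered node ids) through the graph.
# hapA and hapC share the "A" allele at node 2; hapB carries "G" at node 3.
paths = {
    "hapA": [1, 2, 4],
    "hapB": [1, 3, 4],
    "hapC": [1, 2, 4],
}


def spell(walk):
    """Reconstruct a genome sequence by concatenating the nodes along its walk."""
    return "".join(segments[node] for node in walk)


def node_support(all_paths):
    """Count how many genomes traverse each node at least once."""
    counts = Counter()
    for walk in all_paths.values():
        counts.update(set(walk))
    return counts


def core_and_variable(all_paths):
    """Split nodes into those shared by every genome and those present in only some."""
    support = node_support(all_paths)
    total = len(all_paths)
    core = sorted(n for n, c in support.items() if c == total)
    variable = sorted(n for n, c in support.items() if c < total)
    return core, variable


if __name__ == "__main__":
    for name, walk in paths.items():
        print(name, spell(walk))
    print(core_and_variable(paths))
```

In this sketch no single genome serves as the reference: both alleles at the variable site are kept as separate nodes, which is the sense in which a graph can reduce the representation bias of a single-individual reference.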