Kamaraj Venkatesh, Gupta Ayam, Raman Karthik, Narayanan Manikandan, Sinha Himanshu
Centre for Integrative Biology and Systems Medicine (IBSE), Wadhwani School of Data Science and AI, Indian Institute of Technology (IIT) Madras, Chennai 600036, India.
Wadhwani School of Data Science and AI, IIT Madras, Chennai 600036, India.
NAR Genom Bioinform. 2025 Sep 3;7(3):lqaf121. doi: 10.1093/nargab/lqaf121. eCollection 2025 Sep.
Genome graphs provide a powerful reference structure for representing genetic diversity. Their structure emphasizes the polymorphic regions in a collection of genomes, enabling network-based comparisons of population-level variation. However, current tools are limited in their ability to quantify and compare structural features across large genome graphs. We introduce GViNC (Genome graph Visualization, Navigation, and Comparison), a novel framework that partitions genome graphs into interpretable subgraphs, maps linear coordinates to graph nodes, and summarizes both local and global structural variation using new metrics for variability, hypervariability, and graph distances. We applied GViNC to multiple pan-genomic and population-specific genome graphs constructed with over 85M variants in 2504 individuals from the 1000 Genomes Project. We found that genomic complexity varied by ancestry and across chromosomes, with rare variants increasing variability 10-fold and hypervariability 50-fold. GViNC highlighted key regions of the human genome, such as the Human Leukocyte Antigen and DEFB loci, and many previously unreported high-diversity regions, some with population-specific signatures in protein-coding and regulatory genes. By bridging sequence-level variation and graph-level topology, GViNC enables scalable, quantitative exploration of genome structure across populations. Its versatility can help researchers investigate in depth the genetic diversity of different cohorts, populations, or species of interest.
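To illustrate what mapping linear coordinates to graph nodes can involve, the following minimal Python sketch looks up the node that covers a given reference position. It is not GViNC's implementation: the representation of the reference path as a list of (node ID, length) pairs, the 0-based coordinates, and the function names `build_index` and `node_at` are all assumptions made for illustration.

```python
from bisect import bisect_right

def build_index(ref_path):
    """Build a lookup index from a hypothetical reference path.

    ref_path: list of (node_id, length) pairs in reference order,
    each with length > 0 (an assumed representation).
    Returns (starts, ids, total), where starts[i] is the 0-based
    linear coordinate at which node ids[i] begins.
    """
    starts, ids, pos = [], [], 0
    for node_id, length in ref_path:
        if length <= 0:
            raise ValueError("node lengths must be positive")
        starts.append(pos)
        ids.append(node_id)
        pos += length
    return starts, ids, pos

def node_at(index, coord):
    """Return (node_id, offset_within_node) for a 0-based linear coordinate."""
    starts, ids, total = index
    if not 0 <= coord < total:
        raise ValueError("coordinate outside the reference path")
    # Binary search for the last node whose start is <= coord.
    i = bisect_right(starts, coord) - 1
    return ids[i], coord - starts[i]

# Toy reference path: three nodes of lengths 4, 1, and 5.
index = build_index([("n1", 4), ("n2", 1), ("n3", 5)])
```

With this toy index, `node_at(index, 4)` resolves to the first base of node `n2`. Binary search over node start offsets keeps each lookup logarithmic in the number of reference nodes, which matters for chromosome-scale graphs.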