Groza Cristian, Ge Bing, Cheung Warren A, Pastinen Tomi, Bourque Guillaume
Université de Montréal, Montréal Heart Institute, Montréal, Québec H1T 1C8, Canada.
McGill University, McGill University and Genome Quebec Innovation Centre, Montréal, Québec H3A 2T8, Canada.
Genome Res. 2025 Apr 14;35(4):644-652. doi: 10.1101/gr.279240.124.
Structural variants (SVs) are omnipresent in human DNA, yet their genotypes and methylation statuses are rarely characterized because of past limitations in genome assembly and in the detection of modified nucleotides. Moreover, the extent to which SVs act as methylation quantitative trait loci (SV-mQTLs) is largely unknown. Here, we generated a pangenome graph summarizing SVs in 782 de novo assemblies obtained from Genomic Answers for Kids, capturing 14.6 million CpG dinucleotides that are absent from the CHM13v2 reference (SV-CpGs), thus expanding the number of CpGs by 43.6% relative to the reference. Using 435 methylomes, we genotyped 4.06 million SV-CpGs, of which 3.93 million (96.8%) are methylated in at least one methylome. Nonrepeat sequences contribute 1.59 × 10⁶ novel SV-CpGs, followed by centromeric satellites (6.57 × 10⁵), simple repeats (5.40 × 10⁵), Alu elements (5.07 × 10⁵), satellites (2.17 × 10⁵), LINE-1s (1.83 × 10⁵), and SVA (SINE-VNTR-Alu) elements (1.50 × 10⁵). Centromeric satellites, simple repeats, and SVAs are overrepresented among SV-CpGs relative to reference CpGs. Methylation levels are also more variable at SV-CpGs than at reference CpGs. To explore whether SVs are potentially causal for functional variation, we mapped SV-mQTLs. This revealed 230,464 methylation bins whose methylation is associated with common SVs within 100 kbp. Finally, we identified 65,659 methylation bins (28.5%) in which the leading QTL variant is an SV. In conclusion, we demonstrate that graph pangenomes capture full SV structures and their associated methylation variation and reveal tens of thousands of SV-mQTLs, underscoring the importance of assembly-based analyses of human traits.
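The abstract reports, for each methylation bin, whether the leading QTL variant within 100 kbp is an SV, but it does not specify the association model. The following is a minimal sketch of that lead-variant step under stated assumptions that are not from the paper: genotypes are encoded as per-sample dosages (0, 1, 2), association is scored with a plain Pearson correlation, and the lead variant is the one with the largest absolute correlation inside the cis window. The names `Variant`, `pearson_r`, `lead_variant`, and `CIS_WINDOW_BP` are hypothetical, and the data are toy values.

```python
# Hypothetical sketch of cis lead-variant selection for one methylation bin.
# Assumptions (not from the paper): dosage-encoded genotypes, Pearson r as the
# association score, and "lead" = largest |r| within the cis window.
from dataclasses import dataclass
from math import sqrt
from typing import Optional, Sequence

CIS_WINDOW_BP = 100_000  # the abstract tests common SVs within 100 kbp


@dataclass
class Variant:
    name: str
    pos: int                  # position on the same chromosome as the bin
    is_sv: bool               # True for structural variants, False for SNVs
    dosages: Sequence[float]  # hypothetical genotype dosage per sample


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def lead_variant(bin_pos: int, methylation: Sequence[float],
                 variants: Sequence[Variant]) -> Optional[Variant]:
    """Return the in-window variant most associated with the bin, or None.

    With a fixed sample size and no missing data, ranking by |r| gives the
    same order as ranking by the Pearson-test p-value.
    """
    best, best_abs_r = None, -1.0
    for v in variants:
        if abs(v.pos - bin_pos) > CIS_WINDOW_BP:
            continue  # outside the cis window, not tested
        r = abs(pearson_r(v.dosages, methylation))
        if r > best_abs_r:
            best, best_abs_r = v, r
    return best


# Toy usage: one bin at 1,000,000 bp with methylation fractions for 6 samples.
meth = [0.10, 0.15, 0.50, 0.55, 0.90, 0.95]
variants = [
    Variant("snv_1", 1_020_000, False, [0, 1, 0, 1, 1, 0]),
    Variant("sv_ins_1", 1_050_000, True, [0, 0, 1, 1, 2, 2]),
    Variant("sv_far", 1_300_000, True, [0, 0, 1, 1, 2, 2]),  # >100 kbp away
]
lead = lead_variant(1_000_000, meth, variants)
print(lead.name, lead.is_sv)  # sv_ins_1 True
```

Counting bins whose lead variant has `is_sv == True`, over all bins with a significant association, would give a fraction analogous to the 28.5% reported above; a real analysis would also need covariates, multiple-testing correction, and a significance threshold, none of which this sketch models.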