Logsdon Glennis A, Ebert Peter, Audano Peter A, Loftus Mark, Porubsky David, Ebler Jana, Yilmaz Feyza, Hallast Pille, Prodanov Timofey, Yoo DongAhn, Paisie Carolyn A, Harvey William T, Zhao Xuefang, Martino Gianni V, Henglin Mir, Munson Katherine M, Rabbani Keon, Chin Chen-Shan, Gu Bida, Ashraf Hufsah, Scholz Stephan, Austine-Orimoloye Olanrewaju, Balachandran Parithi, Bonder Marc Jan, Cheng Haoyu, Chong Zechen, Crabtree Jonathan, Gerstein Mark, Guethlein Lisbeth A, Hasenfeld Patrick, Hickey Glenn, Hoekzema Kendra, Hunt Sarah E, Jensen Matthew, Jiang Yunzhe, Koren Sergey, Kwon Youngjun, Li Chong, Li Heng, Li Jiaqi, Norman Paul J, Oshima Keisuke K, Paten Benedict, Phillippy Adam M, Pollock Nicholas R, Rausch Tobias, Rautiainen Mikko, Song Yuwei, Söylev Arda, Sulovari Arvis, Surapaneni Likhitha, Tsapalou Vasiliki, Zhou Weichen, Zhou Ying, Zhu Qihui, Zody Michael C, Mills Ryan E, Devine Scott E, Shi Xinghua, Talkowski Michael E, Chaisson Mark J P, Dilthey Alexander T, Konkel Miriam K, Korbel Jan O, Lee Charles, Beck Christine R, Eichler Evan E, Marschall Tobias
Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Nature. 2025 Jul 23. doi: 10.1038/s41586-025-09140-6.
Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (median continuity of 130 Mb), closing 92% of all previous assembly gaps and reaching telomere-to-telomere status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8 and AMY1/AMY2, and fully resolve 1,852 complex structural variants. In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite higher-order repeat array length and characterize the pattern of mobile element insertions into α-satellite higher-order repeat arrays. Although most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference to a median quality value of 45. Using this approach, 26,115 structural variants per individual are detected, substantially increasing the number of structural variants now amenable to downstream disease association studies.
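The median quality value of 45 reported above can be read as a per-base error rate, assuming it is Phred-scaled (QV = -10 · log10 of the error probability), as is conventional for assembly and genotype quality values. The sketch below is illustrative only and is not taken from the paper; the function names are hypothetical.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to an error probability.

    Assumes the standard Phred definition: QV = -10 * log10(error probability).
    """
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert an error probability in (0, 1] back to a Phred-scaled QV."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    rate = qv_to_error_rate(45)
    # Under the Phred assumption, QV 45 is about 3.16e-05 errors per base,
    # i.e. roughly one error every 31,623 bases.
    print(f"error rate: {rate:.2e}")
    print(f"about one error per {round(1 / rate):,} bases")
```

Under this assumption, each 10-point increase in QV corresponds to a tenfold drop in the error rate, so QV 45 sits between one error per 10,000 bases (QV 40) and one per 100,000 bases (QV 50).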