Jeong Hyeonsoo, Dishuck Philip C, Yoo DongAhn, Harvey William T, Munson Katherine M, Lewis Alexandra P, Kordosky Jennifer, Garcia Gage H, Yilmaz Feyza, Hallast Pille, Lee Charles, Pastinen Tomi, Eichler Evan E
Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
Altos Labs, San Diego, CA, USA.
Nat Genet. 2025 Feb;57(2):390-401. doi: 10.1038/s41588-024-02051-8. Epub 2025 Jan 8.
Segmental duplications (SDs) contribute significantly to human disease, evolution and diversity but have been difficult to resolve at the sequence level. We present a population genetics survey of SDs by analyzing 170 human genome assemblies (from 85 samples representing 38 Africans and 47 non-Africans) in which the majority of autosomal SDs are fully resolved using long-read sequence assembly. Excluding the acrocentric short arms and sex chromosomes, we identify 173.2 Mb of duplicated sequence (47.4 Mb of which is not present in the telomere-to-telomere reference) and distinguish fixed from structurally polymorphic events. We find that intrachromosomal SDs are among the most variable, with rare events mapping near their progenitor sequences. African genomes harbor significantly more intrachromosomal SDs and are more likely to have recently duplicated gene families with higher copy numbers than non-African samples. Comparison to a resource of 563 million full-length isoform sequencing reads identifies 201 novel, potentially protein-coding genes corresponding to these copy number polymorphic SDs.