Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
Department of Biology, University of Bari, Aldo Moro, Bari 70125, Italy.
Science. 2022 Apr;376(6588):eabj6965. doi: 10.1126/science.abj6965. Epub 2022 Apr 1.
Despite their importance in disease and evolution, highly identical segmental duplications (SDs) are among the last regions of the human reference genome (GRCh38) to be fully sequenced. Using a complete telomere-to-telomere human genome (T2T-CHM13), we present a comprehensive view of human SD organization. SDs account for nearly one-third of the additional sequence, increasing the genome-wide estimate from 5.4 to 7.0% [218 million base pairs (Mbp)]. An analysis of 268 human genomes shows that 91% of the previously unresolved T2T-CHM13 SD sequence (68.3 Mbp) better represents human copy number variation. Comparing long-read assemblies from human (n = 12) and nonhuman primate (n = 5) genomes, we systematically reconstruct the evolution and structural haplotype diversity of biomedically relevant and duplicated genes. This analysis reveals patterns of structural heterozygosity and evolutionary differences in SD organization between humans and other primates.