Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD, USA.
Nature. 2021 May;593(7857):101-107. doi: 10.1038/s41586-021-03420-7. Epub 2021 Apr 7.
The complete assembly of each human chromosome is essential for understanding human biology and evolution. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.