Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Nature. 2024 May;629(8010):136-145. doi: 10.1038/s41586-024-07278-3. Epub 2024 Apr 3.
Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.