Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China.
Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
Genome Biol. 2023 Jul 4;24(1):157. doi: 10.1186/s13059-023-02995-w.
The release of the first telomere-to-telomere (T2T) human genome assembly (T2T-CHM13) is a milestone in human genomics. The T2T-CHM13 genome assembly extends our understanding of telomeres, centromeres, segmental duplications, and other complex regions. The current human genome reference (GRCh38) has been widely used in human genomic studies. However, the large-scale genomic differences between these two important genome assemblies have not yet been characterized in detail.
Here, in addition to the previously reported "non-syntenic" regions, we find 67 additional large-scale discrepant regions and precisely categorize them into four structural types with a newly developed web tool called SynPlotter. The discrepant regions (~21.6 Mbp), excluding telomeric and centromeric regions, are highly structurally polymorphic in humans, and deletions or duplications within them are likely associated with various human diseases, such as immune and neurodevelopmental disorders. Analyses of a newly identified discrepant region, the KLRC gene cluster, show that the depletion of KLRC2 by a single deletion event is associated with natural killer cell differentiation in ~20% of humans. Meanwhile, the rapid amino acid replacements observed within KLRC3 are probably a result of natural selection in primate evolution.
Our study provides a foundation for understanding the large-scale structural genomic differences between the two crucial human reference genomes, and is thereby important for future human genomics studies.