The complete and fully-phased diploid genome of a male Han Chinese.

Affiliations

Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China.

Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.

Publication information

Cell Res. 2023 Oct;33(10):745-761. doi: 10.1038/s41422-023-00849-5. Epub 2023 Jul 14.

Abstract

Since the release of the complete human genome, the priority of human genomic research has been shifting towards closing gaps in ethnic diversity. Here, we present a fully phased and well-annotated diploid human genome from a Han Chinese male individual (CN1), in which the assemblies of both haploids achieve the telomere-to-telomere (T2T) level. Comparison of this diploid genome with the CHM13 haploid T2T genome revealed significant variations in the centromere. Outside the centromere, we discovered 11,413 structural variations, including numerous novel ones. We also detected thousands of CN1 alleles that have accumulated high substitution rates and a few that have been under positive selection in the East Asian population. Further, we found that CN1 outperforms CHM13 as a reference genome in mapping and variant calling for the East Asian population, owing to the distinct structural variants of the two references. Comparison of SNP calling for a large cohort of 8869 Chinese genomes, using CN1 and CHM13 as references respectively, showed that reference bias profoundly impacts rare SNP calling, with nearly 2 million rare SNPs miscalled depending on the reference genome used. Finally, using CN1 as the reference, we discovered 5.80 Mb and 4.21 Mb of putative introgressed sequences from Neanderthal and Denisovan, respectively, including many East Asian-specific ones undetected using CHM13 as the reference. Our analyses reveal the advantages of using CN1 as a reference for population genomic and paleo-genomic studies. This complete genome will serve as an alternative reference for future genomic studies on the East Asian population.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9646/10542383/efe99f89fdf1/41422_2023_849_Fig1_HTML.jpg
