Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA.
Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA.
Genetics. 2022 Feb 4;220(2). doi: 10.1093/genetics/iyab227.
Until 2019, the human genome was available in only one fully annotated version, GRCh38, the result of 18 years of continuous improvement and revision. Despite dramatic improvements in sequencing technology, no other annotated reference genome existed until the release that year of Ash1, the genome of an Ashkenazi individual. In this study, we describe the assembly and annotation of a second individual genome, from a Puerto Rican individual whose DNA was collected as part of the Human Pangenome project. The new genome, called PR1, is the first true reference genome created from an individual of African descent. Owing to recent improvements in both sequencing and assembly technology, and particularly to the use of the recently completed CHM13 human genome as a guide to assembly, PR1 is more complete and more contiguous than either GRCh38 or Ash1. Annotation revealed 37,755 genes (of which 19,999 are protein coding), including 12 additional gene copies that are present in PR1 and missing from CHM13. Fifty-seven genes have fewer copies in PR1 than in CHM13, 9 map only partially, and 3 genes (all noncoding) from CHM13 are entirely missing from PR1.