O'Bleness Majesta, Searles Veronica B, Dickens C Michael, Astling David, Albracht Derek, Mak Angel C Y, Lai Yvonne Y Y, Lin Chin, Chu Catherine, Graves Tina, Kwok Pui-Yan, Wilson Richard K, Sikela James M
Department of Biochemistry and Molecular Genetics, Human Medical Genetics and Neuroscience Programs, University of Colorado School of Medicine, 12801 E. 17th Avenue, Aurora, CO 80045, USA.
BMC Genomics. 2014 May 20;15(1):387. doi: 10.1186/1471-2164-15-387.
Although the reference human genome sequence was declared finished in 2003, some regions of the genome remain incomplete due to their complex architecture. One such region, 1q21.1-q21.2, is of increasing interest due to its relevance to human disease and evolution. Elucidation of the exact variants behind these associations has been hampered by the repetitive nature of the region and its incomplete assembly. This region also contains 238 of the 270 human DUF1220 protein domains, which are implicated in human brain evolution and neurodevelopment. Additionally, examinations of this protein domain have been challenging due to the incomplete 1q21 build. To address these problems, a single-haplotype hydatidiform mole BAC library (CHORI-17) was used to produce the first complete sequence of the 1q21.1-q21.2 region.
We found and addressed several inaccuracies in the GRCh37 sequence of the 1q21 region at both large and small scales, including genomic rearrangements and inversions, and incorrect gene copy number estimates and assemblies. The DUF1220-encoding NBPF genes required the most corrections, with 3 genes removed, 2 genes reassigned to the 1p11.2 region, 8 genes requiring assembly corrections for DUF1220 domains (~91 DUF1220 domains were misassigned), and multiple instances of nucleotide changes that reassigned the domain to a different DUF1220 subtype. These corrections resulted in an overall increase in DUF1220 copy number, yielding a haploid total of 289 copies. Approximately 20 of these new DUF1220 copies were the result of a segmental duplication from 1q21.2 to 1p11.2 that included two NBPF genes. Interestingly, this duplication may have been the catalyst for the evolutionarily important human lineage-specific chromosome 1 pericentric inversion.
Through the hydatidiform mole genome sequencing effort, the 1q21.1-q21.2 region is now complete, and misassemblies involving inter- and intra-region duplications have been resolved. The availability of this single haploid sequence path will aid in the investigation of many genetic diseases linked to 1q21, including several associated with DUF1220 copy number variations. Finally, the corrected sequence identified a recent segmental duplication that added approximately 20 DUF1220 copies to the human genome and may have facilitated the chromosome 1 pericentric inversion that is among the most notable human-specific genomic landmarks.