BGI-Shenzhen, Shenzhen 518083, China.
Nat Biotechnol. 2010 Jan;28(1):57-63. doi: 10.1038/nbt.1596. Epub 2009 Dec 7.
Here we integrate the de novo assembly of an Asian and an African genome with the NCBI reference human genome, as a step toward constructing the human pan-genome. We identified approximately 5 Mb of novel sequences not present in the reference genome in each of these assemblies. Most novel sequences are individual or population specific, as revealed by their comparison to all available human DNA sequence and by PCR validation using the human genome diversity cell line panel. We found novel sequences present in patterns consistent with known human migration paths. Cross-species conservation analysis of predicted genes indicated that the novel sequences contain potentially functional coding regions. We estimate that a complete human pan-genome would contain approximately 19-40 Mb of novel sequence not present in the extant reference genome. The extensive amount of novel sequence contributing to the genetic variation of the pan-genome indicates the importance of using complete genome sequencing and de novo assembly.