Teo Audrey S M, Verzotto Davide, Yao Fei, Nagarajan Niranjan, Hillmer Axel M
Cancer Therapeutics and Stratified Oncology, Genome Institute of Singapore, 60 Biopolis Street, Singapore, 138672 Singapore.
Computational and Systems Biology, Genome Institute of Singapore, 60 Biopolis Street, Singapore, 138672 Singapore.
Gigascience. 2015 Dec 29;4:65. doi: 10.1186/s13742-015-0106-1. eCollection 2015.
Next-generation sequencing (NGS) technologies have changed our understanding of the variability of the human genome. However, the identification of genome structural variations based on NGS approaches with read lengths of 35-300 bases remains a challenge. Single-molecule optical mapping technologies allow the analysis of DNA molecules of up to 2 Mb and as such are suitable for the identification of large-scale genome structural variations, and for de novo genome assemblies when combined with short-read NGS data. Here we present optical mapping data for two human genomes: the HapMap cell line GM12878 and the colorectal cancer cell line HCT116.
High molecular weight DNA was obtained by embedding GM12878 and HCT116 cells in agarose plugs, followed by DNA extraction under mild conditions. Genomic DNA was digested with KpnI, and 310,000 and 296,000 DNA molecules (≥ 150 kb and 10 restriction fragments) from GM12878 and HCT116, respectively, were analyzed using the Argus optical mapping system. Maps were aligned to the human reference by OPTIMA, a new glocal alignment method. Genome coverage of 6.8× and 5.7× was obtained, respectively, which is 2.9× and 1.7× more than the coverage obtained with previously available software.
Optical mapping allows the resolution of large-scale structural variations of the genome, and the scaffold extension of NGS-based de novo assemblies. OPTIMA is an efficient new alignment method; our optical mapping data provide a resource for genome structure analyses of the human HapMap reference cell line GM12878, and the colorectal cancer cell line HCT116.