Wang Jun, Wang Wei, Li Ruiqiang, Li Yingrui, Tian Geng, Goodman Laurie, Fan Wei, Zhang Junqing, Li Jun, Zhang Juanbin, Guo Yiran, Feng Binxiao, Li Heng, Lu Yao, Fang Xiaodong, Liang Huiqing, Du Zhenglin, Li Dong, Zhao Yiqing, Hu Yujie, Yang Zhenzhen, Zheng Hancheng, Hellmann Ines, Inouye Michael, Pool John, Yi Xin, Zhao Jing, Duan Jinjie, Zhou Yan, Qin Junjie, Ma Lijia, Li Guoqing, Yang Zhentao, Zhang Guojie, Yang Bin, Yu Chang, Liang Fang, Li Wenjie, Li Shaochuan, Li Dawei, Ni Peixiang, Ruan Jue, Li Qibin, Zhu Hongmei, Liu Dongyuan, Lu Zhike, Li Ning, Guo Guangwu, Zhang Jianguo, Ye Jia, Fang Lin, Hao Qin, Chen Quan, Liang Yu, Su Yeyang, San A, Ping Cuo, Yang Shuang, Chen Fang, Li Li, Zhou Ke, Zheng Hongkun, Ren Yuanyuan, Yang Ling, Gao Yang, Yang Guohua, Li Zhuo, Feng Xiaoli, Kristiansen Karsten, Wong Gane Ka-Shu, Nielsen Rasmus, Durbin Richard, Bolund Lars, Zhang Xiuqing, Li Songgang, Yang Huanming, Wang Jian
Beijing Genomics Institute at Shenzhen, Shenzhen 518000, China.
Nature. 2008 Nov 6;456(7218):60-5. doi: 10.1038/nature07484.
Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.
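As a rough worked example of the figures quoted in the abstract, the sketch below estimates how many of the approximately 3 million SNPs were absent from dbSNP. The function name and the rounded inputs are illustrative assumptions. The abstract gives only approximate values, so the result is an estimate and not a count reported by the authors.

```python
def estimate_novel_snps(total_snps: int, novel_percent: float) -> int:
    """Estimate the number of SNPs not found in dbSNP.

    total_snps    -- approximate number of SNPs identified (assumed rounded value)
    novel_percent -- percentage of those SNPs absent from dbSNP
    """
    if total_snps < 0 or not 0.0 <= novel_percent <= 100.0:
        raise ValueError("total_snps must be >= 0 and novel_percent within 0..100")
    # Round to the nearest whole SNP; the inputs are approximate anyway.
    return round(total_snps * novel_percent / 100)


# Figures taken from the abstract: ~3 million SNPs, 13.6% not in dbSNP.
novel = estimate_novel_snps(3_000_000, 13.6)
known = 3_000_000 - novel
print(f"novel: ~{novel:,}  already in dbSNP: ~{known:,}")
```

With these rounded inputs, about 408,000 SNPs would be novel and about 2.59 million would already be catalogued in dbSNP.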