Bovee Donald, Zhou Yang, Haugen Eric, Wu Zaining, Hayden Hillary S, Gillett Will, Tuzun Eray, Cooper Gregory M, Sampas Nick, Phelps Karen, Levy Ruth, Morrison V Anne, Sprague James, Jewett Donald, Buckley Danielle, Subramaniam Sandhya, Chang Jean, Smith Douglas R, Olson Maynard V, Eichler Evan E, Kaul Rajinder
Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington 98195, USA.
Nat Genet. 2008 Jan;40(1):96-101. doi: 10.1038/ng.2007.34. Epub 2007 Dec 23.
The human genome sequence has been finished to very high standards; however, more than 340 gaps remained when the finished genome was published by the International Human Genome Sequencing Consortium in 2004. Using fosmid resources generated from multiple individuals, we targeted gaps in the euchromatic part of the human genome. Here we report 2,488,842 bp of previously unknown euchromatic sequence, 363,114 bp of which close 26 of 250 euchromatic gaps, or 10%, including two remaining euchromatic gaps on chromosome 19. Eight (30.7%) of the closed gaps were found to be polymorphic. These sequences allow complete annotation of several human genes as well as the assignment of mRNAs. The gap sequences are 2.3-fold enriched in segmentally duplicated sequences compared to the whole genome. Our analysis confirms that not all gaps within 'finished' genomes are recalcitrant to subcloning and suggests that the paired-end-sequenced fosmid libraries could prove to be a rich resource for completion of the human euchromatic genome.