She Xinwei, Jiang Zhaoshi, Clark Royden A, Liu Ge, Cheng Ze, Tuzun Eray, Church Deanna M, Sutton Granger, Halpern Aaron L, Eichler Evan E
Department of Genome Sciences, University of Washington School of Medicine, 1705 NE Pacific Street, Seattle, Washington 98195, USA.
Nature. 2004 Oct 21;431(7011):927-30. doi: 10.1038/nature03062.
Complex eukaryotic genomes are now being sequenced at an accelerated pace primarily using whole-genome shotgun (WGS) sequence assembly approaches. WGS assembly was initially criticized because of its perceived inability to resolve repeat structures within genomes. Here, we quantify the effect of WGS sequence assembly on large, highly similar repeats by comparison of the segmental duplication content of two different human genome assemblies. Our analysis shows that large (> 15 kilobases) and highly identical (> 97%) duplications are not adequately resolved by WGS assembly. This leads to significant reduction in genome length and the loss of genes embedded within duplications. Comparable analyses of mouse genome assemblies confirm that strict WGS sequence assembly will oversimplify our understanding of mammalian genome structure and evolution; a hybrid strategy using a targeted clone-by-clone approach to resolve duplications is proposed.