Bailey Jeffrey A, Church Deanna M, Ventura Mario, Rocchi Mariano, Eichler Evan E
Department of Genetics, Center for Computational Genomics, Case Western Reserve University School of Medicine and University Hospitals of Cleveland, Cleveland, Ohio 44106, USA.
Genome Res. 2004 May;14(5):789-801. doi: 10.1101/gr.2238404.
Limited comparative studies suggest that the human genome is particularly enriched for recent segmental duplications. The extent of segmental duplications in other mammalian genomes is unknown and confounded by methodological differences in genome assembly. Here, we present a detailed analysis of recent duplication content within the mouse genome using a whole-genome assembly comparison method and a novel assembly-independent method, designed to take advantage of the reduced allelic variation of the C57BL/6J strain. We conservatively estimate that approximately 57% of all highly identical segmental duplications (≥90% sequence identity) were misassembled or collapsed within the working draft WGS assembly. The WGS approach often leaves duplications fragmented and unassigned to a chromosome when compared with the clone-ordered-based approach. Our preliminary analysis suggests that 1.7%-2.0% of the mouse genome is part of recent large segmental duplications (about half of what is observed for the human genome). We have constructed a mouse segmental duplication database to aid in the characterization of these regions and their integration into the final mouse genome assembly. This work suggests significant biological differences in the architecture of recent segmental duplications between human and mouse. In addition, our unique method provides the means for improving whole-genome shotgun sequence assembly of mouse and future mammalian genomes.