Jiang Zhaoshi, Hubley Robert, Smit Arian, Eichler Evan E
Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA.
Genome Res. 2008 Aug;18(8):1362-8. doi: 10.1101/gr.078477.108. Epub 2008 May 23.
Segmental duplications (SDs) play an important role in genome rearrangement, evolution, and the copy-number variation (CNV) of primate genomes. Such sequences are difficult to detect, a priori, because they share no defining sequence features that distinguish them from unique portions of the genome. Current sequence annotation of segmental duplications requires computationally intensive, genome-wide self-comparisons that cannot be easily implemented on new data sets. Based on the successful implementation of RepeatMasker, we developed a new genome annotation tool, DupMasker. The program uses a library of nonredundant consensus sequences of human segmental duplications, wherein a majority of the ancestral origins have been determined based on comparisons to mammalian outgroup genomes. Using DupMasker, new human and nonhuman primate (NHP) sequences may be readily queried to provide details on the origin and degree of sequence identity of each duplicon. This program can be applied to delineate the order and orientation of duplicons within complex duplication blocks and used to characterize structural variation differences between sequenced human haplotypes. We predict this tool will be valuable in the annotation of large-insert sequence clones, allowing putative unique and duplicated regions of the genomes to be annotated prior to whole genome assembly comparisons.