Pereira Vini
Department of Life Sciences, Imperial College London, Silwood Park campus, Ascot, Berkshire SL5 7PY, UK.
BMC Genomics. 2008 Dec 18;9:614. doi: 10.1186/1471-2164-9-614.
Dispersed repeats are a major component of eukaryotic genomes and drivers of genome evolution. Annotation of DNA sequences homologous to known repetitive elements has been mainly performed with the program REPEATMASKER. Sequences annotated by REPEATMASKER often correspond to fragments of repetitive elements resulting from the insertion of younger elements or other rearrangements. Although REPEATMASKER annotation is indispensable for studying genome biology, this annotation does not contain much information on the common origin of fossil fragments that share an insertion event, especially where clusters of nested insertions of repetitive elements have occurred.
Here I present REANNOTATE, a computational tool that processes REPEATMASKER annotation to automate i) defragmentation of dispersed repetitive elements, ii) resolution of the temporal order of insertions in clusters of nested elements, and iii) estimation of the age of elements that have long terminal repeats. I have re-annotated the repetitive content of human chromosomes, providing evidence for a recent expansion of satellite repeats on the Y chromosome and, from the retroviral age distribution, for a higher rate of evolution on the Y relative to autosomes.
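The abstract does not say how element ages are computed. As an illustration only, the sketch below uses a common approach to dating elements with long terminal repeats: the two LTRs of an element are identical at insertion, so their present-day divergence K gives an age of roughly K / (2r), where r is a substitution rate per site per year. The Kimura two-parameter distance, the function names, and the rate argument are all assumptions made for this example; they are not taken from REANNOTATE.

```python
import math


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two pre-aligned sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    purines = {"A", "G"}
    pyrimidines = {"C", "T"}
    sites = transitions = transversions = 0

    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: ignore this column
        sites += 1
        if a == b:
            continue
        if {a, b} <= purines or {a, b} <= pyrimidines:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if sites == 0:
        raise ValueError("no comparable sites in the alignment")

    p = transitions / sites
    q = transversions / sites
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0.0 or term2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))


def ltr_insertion_age(ltr5: str, ltr3: str, rate_per_site_per_year: float) -> float:
    """Estimated years since insertion, assuming both LTRs were identical
    at insertion and have since diverged independently at the given rate
    (a hypothetical, user-supplied parameter)."""
    if rate_per_site_per_year <= 0.0:
        raise ValueError("rate must be positive")
    return k2p_distance(ltr5, ltr3) / (2.0 * rate_per_site_per_year)
```

The factor of two reflects that both LTR copies accumulate substitutions independently after insertion. The rate has to be supplied by the user for the genome in question, and the two LTR sequences must already be aligned.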
REANNOTATE is ready to process existing annotation for automated evolutionary analysis of all types of complex repeats in any genome. The tool is freely available under the GPL at http://www.bioinformatics.org/reannotate.