Buisine Nicolas, Quesneville Hadi, Colot Vincent
Unité de Recherche en Génomique Végétale, INRA UMR1165-CNRS UMR8114-Université d'Evry Val d'Essonne, 2 rue Gaston Crémieux, 91057 Evry, France.
Genomics. 2008 May;91(5):467-75. doi: 10.1016/j.ygeno.2008.01.005. Epub 2008 Mar 14.
Transposable elements (TEs) are ubiquitous components of eukaryotic genomes that impact many aspects of genome function. TE detection in genomic sequences is typically performed using similarity searches against a set of reference sequences built from previously identified TEs. Here, we demonstrate that this process can be improved by designing reference sets that incorporate key aspects of the structure and evolution of TEs and by combining these sets with Repbase Update (RU), which is composed mainly of consensus sequences. Using the Arabidopsis genome as a test case, our approach leads to the detection of an extra 12.4% of TE sequences. These correspond to novel TE fragments as well as to the extension of TE fragments already detected by RU. Significantly, we find that TE detection could be readily optimized using only two reference sets, one containing true consensus sequences and the other mosaic sequences that capture the structural diversity of TE copies within a family.
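The comparison described in the abstract, measuring how much TE sequence a custom reference set adds on top of an RU-based annotation, can be illustrated with a small interval calculation. This is a minimal sketch and not the authors' pipeline. It assumes each annotation is a list of half-open `(start, end)` coordinates on a single chromosome, and that the "extra" percentage is expressed relative to the baseline coverage; the abstract does not state the denominator. All function names and the toy coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumed format)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_length(intervals: List[Interval]) -> int:
    """Number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge_intervals(intervals))


def extra_coverage_percent(baseline: List[Interval], custom: List[Interval]) -> float:
    """Bases gained by adding `custom` hits, as a percentage of baseline coverage.

    Assumption: the gain is reported relative to the baseline annotation.
    """
    base = covered_length(baseline)
    combined = covered_length(baseline + custom)
    return 100.0 * (combined - base) / base


def classify_custom_hits(baseline: List[Interval], custom: List[Interval]) -> List[str]:
    """Label each custom hit relative to the merged baseline annotation.

    'redundant' : fully contained in a baseline fragment (adds no bases)
    'extension' : overlaps a baseline fragment but reaches beyond it
    'novel'     : overlaps no baseline fragment
    """
    base = merge_intervals(baseline)
    labels = []
    for start, end in custom:
        if any(b_start <= start and end <= b_end for b_start, b_end in base):
            labels.append("redundant")
        elif any(start < b_end and b_start < end for b_start, b_end in base):
            labels.append("extension")
        else:
            labels.append("novel")
    return labels


# Toy coordinates, invented for illustration only.
ru_hits = [(100, 500), (1000, 1400)]
custom_hits = [(450, 600), (2000, 2100), (1100, 1200)]

print(f"{extra_coverage_percent(ru_hits, custom_hits):.1f}% extra coverage")
print(classify_custom_hits(ru_hits, custom_hits))
```

The "extension" and "novel" labels mirror the two kinds of gain the abstract reports: extension of fragments already detected by RU, and fragments not detected before. A real analysis would also track chromosome, strand, and TE family, which this sketch omits.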