Identifying repeats and transposable elements in sequenced genomes: how to find your way through the dense forest of programs.

Affiliation

Université de Lyon, F-6900 Lyon.

Publication information

Heredity (Edinb). 2010 Jun;104(6):520-33. doi: 10.1038/hdy.2009.165. Epub 2009 Nov 25.

DOI:10.1038/hdy.2009.165

PMID:19935826

Abstract

The production of genome sequences has led to another important advance in their annotation, which is closely linked to the exact determination of their repeat content, including transposable elements (TEs). The evolutionary implications of TEs and the presence of coding regions in some of them can confuse gene annotation and also hinder genome assembly, making it particularly crucial to be able to annotate and classify them correctly in genome sequences. This review is intended to provide as comprehensive an overview as possible of the automated methods currently used to annotate and classify TEs in sequenced genomes. Programs fall into different categories according to their methodology and the repeats they can identify. I describe here the main characteristics of the programs, their main goals and the difficulties they can entail. The drawbacks of the different methods are also highlighted to help biologists who are unfamiliar with algorithmic methods to understand these methodologies better. Overall, using several different programs and cross-comparing their results has a better chance of producing reliable results than using any single program. However, this makes it essential to verify the results provided by each program independently. The ideal solution would be to test all programs against the same data set to obtain a true comparison of their actual performance.

