Accurate Transposable Element Annotation Is Vital When Analyzing New Genome Assemblies.

Affiliation

Department of Biological Sciences, Texas Tech University.

Publication information

Genome Biol Evol. 2016 Jan 21;8(2):403-10. doi: 10.1093/gbe/evw009.

Abstract

Transposable elements (TEs) are mobile genetic elements with the ability to replicate themselves throughout the host genome. In some taxa, TEs reach copy numbers in the hundreds of thousands and can occupy more than half of the genome. The increasing number of reference genomes from nonmodel species has begun to outpace efforts to identify and annotate TE content, and the methods used vary significantly between projects. Here, we demonstrate the variation that arises in TE annotations when less than optimal methods are used. We found that, across a variety of taxa, the ability to accurately identify TEs based solely on homology decreased as the phylogenetic distance between the queried genome and a reference increased. Next, we annotated repeats in two ways: using homology alone, as is often the case in new genome analyses, and using a combination of homology and de novo methods together with an additional manual curation step. Reannotation using these methods identified a substantial number of new TE subfamilies in previously characterized genomes, recognized a higher proportion of the genome as repetitive, and decreased the average genetic distance within TE families, implying recent TE accumulation. Finally, these findings (increased recognition of younger TEs) were confirmed via an analysis of the postman butterfly (Heliconius melpomene). These observations imply that complete TE annotation relies on a combination of homology and de novo-based repeat identification, manual curation, and classification, and that relying on simple, homology-based methods is insufficient to accurately describe the TE landscape of a newly sequenced genome.
