Orthoptera-TElib: a library of Orthoptera transposable elements for TE annotation.

Authors

Liu Xuanzeng, Zhao Lina, Majid Muhammad, Huang Yuan

Affiliation

College of Life Sciences, Shaanxi Normal University, Xi'an, China.

Publication information

Mob DNA. 2024 Mar 15;15(1):5. doi: 10.1186/s13100-024-00316-x.

Abstract

Transposable elements (TEs) are a major component of eukaryotic genomes and are present in almost all eukaryotic organisms. TEs are highly dynamic between and within species, which limits the general applicability of TE databases. Orthoptera is the only known group in the class Insecta with significantly enlarged genomes (0.93-21.48 Gb). When these large genomes are analyzed with the existing public TE databases, the efficiency of TE annotation is unsatisfactory. As more insect genomes become publicly available, it is imperative to keep updating the available TE resource libraries and to build an Orthoptera-specific library. Here, we used the complete genome data of 12 Orthoptera species to annotate TEs de novo, then manually re-annotated the unclassified TEs to construct a non-redundant Orthoptera-specific TE library: Orthoptera-TElib. Orthoptera-TElib contains 24,021 TE entries, including the re-annotated results of 13,964 unknown TEs. TE entries in Orthoptera-TElib follow the same naming convention as RepeatMasker and Dfam, encoded in the three-level form "level1/level2-level3". Orthoptera-TElib can be used directly as an input reference database and is compatible with mainstream repetitive sequence analysis software such as RepeatMasker and dnaPipeTE. For Orthoptera species, Orthoptera-TElib yields better TE annotation than Dfam and Repbase, whether low-coverage sequencing data or genome assembly data are used. The largest improvement is for Angaracris rhodopa, whose annotated TE content increased from 7.89% of the genome to 53.28%. Finally, Orthoptera-TElib is stored in SQLite3 for convenient data updates and user access.
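To make the "level1/level2-level3" naming and the SQLite3 storage more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes entries carry RepeatMasker-style FASTA headers of the form ">name#level1/level2-level3". The entry names (te_0001 and so on), the table name `te`, and its columns are hypothetical; the abstract does not describe the actual schema of Orthoptera-TElib.

```python
import sqlite3


def parse_te_header(header):
    """Split a RepeatMasker-style header into (name, level1, level2, level3).

    Assumed layout: '>name#level1/level2-level3'. Levels that are absent
    (e.g. 'LTR/Gypsy' has no level3) are returned as None.
    """
    token = header.lstrip(">").split()[0]        # drop '>' and any trailing description
    name, _, classification = token.partition("#")
    level1, _, rest = classification.partition("/")
    level2, _, level3 = rest.partition("-")
    return name, level1 or None, level2 or None, level3 or None


def build_db(headers):
    """Load parsed headers into an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE te (name TEXT PRIMARY KEY, level1 TEXT, level2 TEXT, level3 TEXT)"
    )
    conn.executemany(
        "INSERT INTO te VALUES (?, ?, ?, ?)",
        (parse_te_header(h) for h in headers),
    )
    conn.commit()
    return conn


# Made-up example entries; the classifications follow the RepeatMasker/Dfam style.
headers = [
    ">te_0001#LINE/CR1-Zenon",
    ">te_0002#DNA/TcMar-Tc1",
    ">te_0003#LTR/Gypsy",
]
conn = build_db(headers)

# Count entries per top-level class, as one might when summarising a library.
counts = conn.execute(
    "SELECT level1, COUNT(*) FROM te GROUP BY level1 ORDER BY level1"
).fetchall()
print(counts)
```

Keeping the three levels in separate columns lets a query group or filter by class without re-parsing the names, which is one plausible reason to keep such a library in a database rather than only in a flat FASTA file.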

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7e9/10941475/7b0aa407fcf0/13100_2024_316_Fig1_HTML.jpg
