De novo identification of LTR retrotransposons in eukaryotic genomes.

Author information

Rho Mina, Choi Jeong-Hyeon, Kim Sun, Lynch Michael, Tang Haixu

Affiliation

Department of Computer Science, Indiana University, Bloomington, IN 47405, USA.

Publication information

BMC Genomics. 2007 Apr 3;8:90. doi: 10.1186/1471-2164-8-90.

Abstract

BACKGROUND

LTR retrotransposons are a class of mobile genetic elements containing two similar long terminal repeats (LTRs). Currently, LTR retrotransposons are annotated in eukaryotic genomes mainly through conventional homology searching, which limits annotation to known elements.

RESULTS

In this paper, we report a de novo computational method that can identify new LTR retrotransposons without relying on a library of known elements. Specifically, our method identifies intact LTR retrotransposons by using an approximate string matching technique and protein domain analysis. In addition, it identifies partially deleted or solo LTRs using profile Hidden Markov Models (pHMMs). As a result, this method can identify all types of LTR retrotransposons de novo. We tested this method on two pairs of eukaryotic genomes, C. elegans vs. C. briggsae and D. melanogaster vs. D. pseudoobscura. LTR retrotransposons in C. elegans and D. melanogaster have been intensively studied using conventional annotation methods. Compared with previous work, we identified new intact LTR retroelements and new putative families, which suggests that there may still be retroelements left to be discovered even in well-studied organisms. To assess the sensitivity and accuracy of our method, we compared our results with those of a previously published method, LTR_STRUC, which predominantly identifies full-length LTR retrotransposons. In summary, both methods identified a comparable number of intact LTR retroelements. However, our method can identify nearly all known elements in C. elegans, while LTR_STRUC missed about one third of them. Our method also identified more known LTR retroelements than LTR_STRUC in the D. melanogaster genome. We also identified some LTR retroelements in the other two genomes, C. briggsae and D. pseudoobscura, which have not been completely finished. In contrast, the conventional method failed to identify those elements. Finally, the phylogenetic and chromosomal distributions of the identified elements are discussed.
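To make the idea of finding LTR pairs by approximate string matching concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the function names, parameters, and toy sequence are all hypothetical, and it assumes a much simpler scheme (exact k-mer seeds followed by a mismatch-only identity check over fixed-length windows). A real tool would use an alignment that also tolerates insertions and deletions.

```python
from collections import defaultdict


def window_identity(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def find_ltr_candidates(seq, ltr_len=14, min_dist=20, max_dist=60,
                        min_identity=0.9, k=5):
    """Return sorted (left_start, right_start, identity) tuples.

    Hypothetical simplification: two windows of length ltr_len are
    reported when they share an exact k-mer seed at their first
    position, lie between min_dist and max_dist apart, and agree at
    a fraction of at least min_identity of their positions.
    """
    # Index every k-mer by its start positions (ascending order).
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)

    hits = set()
    for positions in index.values():
        for n, i in enumerate(positions):
            for j in positions[n + 1:]:
                dist = j - i
                if dist < min_dist:
                    continue
                if dist > max_dist:
                    break  # later positions are even farther away
                if j + ltr_len > len(seq):
                    continue  # right window would run off the sequence
                ident = window_identity(seq[i:i + ltr_len],
                                        seq[j:j + ltr_len])
                if ident >= min_identity:
                    hits.add((i, j, ident))
    return sorted(hits)


# Toy sequence: a 14 bp "LTR", a 20 bp internal region, and a second
# copy of the LTR carrying one substitution.
left_ltr = "TGTAGCCATACGGA"
right_ltr = "TGTAGCCTTACGGA"
internal = "CCGATTAGGCTAACGTTAGC"
genome = "AAAAA" + left_ltr + internal + right_ltr + "TTTTT"

hits = find_ltr_candidates(genome)
print(hits)
```

In this toy case the two diverged LTR copies start at positions 5 and 39 and are reported as a candidate pair. In the abstract's terms, such candidates would still need to be confirmed by protein domain analysis before being called intact elements.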

CONCLUSION

We report a novel method for de novo identification of LTR retrotransposons in eukaryotic genomes that performs favorably compared with existing methods.

