tRNADB-CE: tRNA gene database well-timed in the era of big sequence data.

Authors

Abe Takashi, Inokuchi Hachiro, Yamada Yuko, Muto Akira, Iwasaki Yuki, Ikemura Toshimichi

Affiliations

Graduate School of Science and Technology, Niigata University, Niigata, Japan.

Nagahama Institute of Bio-Science and Technology, Nagahama, Shiga, Japan.

Publication information

Front Genet. 2014 May 1;5:114. doi: 10.3389/fgene.2014.00114. eCollection 2014.

Abstract

The expert-curated tRNA gene database "tRNADB-CE" (http://trna.ie.niigata-u.ac.jp) was constructed by analyzing 1,966 complete and 5,272 draft genomes of prokaryotes, the genomes of 171 viruses, 121 chloroplasts, and 12 eukaryotes, and fragment sequences obtained from metagenome studies of environmental samples. In total, 595,115 tRNA genes have been registered, twice the number compiled previously. For each gene, the sequence, cloverleaf structure, and results of sequence-similarity and oligonucleotide-pattern searches can be browsed. To build collective knowledge with help from experts in tRNA research, we added a column for registering comments on each tRNA. By grouping bacterial tRNAs with an identical sequence, we found high phylogenetic conservation of tRNA sequences, especially at the phylum level. Because many tRNAs of unknown species origin from metagenomic sequences are identical in sequence to tRNAs found in prokaryotes of known species, the identical sequence group (ISG) can provide phylogenetic markers for investigating the microbial community in an environmental ecosystem. This strategy can be applied to the huge numbers of short sequences obtained from next-generation sequencers, showing that tRNADB-CE is a well-timed database in the era of big sequence data. We also discuss how a batch-learning self-organizing map based on oligonucleotide composition is useful for efficient knowledge discovery from big sequence data.
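The identical sequence group (ISG) idea described in the abstract can be illustrated with a minimal Python sketch. It is not the database's actual implementation: the function names `build_isgs` and `assign_phylum`, the toy sequences, and the phylum labels attached to them are all hypothetical and chosen only for illustration. The sketch groups tRNA sequences of known origin by exact sequence identity, then treats a group as a phylum-level marker only when all of its members come from a single phylum.

```python
from collections import defaultdict


def build_isgs(known_records):
    """Group (sequence, phylum) records into identical sequence groups.

    Returns a dict mapping each distinct sequence to the set of phyla
    in which that exact sequence was observed.
    """
    groups = defaultdict(set)
    for sequence, phylum in known_records:
        groups[sequence.upper()].add(phylum)
    return dict(groups)


def assign_phylum(query_sequence, isgs):
    """Return a phylum for a sequence of unknown origin, or None.

    A phylum is returned only when the query is identical to an ISG
    whose members all belong to one phylum; ambiguous or unmatched
    queries yield None.
    """
    phyla = isgs.get(query_sequence.upper())
    if phyla is not None and len(phyla) == 1:
        return next(iter(phyla))
    return None


# Toy data (made up for illustration, not taken from tRNADB-CE).
known = [
    ("GCGGATTTAGCTCAGT", "Proteobacteria"),
    ("GCGGATTTAGCTCAGT", "Proteobacteria"),
    ("GGGCCCGTAGCTCAGC", "Firmicutes"),
    ("GGGCCCGTAGCTCAGC", "Actinobacteria"),
]
isgs = build_isgs(known)

marker_hit = assign_phylum("gcggatttagctcagt", isgs)   # ISG confined to one phylum
ambiguous_hit = assign_phylum("GGGCCCGTAGCTCAGC", isgs)  # ISG spans two phyla
no_hit = assign_phylum("AAAAAAAAAAAAAAAA", isgs)         # not in any ISG
```

Exact-match lookup in a dictionary keeps the assignment step cheap for each query, which is one plausible reason an identity-based marker scales to large numbers of short reads. A real pipeline would first need to detect the tRNA genes in the reads, a step this sketch omits.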

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c7eb/4013482/6b708fa9b1f3/fgene-05-00114-g001.jpg
