An artificial intelligence approach fit for tRNA gene studies in the era of big sequence data.

Authors

Iwasaki Yuki, Abe Takashi, Wada Kennosuke, Wada Yoshiko, Ikemura Toshimichi

Affiliations

Department of Bioscience, Nagahama Institute of Bio-Science and Technology.

Department of Information Engineering, Faculty of Engineering, Niigata University.

Publication information

Genes Genet Syst. 2017 Sep 12;92(1):43-54. doi: 10.1266/ggs.16-00068. Epub 2017 Mar 24.

DOI:10.1266/ggs.16-00068
PMID:28344190
Abstract

Unsupervised data mining capable of extracting a wide range of knowledge from big data without prior knowledge or particular models is a timely application in the era of big sequence data accumulation in genome research. By handling oligonucleotide compositions as high-dimensional data, we have previously modified the conventional self-organizing map (SOM) for genome informatics and established BLSOM, which can analyze more than ten million sequences simultaneously. Here, we develop BLSOM specialized for tRNA genes (tDNAs) that can cluster (self-organize) more than one million microbial tDNAs according to their cognate amino acid solely depending on tetra- and pentanucleotide compositions. This unsupervised clustering can reveal combinatorial oligonucleotide motifs that are responsible for the amino acid-dependent clustering, as well as other functionally and structurally important consensus motifs, which have been evolutionarily conserved. BLSOM is also useful for identifying tDNAs as phylogenetic markers for special phylotypes. When we constructed BLSOM with 'species-unknown' tDNAs from metagenomic sequences plus 'species-known' microbial tDNAs, a large portion of metagenomic tDNAs self-organized with species-known tDNAs, yielding information on microbial communities in environmental samples. BLSOM can also enhance accuracy in the tDNA database obtained from big sequence data. This unsupervised data mining should become important for studying numerous functionally unclear RNAs obtained from a wide range of organisms.
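
As a rough illustration of what "handling oligonucleotide compositions as high-dimensional data" can mean in practice, the sketch below turns a DNA sequence into tetranucleotide (256-dimensional) and pentanucleotide (1024-dimensional) frequency vectors. It is not the authors' BLSOM implementation: the function name `kmer_composition`, the normalization to relative frequencies, the choice to skip windows containing non-ACGT characters, and the example sequence are all assumptions made for this example. Vectors of this kind are the sort of input a self-organizing map would then cluster.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_composition(seq, k):
    """Return relative frequencies of all 4**k k-mers, in lexicographic order.

    Hypothetical helper for illustration; windows containing characters
    other than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    valid = [counts.get(kmer, 0) for kmer in kmers]
    total = sum(valid)
    if total == 0:
        # Sequence shorter than k, or no valid windows.
        return [0.0] * len(kmers)
    return [c / total for c in valid]


# Arbitrary illustrative sequence (an assumption, not data from the paper).
example = (
    "GCGGATTTAGCTCAGTTGGGAGAGCGCCAGACTGAAGATCTGGAGG"
    "TCCTGTGTTCGATCCACAGAATTCGCACCA"
)

tetra = kmer_composition(example, 4)  # 256-dimensional vector
penta = kmer_composition(example, 5)  # 1024-dimensional vector
print(len(tetra), len(penta))
```

Each sequence becomes one point in a 256- or 1024-dimensional space, so sequences of different lengths can be compared directly. Whether the two compositions are analyzed separately or combined is not stated in the abstract, so the sketch keeps them as two independent vectors.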

