
AI for the collective analysis of a massive number of genome sequences: various examples from the small genome of pandemic SARS-CoV-2 to the human genome.

Authors

Ikemura Toshimichi, Iwasaki Yuki, Wada Kennosuke, Wada Yoshiko, Abe Takashi

Affiliations

Faculty of Bioscience, Nagahama Institute of Bio-Science and Technology.

Department of Information Engineering, Faculty of Engineering, Niigata University.

Publication information

Genes Genet Syst. 2021 Dec 16;96(4):165-176. doi: 10.1266/ggs.21-00025. Epub 2021 Sep 27.

DOI: 10.1266/ggs.21-00025
PMID: 34565757
Abstract

In genetics and related fields, huge amounts of data, such as genome sequences, are accumulating, and the use of artificial intelligence (AI) suitable for big data analysis has become increasingly important. Unsupervised AI that can reveal novel knowledge from big data without prior knowledge or particular models is highly desirable for analyses of genome sequences, particularly for obtaining unexpected insights. We have developed a batch-learning self-organizing map (BLSOM) for oligonucleotide compositions that can reveal various novel genome characteristics. Here, we explain the data mining by the BLSOM: an unsupervised AI. As a specific target, we first selected SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) because a large number of viral genome sequences have been accumulated via worldwide efforts. We analyzed more than 0.6 million sequences collected primarily in the first year of the pandemic. BLSOMs for short oligonucleotides (e.g., 4-6-mers) allowed separation into known clades, but longer oligonucleotides further increased the separation ability and revealed subgrouping within known clades. In the case of 15-mers, there is mostly one copy in the genome; thus, 15-mers that appeared after the epidemic started could be connected to mutations, and the BLSOM for 15-mers revealed the mutations that contributed to separation into known clades and their subgroups. After introducing the detailed methodological strategies, we explain BLSOMs for various topics, such as the tetranucleotide BLSOM for over 5 million 5-kb fragment sequences derived from almost all microorganisms currently available and its use in metagenome studies. We also explain BLSOMs for various eukaryotes, including fishes, frogs and Drosophila species, and found a high separation ability among closely related species. When analyzing the human genome, we found enrichments in transcription factor-binding sequences in centromeric and pericentromeric heterochromatin regions. The tDNAs (tRNA genes) could be separated according to their corresponding amino acid.
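
As a rough illustration of the two ingredients the abstract names, oligonucleotide composition and a batch-learning self-organizing map, the following Python sketch computes normalized k-mer frequency vectors and trains a small map in batch mode, where all node weights are updated together once per pass over the data. It is not the authors' BLSOM implementation: the function names, the evenly spaced initialization from data samples, the Gaussian neighborhood, and the linear radius schedule are all assumptions made for the example, since the abstract specifies none of them.

```python
import itertools
import math


def kmer_composition(seq, k=4):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    Windows containing characters other than A, C, G, T are skipped.
    """
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


def best_matching_unit(weights, x):
    """Index of the map node whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)),
    )


def train_batch_som(data, rows, cols, epochs=20):
    """Train a rows x cols map in batch mode (illustrative sketch only).

    Assumptions, not taken from the abstract: nodes are initialized from
    evenly spaced data samples, the neighborhood is Gaussian, and its
    radius shrinks linearly down to a floor of 0.5 grid units.
    """
    dim = len(data[0])
    n_units = rows * cols
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [list(data[(u * len(data)) // n_units]) for u in range(n_units)]
    for epoch in range(epochs):
        sigma = max(0.5, (max(rows, cols) / 2) * (1 - epoch / epochs))
        # Batch step 1: assign every input to its best-matching node.
        bmus = [best_matching_unit(weights, x) for x in data]
        # Batch step 2: each node becomes the neighborhood-weighted mean
        # of all inputs, so the result does not depend on input order.
        new_weights = []
        for u, (ur, uc) in enumerate(coords):
            num = [0.0] * dim
            den = 0.0
            for x, b in zip(data, bmus):
                br, bc = coords[b]
                d2 = (ur - br) ** 2 + (uc - bc) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                den += h
                for i in range(dim):
                    num[i] += h * x[i]
            new_weights.append([v / den for v in num] if den > 0 else weights[u])
        weights = new_weights
    return weights
```

A typical use would be to convert each sequence (or fixed-length fragment) with `kmer_composition`, pass the resulting vectors to `train_batch_som`, and then call `best_matching_unit` on each vector to see which map node it lands on. Sequences with similar composition are expected to fall on the same or neighboring nodes. Pure Python is used here only to keep the sketch dependency-free; data at the scale the abstract describes would need a vectorized or parallel implementation.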


Cited by

1
Unsupervised AI reveals insect species-specific genome signatures.
PeerJ. 2024 Mar 6;12:e17025. doi: 10.7717/peerj.17025. eCollection 2024.
2
AI-based search for convergently expanding, advantageous mutations in SARS-CoV-2 by focusing on oligonucleotide frequencies.
PLoS One. 2022 Aug 31;17(8):e0273860. doi: 10.1371/journal.pone.0273860. eCollection 2022.
3
Comparative genomic analysis of the human genome and six bat genomes using unsupervised machine learning: Mb-level CpG and TFBS islands.
BMC Genomics. 2022 Jul 8;23(1):497. doi: 10.1186/s12864-022-08664-9.