DNASimCLR: a contrastive learning-based deep learning approach for gene sequence data classification.

Affiliations

Shandong University, Weihai, People's Republic of China.

Beijing Research Institute of Automation for Machinery Industry, Beijing, People's Republic of China.

Publication information

BMC Bioinformatics. 2024 Oct 14;25(1):328. doi: 10.1186/s12859-024-05955-8.

DOI: 10.1186/s12859-024-05955-8
PMID: 39402441
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11476100/
Abstract

BACKGROUND

Rapid advances in deep neural network models have greatly improved feature extraction from microbial sequence data, a capability that is critical for addressing biological questions. However, labeled microbial data are scarce and complex, which poses substantial difficulties for supervised learning approaches. To address these issues, we propose DNASimCLR, an unsupervised framework for efficient feature extraction from gene sequence data.
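Because labeled data are scarce, a SimCLR-style framework creates its own training signal from unlabelled sequences: each sequence is perturbed twice, and the two perturbed views form a positive pair. The sketch below illustrates that input side in plain Python. The abstract does not state DNASimCLR's sequence encoding or augmentations, so the one-hot encoding, the random base-substitution augmentation, the helper names `one_hot`, `mutate`, and `two_views`, and the default `rate` are all illustrative assumptions, not the authors' method.

```python
import random

# Assumed alphabet and encoding; the abstract does not specify DNASimCLR's input format.
BASES = "ACGT"


def one_hot(seq: str) -> list[list[int]]:
    """Encode a DNA string as a len(seq) x 4 one-hot matrix; unknown bases (e.g. N) become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Hypothetical augmentation: replace each known base with a different random base with probability `rate`."""
    out = []
    for base in seq.upper():
        if base in BASES and rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)  # unchanged base, or an ambiguous symbol such as N
    return "".join(out)


def two_views(seq: str, rate: float = 0.05, seed: int = 0) -> tuple[str, str]:
    """Build a positive pair: two independently augmented views of the same unlabelled sequence."""
    rng = random.Random(seed)
    return mutate(seq, rate, rng), mutate(seq, rate, rng)


view_a, view_b = two_views("ACGTACGTNNACGT")
x_a = one_hot(view_a)  # 14 x 4 matrix, a possible input for a 1-D CNN encoder
```

Substitution is used here only because it preserves sequence length, which keeps both views the same shape for a convolutional encoder; masking or cropping would fit the same pattern.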

RESULTS

DNASimCLR combines convolutional neural networks with SimCLR, a contrastive learning framework, to extract intricate features from diverse microbial gene sequences. The model was pre-trained on two classic large-scale unlabelled datasets comprising metagenomic and viral gene sequences, and the pretrained model was then fine-tuned for downstream classification tasks. Our experiments demonstrate that DNASimCLR is at least comparable to state-of-the-art techniques for gene sequence classification. Among convolutional neural network (CNN)-based approaches, DNASimCLR outperforms the latest existing methods, establishing its advantage over state-of-the-art CNN-based feature extraction techniques. Furthermore, the model performs strongly across diverse biological sequence analysis tasks, demonstrating robust adaptability.
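SimCLR trains its encoder with the normalized temperature-scaled cross-entropy (NT-Xent) loss: each embedding should be more similar (cosine similarity divided by a temperature τ) to the other view of the same sequence than to every other embedding in the batch. A minimal pure-Python sketch of that loss follows. It assumes the CNN encoder has already mapped a batch of N sequences to 2N embedding vectors ordered so that `z[2k]` and `z[2k + 1]` are the two views of sequence k; that ordering, the function names, and the default temperature are assumptions for illustration, not details taken from the DNASimCLR paper.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(z: list[list[float]], temperature: float = 0.5) -> float:
    """NT-Xent loss averaged over all 2N anchors.

    Assumed layout: z[2k] and z[2k + 1] are the two augmented views of sample k.
    """
    n = len(z)
    if n < 2 or n % 2:
        raise ValueError("expected an even number (2N >= 2) of embeddings")
    total = 0.0
    for i in range(n):
        pos = i ^ 1  # positive partner: 0<->1, 2<->3, ...
        # Temperature-scaled similarities to every other embedding (the anchor itself is excluded).
        sims = {k: cosine(z[i], z[k]) / temperature for k in range(n) if k != i}
        # -log(exp(s_pos) / sum_k exp(s_k)), computed with log-sum-exp for numerical stability.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        total += log_denom - sims[pos]
    return total / n
```

With two cleanly separated pairs, `[[1, 0], [1, 0], [0, 1], [0, 1]]` at τ = 0.5, the loss is log(1 + 2e^-2) ≈ 0.24; interleaving the vectors so that each view's designated partner is the wrong one raises it to about 2.24. Minimizing this quantity is what pushes the encoder toward features that are stable under augmentation.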

CONCLUSIONS

DNASimCLR represents a robust and database-agnostic solution for gene sequence classification. Its versatility allows it to perform well in scenarios involving novel or previously unseen gene sequences, making it a valuable tool for diverse applications in genomics.

Figures
Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e89f/11476100/6263f2850dcb/12859_2024_5955_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e89f/11476100/f22d36e034d8/12859_2024_5955_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e89f/11476100/b32c5014ddc2/12859_2024_5955_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e89f/11476100/a787ffd71a4b/12859_2024_5955_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e89f/11476100/8a08bbb4ef5f/12859_2024_5955_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e89f/11476100/83c908849909/12859_2024_5955_Fig6_HTML.jpg
Fig. 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e89f/11476100/228695b0559e/12859_2024_5955_Fig7_HTML.jpg
