DNASimCLR: a contrastive learning-based deep learning approach for gene sequence data classification.

Affiliations

Shandong University, Weihai, People's Republic of China.

Beijing Research Institute of Automation for Machinery Industry, Beijing, People's Republic of China.

Publication information

BMC Bioinformatics. 2024 Oct 14;25(1):328. doi: 10.1186/s12859-024-05955-8.

Abstract

BACKGROUND

The rapid advancements in deep neural network models have significantly enhanced the ability to extract features from microbial sequence data, which is critical for addressing biological challenges. However, the scarcity and complexity of labeled microbial data pose substantial difficulties for supervised learning approaches. To address these issues, we propose DNASimCLR, an unsupervised framework designed for efficient gene sequence data feature extraction.

RESULTS

DNASimCLR combines convolutional neural networks with SimCLR, a contrastive learning framework, to extract intricate features from diverse microbial gene sequences. Pre-training was conducted on two classic large-scale unlabelled datasets encompassing metagenomes and viral gene sequences. Subsequent classification tasks were performed by fine-tuning the pretrained model. Our experiments demonstrate that DNASimCLR is at least comparable to state-of-the-art techniques for gene sequence classification. Among convolutional neural network-based approaches, DNASimCLR surpasses the latest existing methods. Furthermore, the model exhibits superior performance across diverse tasks in analyzing biological sequence data, showcasing its robust adaptability.
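
To make the contrastive objective concrete, the following is a minimal sketch of the NT-Xent loss used by the generic SimCLR framework, written in plain Python. It is not the authors' implementation: the abstract does not give the encoder architecture, the sequence augmentations, or the temperature, so the function name `nt_xent_loss`, the default `temperature=0.5`, and the assumption that `z1[i]` and `z2[i]` are embeddings of two augmented views of the same sequence are all illustrative.

```python
import math


def _normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for n positive pairs (z1[i], z2[i]).

    z1, z2: lists of n embedding vectors (lists of floats).
    temperature: assumed value; the abstract does not state one.
    """
    n = len(z1)
    z = [_normalize(v) for v in list(z1) + list(z2)]
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same sequence
        # Scaled cosine similarity to every other embedding (self excluded).
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(2 * n)
            if j != i
        }
        # Numerically stable log-sum-exp over all candidates.
        m = max(sims.values())
        lse = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Cross-entropy with the positive pair as the correct "class".
        total += lse - sims[pos]
    return total / (2 * n)
```

Minimising this loss pulls the two views of each sequence together in embedding space and pushes them away from the other sequences in the batch, which is what allows pre-training on unlabelled data before fine-tuning on a classification task.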

CONCLUSIONS

DNASimCLR represents a robust and database-agnostic solution for gene sequence classification. Its versatility allows it to perform well in scenarios involving novel or previously unseen gene sequences, making it a valuable tool for diverse applications in genomics.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e89f/11476100/6263f2850dcb/12859_2024_5955_Fig1_HTML.jpg
