
The role of chromatin state in intron retention: A case study in leveraging large scale deep learning models.

Authors

Daoud Ahmed, Ben-Hur Asa

Affiliation

Department of Computer Science, Colorado State University, Fort Collins, Colorado, United States of America.

Publication

PLoS Comput Biol. 2025 Jan 10;21(1):e1012755. doi: 10.1371/journal.pcbi.1012755. eCollection 2025 Jan.

DOI:10.1371/journal.pcbi.1012755
PMID:39792954
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11756788/
Abstract

Complex deep learning models trained on very large datasets have become key enabling tools for current research in natural language processing and computer vision. By providing pre-trained models that can be fine-tuned for specific applications, they enable researchers to create accurate models with minimal effort and computational resources. Large scale genomics deep learning models come in two flavors: the first are large language models of DNA sequences trained in a self-supervised fashion, similar to the corresponding natural language models; the second are supervised learning models that leverage large scale genomics datasets from ENCODE and other sources. We argue that these models are the equivalent of foundation models in natural language processing in their utility, as they encode within them chromatin state in its different aspects, providing useful representations that allow quick deployment of accurate models of gene regulation. We demonstrate this premise by leveraging the recently created Sei model to develop simple, interpretable models of intron retention, and demonstrate their advantage over models based on the DNA language model DNABERT-2. Our work also demonstrates the impact of chromatin state on the regulation of intron retention. Using representations learned by Sei, our model is able to discover the involvement of transcription factors and chromatin marks in regulating intron retention, providing better accuracy than a recently published custom model developed for this purpose.
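
The idea of fitting a simple, interpretable model on top of representations from a pre-trained network can be illustrated with a minimal sketch. It assumes each intron has already been converted to a fixed-length feature vector by some pre-trained model; here those vectors are synthetic random numbers, and plain logistic regression stands in for whatever "simple, interpretable" model the authors actually used, which the abstract does not specify. All function names and feature names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_data(n_samples, true_weights, seed=0):
    """Draw random feature vectors and binary labels from a known linear model.

    The features are placeholders for scores produced by a pre-trained
    model; they are random numbers, not real chromatin predictions.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in true_weights]
        p = sigmoid(sum(w * v for w, v in zip(true_weights, x)))
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


def train_logistic_regression(X, y, lr=0.5, epochs=300, l2=0.001):
    """Full-batch gradient descent on the L2-regularised logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b):
    """Fraction of examples whose predicted class matches the label."""
    hits = 0
    for x, t in zip(X, y):
        pred = 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0
        hits += int(pred == t)
    return hits / len(y)


def rank_features(w, names):
    """Sort features by absolute weight, largest first."""
    return sorted(zip(names, w), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical feature names; only the first two carry signal here.
    names = ["feature_a", "feature_b", "feature_c", "feature_d", "feature_e"]
    X, y = make_synthetic_data(500, [2.0, -1.5, 0.0, 0.0, 0.0])
    w, b = train_logistic_regression(X, y)
    print("training accuracy:", round(accuracy(X, y, w, b), 3))
    for name, weight in rank_features(w, names):
        print(name, round(weight, 3))
```

Because the model is linear, each learned weight can be read directly as the direction and strength of one input feature's association with the label, which is the sense in which such a model is interpretable. In a real analysis the inputs would be the pre-trained model's outputs for each intron, and performance would be measured on held-out data rather than on the training set.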


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4da/11756788/ca6da0dc7901/pcbi.1012755.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4da/11756788/c9024444f74e/pcbi.1012755.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4da/11756788/283adc0600d9/pcbi.1012755.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4da/11756788/160ed216698c/pcbi.1012755.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4da/11756788/8a342501c9ef/pcbi.1012755.g004.jpg


