EPInformer: a scalable deep learning framework for gene expression prediction by integrating promoter-enhancer sequences with multimodal epigenomic data.

Author information

Lin Jiecong, Luo Ruibang, Pinello Luca

Affiliations

Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Department of Pathology, Harvard Medical School, Boston, Massachusetts 02129, USA.

Department of Computer Science, The University of Hong Kong, Hong Kong, China.

Publication information

bioRxiv. 2024 Aug 1:2024.08.01.606099. doi: 10.1101/2024.08.01.606099.

DOI:10.1101/2024.08.01.606099
PMID:39131276
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11312614/
Abstract

Transcriptional regulation, critical for cellular differentiation and adaptation to environmental changes, involves coordinated interactions among DNA sequences, regulatory proteins, and chromatin architecture. Despite extensive data from consortia like ENCODE, understanding the dynamics of cis-regulatory elements (CREs) in gene expression remains challenging. Deep learning is a powerful tool for learning gene expression and epigenomic signals from DNA sequences, exhibiting superior performance compared to conventional machine learning approaches. However, even the most advanced deep learning-based methods may fall short in capturing the regulatory effects of distal elements such as enhancers, limiting their predictive accuracy. In addition, these methods may require significant resources to train or to adapt to newly generated data. To address these challenges, we present EPInformer, a scalable deep-learning framework for predicting gene expression by integrating promoter-enhancer interactions with their sequences, epigenomic signals, and chromatin contacts. Our model outperforms existing gene expression prediction models in rigorous cross-chromosome validation, accurately recapitulates enhancer-gene interactions validated by CRISPR perturbation experiments, and identifies crucial transcription factor motifs within regulatory sequences. EPInformer is available as open-source software at https://github.com/pinellolab/EPInformer.
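The abstract reports results under cross-chromosome validation: the genes used for evaluation lie on chromosomes that are entirely absent from training, which guards against leakage between overlapping or neighbouring sequence windows. Below is a minimal Python sketch of such a split, assuming genes are supplied as a gene-to-chromosome mapping. The function names `chrom_key` and `cross_chromosome_folds`, the round-robin assignment of chromosomes to folds, and the default fold count are hypothetical choices for illustration, not code from the EPInformer repository.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def chrom_key(chrom: str) -> Tuple[int, str]:
    """Natural sort key: chr1 < chr2 < ... < chr22 < chrX < chrY."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    # Numbered autosomes sort numerically; named chromosomes (X, Y, M) sort last.
    return (int(name), "") if name.isdigit() else (10**6, name)


def cross_chromosome_folds(
    gene_to_chrom: Dict[str, str], n_folds: int = 5
) -> List[Tuple[List[str], List[str]]]:
    """Return (train_genes, test_genes) pairs, one per fold.

    Whole chromosomes are dealt round-robin into folds (hypothetical scheme),
    so a test gene never shares a chromosome with any training gene in its fold.
    """
    chroms = sorted(set(gene_to_chrom.values()), key=chrom_key)
    fold_of_chrom = {c: i % n_folds for i, c in enumerate(chroms)}

    # Group genes by the fold of their chromosome (genes in sorted order).
    genes_by_fold: Dict[int, List[str]] = defaultdict(list)
    for gene in sorted(gene_to_chrom):
        genes_by_fold[fold_of_chrom[gene_to_chrom[gene]]].append(gene)

    folds = []
    for k in range(n_folds):
        test = list(genes_by_fold[k])
        # Training set: every gene whose chromosome belongs to another fold.
        train = [g for j in range(n_folds) if j != k for g in genes_by_fold[j]]
        folds.append((train, test))
    return folds


if __name__ == "__main__":
    # Toy gene-to-chromosome map (illustrative input only, not study data).
    toy = {
        "ACTB": "chr7", "MYC": "chr8", "HBG1": "chr11",
        "CD19": "chr16", "TP53": "chr17", "GATA1": "chrX",
    }
    for k, (train, test) in enumerate(cross_chromosome_folds(toy, n_folds=3)):
        print(f"fold {k}: test={test} train={train}")
```

Dealing chromosomes round-robin keeps the split deterministic and easy to reproduce; because chromosomes differ widely in gene density, a real pipeline might instead assign chromosomes so that folds are balanced by gene count.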

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0f7/11312614/b93afd9e3bec/nihpp-2024.08.01.606099v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0f7/11312614/92f76d33f1fa/nihpp-2024.08.01.606099v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0f7/11312614/853495979002/nihpp-2024.08.01.606099v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0f7/11312614/d23a9a3f994e/nihpp-2024.08.01.606099v1-f0004.jpg

Similar articles

1. EPInformer: a scalable deep learning framework for gene expression prediction by integrating promoter-enhancer sequences with multimodal epigenomic data.
bioRxiv. 2024 Aug 1:2024.08.01.606099. doi: 10.1101/2024.08.01.606099.
2. Integrating distal and proximal information to predict gene expression via a densely connected convolutional neural network.
Bioinformatics. 2020 Jan 15;36(2):496-503. doi: 10.1093/bioinformatics/btz562.
3. Enhancer-MDLF: a novel deep learning framework for identifying cell-specific enhancers.
Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae083.
4. Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions.
BMC Syst Biol. 2016 Aug 1;10 Suppl 2(Suppl 2):54. doi: 10.1186/s12918-016-0302-3.
5. HEAP: a task adaptive-based explainable deep learning framework for enhancer activity prediction.
Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad286.
6. iEnhancer-SKNN: a stacking ensemble learning-based method for enhancer identification and classification using sequence information.
Brief Funct Genomics. 2023 May 18;22(3):302-311. doi: 10.1093/bfgp/elac057.
7. DeepCBA: A deep learning framework for gene expression prediction in maize based on DNA sequences and chromatin interactions.
Plant Commun. 2024 Sep 9;5(9):100985. doi: 10.1016/j.xplc.2024.100985. Epub 2024 Jun 10.
8. EnContact: predicting enhancer-enhancer contacts using sequence-based deep learning model.
PeerJ. 2019 Sep 13;7:e7657. doi: 10.7717/peerj.7657. eCollection 2019.
9. DeepCAPE: A Deep Convolutional Neural Network for the Accurate Prediction of Enhancers.
Genomics Proteomics Bioinformatics. 2021 Aug;19(4):565-577. doi: 10.1016/j.gpb.2019.04.006. Epub 2021 Feb 11.
10. Genome-wide identification and characterization of DNA enhancers with a stacked multivariate fusion framework.
PLoS Comput Biol. 2022 Dec 15;18(12):e1010779. doi: 10.1371/journal.pcbi.1010779. eCollection 2022 Dec.
