EPInformer: a scalable deep learning framework for gene expression prediction by integrating promoter-enhancer sequences with multimodal epigenomic data.

Author information

Lin Jiecong, Luo Ruibang, Pinello Luca

Affiliations

Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Department of Pathology, Harvard Medical School, Boston, Massachusetts 02129, USA.

Department of Computer Science, The University of Hong Kong, Hong Kong, China.

Publication information

bioRxiv. 2024 Aug 1:2024.08.01.606099. doi: 10.1101/2024.08.01.606099.

Abstract

Transcriptional regulation, critical for cellular differentiation and adaptation to environmental changes, involves coordinated interactions among DNA sequences, regulatory proteins, and chromatin architecture. Despite extensive data from consortia like ENCODE, understanding the dynamics of cis-regulatory elements (CREs) in gene expression remains challenging. Deep learning is a powerful tool for learning gene expression and epigenomic signals from DNA sequences, exhibiting superior performance compared to conventional machine learning approaches. However, even the most advanced deep learning-based methods may fall short in capturing the regulatory effects of distal elements such as enhancers, limiting their predictive accuracy. In addition, these methods may require significant resources to train or to adapt to newly generated data. To address these challenges, we present EPInformer, a scalable deep-learning framework for predicting gene expression by integrating promoter-enhancer interactions with their sequences, epigenomic signals, and chromatin contacts. Our model outperforms existing gene expression prediction models in rigorous cross-chromosome validation, accurately recapitulates enhancer-gene interactions validated by CRISPR perturbation experiments, and identifies crucial transcription factor motifs within regulatory sequences. EPInformer is available as open-source software at https://github.com/pinellolab/EPInformer.
