DECODE: a Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays.

Affiliations

Department of Statistics & Data Science, Yale University, New Haven, CT 06520, USA.

Department of Computer Science, University of California, Irvine, CA 92617, USA.

Publication information

Bioinformatics. 2021 Jul 12;37(Suppl_1):i280-i288. doi: 10.1093/bioinformatics/btab283.

DOI: 10.1093/bioinformatics/btab283

PMID: 34252960

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8275369/

Abstract

MOTIVATION

Mapping distal regulatory elements, such as enhancers, is a cornerstone for elucidating how genetic variations may influence diseases. Previous enhancer-prediction methods have used either unsupervised approaches or supervised methods with limited training data. Moreover, past approaches have implemented enhancer discovery as a binary classification problem without accurate boundary detection, producing low-resolution annotations with superfluous regions and reducing the statistical power for downstream analyses (e.g. causal variant mapping and functional validations). Here, we addressed these challenges via a two-step model called Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays (DECODE). First, we employed direct enhancer-activity readouts from novel functional characterization assays, such as STARR-seq, to train a deep neural network for accurate cell-type-specific enhancer prediction. Second, to improve the annotation resolution, we implemented a weakly supervised object detection framework for enhancer localization with precise boundary detection (to a 10 bp resolution) using Gradient-weighted Class Activation Mapping.
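The localization step relies on Gradient-weighted Class Activation Mapping (Grad-CAM), which weights each convolutional feature map by the positional mean of the class score's gradient and keeps the positive part of the weighted sum. The sketch below is a minimal, hypothetical illustration, not DECODE's published code. It assumes a model whose last feature maps feed a global-average-pooling plus linear head, so the gradient can be written analytically instead of using autograd. It also assumes that each map position covers 10 bp and that boundaries are taken where the map exceeds half of its maximum. The function names, the 0.5 threshold, and the `region_start` argument are all illustrative choices.

```python
import numpy as np


def grad_cam_gap_linear(feature_maps, class_weights):
    """Grad-CAM for a hypothetical head y = class_weights . mean_over_positions(A) + b.

    feature_maps: array (K, L), activations A^k of K channels at L positions.
    class_weights: array (K,), linear-layer weights for the target class.
    For this head, dy/dA^k_i = w_k / L at every position i.
    """
    K, L = feature_maps.shape
    grads = np.tile(class_weights[:, None] / L, (1, L))  # dy/dA, shape (K, L)
    alpha = grads.mean(axis=1)                            # Grad-CAM channel weights
    return np.maximum(alpha @ feature_maps, 0.0)          # ReLU(sum_k alpha_k A^k)


def cam_to_intervals(cam, region_start, bin_size=10, rel_threshold=0.5):
    """Turn a CAM track into half-open (start, end) genomic intervals.

    Assumption: position i covers [region_start + i*bin_size, region_start + (i+1)*bin_size).
    Contiguous runs with cam >= rel_threshold * max(cam) become one interval.
    """
    if cam.max() <= 0:
        return []
    keep = cam >= rel_threshold * cam.max()
    intervals, i, n = [], 0, len(keep)
    while i < n:
        if keep[i]:
            j = i
            while j < n and keep[j]:
                j += 1
            intervals.append((region_start + i * bin_size, region_start + j * bin_size))
            i = j
        else:
            i += 1
    return intervals


# Toy example: 2 channels x 6 positions for a predicted enhancer starting at 1000 bp.
A = np.array([[0, 1, 4, 4, 1, 0], [1, 0, 0, 0, 0, 1]], dtype=float)
w = np.array([1.0, -1.0])
cam = grad_cam_gap_linear(A, w)
print(cam_to_intervals(cam, region_start=1000))  # -> [(1020, 1040)]
```

With a real network, the analytic gradient would be replaced by one obtained through automatic differentiation. The thresholding rule is only one plausible way to convert a saliency track into boundaries.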

RESULTS

Our DECODE binary classifier outperformed a state-of-the-art enhancer prediction method by 24% in transgenic mouse validation. Furthermore, the object detection framework can condense enhancer annotations to only 13% of their original size, and these compact annotations have significantly higher conservation scores and genome-wide association study variant enrichments than the original predictions. Overall, DECODE is an effective tool for enhancer classification and precise localization.
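A condensation figure such as "13% of their original size" can be read as the base pairs covered by the refined annotations divided by the base pairs covered by the original predictions. The sketch below shows one way to compute that ratio from half-open intervals, with overlaps merged so that no base is counted twice. The function names and coordinates are hypothetical toy values, not DECODE data. The ratio only makes sense when the condensed intervals lie inside the original predictions.

```python
def total_bp(intervals):
    """Total base pairs covered by half-open (start, end) intervals, merging overlaps."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it out and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


def condensation_ratio(original, condensed):
    """Fraction of the original annotated base pairs retained after condensing."""
    return total_bp(condensed) / total_bp(original)


original = [(1000, 3000), (5000, 7000)]                 # 4,000 bp of predictions
condensed = [(1020, 1040), (1500, 1700), (5100, 5300)]  # 420 bp after refinement
print(condensation_ratio(original, condensed))          # -> 0.105
```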

AVAILABILITY AND IMPLEMENTATION

DECODE source code and pre-processing scripts are available at decode.gersteinlab.org.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8d5/8275369/53e2a8eded9e/btab283f1.jpg