Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions.

Authors

Kim Seong Gon, Theera-Ampornpunt Nawanol, Fang Chih-Hao, Harwani Mrudul, Grama Ananth, Chaterji Somali

Affiliation

Department of Computer Science, Purdue University, West Lafayette, IN, USA.

Publication

BMC Syst Biol. 2016 Aug 1;10 Suppl 2(Suppl 2):54. doi: 10.1186/s12918-016-0302-3.

DOI:10.1186/s12918-016-0302-3

PMID:27490187

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4977478/

Abstract

BACKGROUND

Gene expression is mediated by specialized cis-regulatory modules (CRMs), the most prominent of which are called enhancers. Early experiments indicated that enhancers located far from gene promoters are often responsible for mediating gene transcription. Knowing their properties, regulatory activity, and genomic targets is crucial to a functional understanding of cellular events, ranging from cellular homeostasis to differentiation. Recent genome-wide investigations of epigenomic marks have indicated that enhancer elements can be enriched for certain epigenomic marks, such as combinatorial patterns of histone modifications.

METHODS

Our efforts in this paper are motivated by recent advances in epigenomic profiling methods, which have uncovered enhancer-associated chromatin features in different cell types and organisms. Specifically, we use recent deep learning methods to develop a deep neural network (DNN)-based architecture, called EP-DNN, to predict the presence and types of enhancers in the human genome. As features, it uses the expression levels of the histone modifications at the peaks of the functional sites as well as in their adjacent regions. We apply EP-DNN to four different cell types: H1, IMR90, HepG2, and HeLa S3. We train EP-DNN using p300 binding sites as enhancers, and TSS and random non-DHS sites as non-enhancers. We use EP-DNN predictions to quantify the validation rate at different levels of prediction confidence, and we compare EP-DNN against two state-of-the-art computational models for enhancer prediction, DEEP-ENCODE and RFECS.
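To make the feature description concrete, the following is a minimal sketch of how a per-site feature vector could be assembled from histone modification signal at a site and in its adjacent regions. The binned-track representation, the function name `window_features`, the zero padding at track ends, and the toy signal values are all assumptions for illustration; the abstract does not specify bin size, window width, or normalization.

```python
from typing import Dict, List


def window_features(
    tracks: Dict[str, List[float]], center_bin: int, flank_bins: int
) -> List[float]:
    """Concatenate each mark's signal over bins [center - flank, center + flank].

    `tracks` maps a histone mark name to its binned signal along one
    chromosome (hypothetical representation, not taken from the paper).
    """
    features: List[float] = []
    for mark in sorted(tracks):  # fixed order keeps feature positions stable
        signal = tracks[mark]
        for b in range(center_bin - flank_bins, center_bin + flank_bins + 1):
            # Bins that fall outside the track are padded with 0.0 (assumption).
            features.append(signal[b] if 0 <= b < len(signal) else 0.0)
    return features


# Toy signal values, invented purely for illustration.
tracks = {
    "H3K4me1": [0.1, 0.4, 2.0, 3.5, 1.2, 0.3],
    "H3K27ac": [0.0, 0.2, 1.1, 4.0, 0.9, 0.1],
}
x = window_features(tracks, center_bin=3, flank_bins=2)
print(len(x))  # 2 marks x 5 bins
```

A vector like `x` would be built for each positive site (p300 binding sites) and each negative site (TSS and random non-DHS sites) before being passed to a classifier.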

RESULTS

We find that EP-DNN has superior accuracy and takes less time to make predictions. Next, we develop methods to make EP-DNN interpretable by computing the importance of each input feature in the classification task. This analysis indicates that the important histone modifications differ across cell types, with some overlap: for example, H3K27ac was important in cell type H1 but less so in HeLa S3, while H3K4me1 was relatively important in all four cell types. Finally, we use the feature importance analysis to reduce the number of input features needed to train the DNN, thus reducing training time, which is often the computational bottleneck in the use of a DNN.
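The abstract does not say how feature importance is computed for the trained network. As one generic way to score input features for any fitted classifier, the sketch below uses permutation importance: shuffle one feature column and measure the drop in accuracy. This is a swapped-in, model-agnostic technique and not necessarily the authors' method; the toy classifier and data are hypothetical.

```python
import random
from typing import Callable, List


def accuracy(predict: Callable[[List[float]], int],
             X: List[List[float]], y: List[int]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict: Callable[[List[float]], int],
                           X: List[List[float]], y: List[int],
                           n_repeats: int = 20, seed: int = 0) -> List[float]:
    """Mean accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy stand-in for a trained model: it only looks at feature 0.
def toy_predict(row: List[float]) -> int:
    return int(row[0] > 1.0)


X = [[0.2, 5.0], [0.5, 1.0], [2.0, 3.0], [3.0, 0.5],
     [0.1, 2.0], [2.5, 4.0], [0.3, 0.2], [1.8, 1.5]]
y = [0, 0, 1, 1, 0, 1, 0, 1]
imp = permutation_importance(toy_predict, X, y)
print(imp)
```

Features whose importance is near zero are candidates for removal, which is the idea behind using importance scores to shrink the input and shorten training.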

CONCLUSIONS

In this paper, we developed EP-DNN, which achieves high prediction accuracy, with validation rates above 90% in the operational region of enhancer prediction for all four cell lines that we studied, outperforming DEEP-ENCODE and RFECS. We then developed a method to analyze a trained DNN and determine which histone modifications are important and, within those, which features proximal or distal to the enhancer site are important.
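As a rough illustration of reporting a validation rate at different confidence levels, the sketch below takes the validation rate at a threshold to be the fraction of predictions scoring at or above that threshold that are marked as validated. This definition, the function name `validation_rate`, and the toy scores are assumptions; the abstract does not state how a prediction is judged to be validated.

```python
from typing import List, Optional, Tuple


def validation_rate(predictions: List[Tuple[float, bool]],
                    threshold: float) -> Optional[float]:
    """Fraction of predictions with score >= threshold that are validated.

    Returns None when no prediction passes the threshold.
    """
    kept = [validated for score, validated in predictions if score >= threshold]
    if not kept:
        return None
    return sum(kept) / len(kept)


# (confidence score, validated?) pairs, invented for illustration only.
preds = [(0.95, True), (0.90, True), (0.80, True),
         (0.70, False), (0.60, True), (0.40, False)]

for t in (0.5, 0.75, 0.9):
    print(t, validation_rate(preds, t))
```

Sweeping the threshold this way shows the trade-off between how many predictions are kept and how reliable the kept predictions are.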

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d0c/4977478/9e13d633adf1/12918_2016_302_Fig1_HTML.jpg

Similar articles

EP-DNN: A Deep Neural Network-Based Global Enhancer Prediction Algorithm.

Sci Rep. 2016 Dec 8;6:38433. doi: 10.1038/srep38433.

Enhancer prediction with histone modification marks using a hybrid neural network model.

Methods. 2019 Aug 15;166:48-56. doi: 10.1016/j.ymeth.2019.03.014. Epub 2019 Mar 21.

AIKYATAN: mapping distal regulatory elements using convolutional learning on GPU.

BMC Bioinformatics. 2019 Oct 7;20(1):488. doi: 10.1186/s12859-019-3049-1.

Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns.

BMC Bioinformatics. 2020 Jul 20;21(1):317. doi: 10.1186/s12859-020-03621-3.

RFECS: a random-forest based algorithm for enhancer identification from chromatin state.

PLoS Comput Biol. 2013;9(3):e1002968. doi: 10.1371/journal.pcbi.1002968. Epub 2013 Mar 14.

Enhancer identification in mouse embryonic stem cells using integrative modeling of chromatin and genomic features.

BMC Genomics. 2012 Apr 26;13:152. doi: 10.1186/1471-2164-13-152.

Peak-valley-peak pattern of histone modifications delineates active regulatory elements and their directionality.

Nucleic Acids Res. 2016 May 19;44(9):4037-51. doi: 10.1093/nar/gkw250. Epub 2016 Apr 19.

High-throughput functional testing of ENCODE segmentation predictions.

Genome Res. 2014 Oct;24(10):1595-602. doi: 10.1101/gr.173518.114. Epub 2014 Jul 17.

Integrating distal and proximal information to predict gene expression via a densely connected convolutional neural network.

Bioinformatics. 2020 Jan 15;36(2):496-503. doi: 10.1093/bioinformatics/btz562.

Cited by

Unbox the black-box for the medical explainable AI via multi-modal and multi-centre data fusion: A mini-review, two showcases and beyond.

Inf Fusion. 2022 Jan;77:29-52. doi: 10.1016/j.inffus.2021.07.016.

Machine Learning in Epigenomics: Insights into Cancer Biology and Medicine.

Biochim Biophys Acta Rev Cancer. 2021 Dec;1876(2):188588. doi: 10.1016/j.bbcan.2021.188588. Epub 2021 Jul 7.

Explainable artificial intelligence models using real-world electronic health record data: a systematic scoping review.

J Am Med Inform Assoc. 2020 Jul 1;27(7):1173-1185. doi: 10.1093/jamia/ocaa053.

AIKYATAN: mapping distal regulatory elements using convolutional learning on GPU.

BMC Bioinformatics. 2019 Oct 7;20(1):488. doi: 10.1186/s12859-019-3049-1.

CRISPR Genome Engineering for Human Pluripotent Stem Cell Research.

Theranostics. 2017 Oct 7;7(18):4445-4469. doi: 10.7150/thno.18456. eCollection 2017.

References

Epigenomics: Roadmap for regulation.

Nature. 2015 Feb 19;518(7539):314-6. doi: 10.1038/518314a.

DEEP: a general computational framework for predicting enhancers.

Nucleic Acids Res. 2015 Jan;43(1):e6. doi: 10.1093/nar/gku1058. Epub 2014 Nov 5.

Dissection of thousands of cell type-specific enhancers identifies dinucleotide repeat motifs as general enhancer features.

Genome Res. 2014 Jul;24(7):1147-56. doi: 10.1101/gr.169243.113. Epub 2014 Apr 8.

Looping back to leap forward: transcription enters a new era.

Cell. 2014 Mar 27;157(1):13-25. doi: 10.1016/j.cell.2014.02.009.

A promoter-level mammalian expression atlas.

Nature. 2014 Mar 27;507(7493):462-70. doi: 10.1038/nature13182.

Transcriptional enhancers: from properties to genome-wide predictions.

Nat Rev Genet. 2014 Apr;15(4):272-86. doi: 10.1038/nrg3682. Epub 2014 Mar 11.

RFECS: a random-forest based algorithm for enhancer identification from chromatin state.

PLoS Comput Biol. 2013;9(3):e1002968. doi: 10.1371/journal.pcbi.1002968. Epub 2013 Mar 14.

Enhancers: five essential questions.

Nat Rev Genet. 2013 Apr;14(4):288-95. doi: 10.1038/nrg3458.

Modification of enhancer chromatin: what, how, and why?

Mol Cell. 2013 Mar 7;49(5):825-37. doi: 10.1016/j.molcel.2013.01.038.

An integrated encyclopedia of DNA elements in the human genome.

Nature. 2012 Sep 6;489(7414):57-74. doi: 10.1038/nature11247.
