Dzida Tomasz, Iqbal Mudassar, Charapitsa Iryna, Reid George, Stunnenberg Henk, Matarese Filomena, Grote Korbinian, Honkela Antti, Rattray Magnus
Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom.
Chemical Biology Core Facility, European Molecular Biology Laboratory, Heidelberg, Germany.
PeerJ. 2017 Sep 28;5:e3742. doi: 10.7717/peerj.3742. eCollection 2017.
We have developed a machine learning approach to predict stimulation-dependent enhancer-promoter interactions using evidence from changes in genomic protein occupancy over time. The occupancy of estrogen receptor alpha (ERα), RNA polymerase II (Pol II) and the histone marks H2AZ and H3K4me3 was measured over time using ChIP-Seq experiments in MCF7 cells stimulated with estrogen. We developed a Bayesian classifier that predicts interactions from two features: the correlation of temporal binding patterns at enhancers and promoters, and their genomic proximity. The method was trained on experimentally determined interactions from the same system and was shown to achieve much higher precision than predictions based on genomic proximity to the nearest ERα binding site. We use the method to identify a confident genome-wide set of ERα target genes and their regulatory enhancers. Validation with publicly available GRO-Seq data shows that our predicted targets are much more likely to exhibit early nascent transcription than predictions based on genomic ERα binding proximity alone.
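The abstract does not state the exact form of the classifier. The Python below is a minimal, hedged sketch of the general idea of combining temporal correlation and genomic distance in a Bayesian classifier. It assumes a naive Bayes model with Gaussian class-conditional densities on two features: the Pearson correlation of enhancer and promoter occupancy time courses, and the log10 enhancer-promoter distance. The names `posterior_interaction`, `PARAMS` and `PRIOR_INTERACT` are hypothetical, and all parameter values are illustrative placeholders, not the authors' implementation or fitted values. In practice the densities would be estimated from the experimentally determined training interactions.

```python
import math

# Hypothetical class-conditional Gaussian parameters (mean, sd) per feature.
# Illustrative placeholders only, NOT values from the paper; in practice they
# would be estimated from experimentally determined enhancer-promoter pairs.
PARAMS = {
    True: {"corr": (0.6, 0.3), "log10_dist": (4.5, 0.6)},   # interacting
    False: {"corr": (0.0, 0.4), "log10_dist": (5.5, 0.5)},  # non-interacting
}
PRIOR_INTERACT = 0.1  # hypothetical prior probability of an interaction


def pearson(x, y):
    """Pearson correlation between two equal-length occupancy time courses."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


def gaussian_logpdf(v, mean, sd):
    """Log density of the normal distribution N(mean, sd^2) evaluated at v."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (v - mean) ** 2 / (2 * sd ** 2)


def posterior_interaction(enh_profile, prom_profile, distance_bp,
                          params=PARAMS, prior=PRIOR_INTERACT):
    """P(interaction | features) under naive Bayes: the two features are
    assumed conditionally independent given the class label."""
    features = {
        "corr": pearson(enh_profile, prom_profile),
        "log10_dist": math.log10(distance_bp),
    }
    log_joint = {}
    for label, class_prior in ((True, prior), (False, 1 - prior)):
        lp = math.log(class_prior)
        for name, value in features.items():
            mean, sd = params[label][name]
            lp += gaussian_logpdf(value, mean, sd)
        log_joint[label] = lp
    # Normalise in log space (log-sum-exp) for numerical stability.
    m = max(log_joint.values())
    z = sum(math.exp(v - m) for v in log_joint.values())
    return math.exp(log_joint[True] - m) / z


# Usage: a co-varying pair 20 kb apart vs. an anti-correlated pair 1 Mb apart.
enhancer = [1, 3, 6, 8, 7]
print(posterior_interaction(enhancer, [2, 4, 7, 9, 8], 20_000))
print(posterior_interaction(enhancer, [9, 7, 4, 2, 3], 1_000_000))
```

Working on log10 distance keeps the distance feature on a scale where a Gaussian is a plausible rough fit. The naive Bayes assumption, which treats correlation and distance as independent given the class, is a simplification chosen here for brevity.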