Blatti Charles, Kazemian Majid, Wolfe Scot, Brodsky Michael, Sinha Saurabh
Department of Computer Science, University of Illinois, Urbana, IL 61801, USA.
National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA.
Nucleic Acids Res. 2015 Apr 30;43(8):3998-4012. doi: 10.1093/nar/gkv195. Epub 2015 Mar 19.
Characterization of cell type-specific regulatory networks and elements is a major challenge in genomics, and emerging strategies frequently employ high-throughput genome-wide assays of transcription factor (TF)-DNA binding, histone modifications or chromatin state. However, these experiments remain too difficult or expensive for many laboratories to apply comprehensively to their system of interest. Here, we explore the potential of elucidating regulatory systems in varied cell types using computational techniques that rely only on gene expression data, low-resolution chromatin accessibility data, and TF-DNA binding specificities ('motifs'). We show that static computational motif scans overlaid with chromatin accessibility data reasonably approximate experimentally measured TF-DNA binding. We demonstrate that predicted binding profiles and expression patterns of hundreds of TFs are sufficient to identify major regulators of ∼200 spatiotemporal expression domains in the Drosophila embryo. We are then able to learn reliable statistical models of enhancer activity for over 70 expression domains and apply those models to annotate domain-specific enhancers genome-wide. Throughout this work, we apply our motif- and accessibility-based approach to comprehensively characterize the regulatory network of fruit fly embryonic development and show that the accuracy of our computational method compares favorably to approaches that rely on data from many experimental assays.
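The idea of overlaying a static motif scan with chromatin accessibility can be illustrated with a minimal sketch. This is not the authors' pipeline: the motif matrix, the uniform background, the score threshold, the toy sequence and the accessible interval below are all hypothetical values chosen for illustration, and the log-odds scoring is one common way to scan with a position weight matrix.

```python
import math

# Hypothetical 4-bp motif given as per-position base probabilities
# (illustrative values, not taken from any real TF).
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base composition


def log_odds(window):
    """Log2 odds of the window under the motif versus the background."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(MOTIF, window))


def scan(sequence, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(MOTIF)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = log_odds(sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits


def accessible_hits(hits, open_intervals, width):
    """Keep hits lying entirely inside a half-open accessible interval [lo, hi)."""
    return [
        (start, score)
        for start, score in hits
        if any(lo <= start and start + width <= hi for lo, hi in open_intervals)
    ]


# Toy sequence (ACGT only) and a single hypothetical accessible region.
sequence = "TTAGCTGGAGCTAA"
hits = scan(sequence, threshold=4.0)
kept = accessible_hits(hits, open_intervals=[(0, 8)], width=len(MOTIF))
```

In this toy case the scan finds two motif matches, and only the one inside the accessible interval is retained as a predicted binding site. A real analysis would also scan the reverse strand and use an empirically estimated background.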