Genome-wide discovery of active regulatory elements and transcription factor footprints in Caenorhabditis elegans using DNase-seq.

Affiliation

Division of Biology and Bioengineering, Howard Hughes Medical Institute, California Institute of Technology, Pasadena, California 91125, USA.

Publication information

Genome Res. 2017 Dec;27(12):2108-2119. doi: 10.1101/gr.223735.117. Epub 2017 Oct 26.

Deep sequencing of size-selected DNase I-treated chromatin (DNase-seq) allows high-resolution measurement of chromatin accessibility to DNase I cleavage, permitting de novo identification of active cis-regulatory modules (CRMs) and individual transcription factor (TF) binding sites. We adapted DNase-seq to nuclei isolated from embryos and L1 arrest larvae to generate high-resolution maps of TF binding. Over half of embryonic DNase I hypersensitive sites (DHSs) were annotated as noncoding: 24% in intergenic regions, 12% in promoters, and 28% in introns, with similar proportions observed in L1 arrest larvae. Noncoding DHSs are highly conserved and enriched in marks of enhancer activity and transcription. We validated noncoding DHSs against known enhancers from , , and and recapitulated 15 of 17 known enhancers. We then mined the DNase-seq data to identify putative active CRMs and TF footprints. Using DNase-seq data improved predictions of tissue-specific expression compared with using motifs alone. In a pilot functional test, 10 of 15 DHSs from , , and drove reporter gene expression in transgenic C. elegans. Overall, we provide experimental annotation of 26,644 putative CRMs in the embryo containing 55,890 TF footprints, as well as 15,841 putative CRMs in the L1 arrest larvae containing 32,685 TF footprints.
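A TF footprint is a short stretch inside an accessible region where a bound protein shields DNA from DNase I, so it shows fewer cleavages than the bases on either side. The sketch below illustrates that idea with a simple flank-versus-core depletion score; it is a hypothetical illustration, not the footprinting method used in this study. The function names `footprint_score` and `scan_footprints`, the window sizes, and the +1 pseudocounts on both core and flanks are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def footprint_score(cuts: Sequence[float], core_start: int, core_end: int, flank: int) -> float:
    """Hypothetical depletion score for a candidate footprint.

    cuts: per-base DNase I cleavage counts across a window.
    [core_start, core_end): putative TF-protected core (half-open).
    flank: width of the left and right flanking windows.
    Lower scores mean fewer cuts in the core relative to both flanks,
    i.e. stronger apparent protection.
    """
    if core_end <= core_start or flank <= 0:
        raise ValueError("core and flank must be non-empty")
    if core_start - flank < 0 or core_end + flank > len(cuts):
        raise ValueError("flanks extend beyond the cleavage window")

    def mean(xs: Sequence[float]) -> float:
        return sum(xs) / len(xs)

    center = mean(cuts[core_start:core_end])
    left = mean(cuts[core_start - flank:core_start])
    right = mean(cuts[core_end:core_end + flank])
    # Assumed +1 pseudocounts keep the ratios finite when a flank has no cuts.
    return (center + 1) / (left + 1) + (center + 1) / (right + 1)


def scan_footprints(cuts: Sequence[float], width: int, flank: int) -> List[Tuple[int, float]]:
    """Score every core window of `width` bases whose flanks fit inside `cuts`."""
    return [
        (start, footprint_score(cuts, start, start + width, flank))
        for start in range(flank, len(cuts) - width - flank + 1)
    ]


if __name__ == "__main__":
    # Toy profile: high cleavage on both sides, a 4-bp protected core at 6..9.
    profile = [8] * 6 + [1] * 4 + [8] * 6
    best = min(scan_footprints(profile, width=4, flank=3), key=lambda t: t[1])
    print(best)  # lowest-scoring (most protected) core start and its score
```

On the toy profile the lowest score falls at core start 6, where the depleted bases sit; a uniform cleavage profile scores 2.0 everywhere, meaning no protection. In real data, candidate cores would typically also be checked against TF motif matches, in line with the abstract's point that combining DNase-seq with motifs improved predictions over motifs alone.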

对经大小选择的 DNA 酶 I 处理的染色质进行深度测序（DNase-seq）可实现对 DNA 酶 I 切割的染色质可及性的高分辨率测量，从而能够鉴定新的活性调控模块（CRMs）和单个转录因子（TF）结合位点。我们将 DNase-seq 方法应用于从胚胎和 L1 停滞幼虫中分离的核，以生成 TF 结合的高分辨率图谱。超过一半的胚胎 DNA 酶 I 超敏位点（DHSs）被注释为非编码，其中 24%位于基因间区，12%位于启动子区，28%位于内含子区，在 L1 停滞幼虫中观察到类似的统计数据。非编码 DHSs 高度保守，富集了增强子活性和转录的标记。我们通过已知的、、和的增强子对非编码 DHSs 进行了验证，并重现了 17 个已知增强子中的 15 个。然后，我们通过挖掘 DNase-seq 数据来识别潜在的活性 CRM 和 TF 足迹。与仅使用基序相比，使用 DNase-seq 数据可以提高组织特异性表达的预测。在一个初步的功能测试中，、、和中的 15 个 DHSs 中的 10 个在转基因中驱动报告基因表达。总的来说，我们在胚胎中提供了 26644 个潜在 CRM 的实验注释，其中包含 55890 个 TF 足迹，在 L1 停滞幼虫中提供了 15841 个潜在 CRM 的实验注释，其中包含 32685 个 TF 足迹。