Gurdziel Katherine, Vogt Kyle R, Schneider Gary, Richards Neil, Gumucio Deborah L
Department of Cell and Developmental Biology, University of Michigan, Ann Arbor, MI, 48109, USA.
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA.
BMC Dev Biol. 2016 Feb 24;16:4. doi: 10.1186/s12861-016-0106-0.
The Hedgehog (Hh) signaling pathway, acting through three homologous transcription factors (GLI1, GLI2, GLI3) in vertebrates, plays multiple roles in embryonic organ development and adult tissue homeostasis. At the level of the genome, GLI factors bind to specific motifs in enhancers, some of which lie hundreds of kilobases from the gene promoter. These enhancers integrate the Hh signal in a context-specific manner to control the spatiotemporal pattern of target gene expression. Importantly, a number of genes that encode Hh pathway molecules are themselves targets of Hh signaling, allowing pathway regulation by an intricate balance of feedback activation and inhibition. However, surprisingly few of the critical enhancer elements that control these pathway target genes have been identified, even though such elements are central determinants of Hh signaling activity. Recently, ChIP studies have been carried out in multiple tissue contexts using mouse models carrying FLAG-tagged GLI proteins (GLI(FLAG)). Using these datasets, we tested whether a meta-analysis of GLI binding sites, coupled with a machine learning approach, could reveal genomic features that can be used to empirically identify Hh-regulated enhancers linked to loci of the Hh signaling pathway.
A meta-analysis of four existing GLI(FLAG) datasets revealed a library of GLI binding motifs that was substantially more restricted than the potential sites predicted by previous in vitro binding studies. A machine learning method (kmer-SVM) was then applied to these datasets to identify enriched k-mers that, when applied to the mouse genome, predicted as many as 37,000 potential Hh enhancers. For functional analysis, we selected nine regions annotated to putative Hh pathway molecules and found that seven exhibited GLI-dependent activity, indicating that they are directly regulated by Hh signaling (7 of 9, a 78% success rate).
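To make the k-mer step concrete, the following is a minimal sketch of k-mer counting with reverse-complement collapsing, the kind of sequence feature that k-mer based classifiers such as kmer-SVM are generally built on. It is not the authors' pipeline: the function names, the choice of k, and the toy sequence in the usage note are all assumptions made for illustration, and the SVM training itself is not shown.

```python
from collections import Counter

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Collapse a k-mer and its reverse complement to one key
    (the lexicographically smaller of the two), because a binding
    site can be read from either strand."""
    return min(kmer, reverse_complement(kmer))


def kmer_counts(seq, k):
    """Count canonical k-mers in seq, skipping windows that contain
    anything other than A, C, G, or T (for example N)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


def kmer_frequencies(seq, k):
    """Normalize canonical k-mer counts so they sum to 1.
    Returns an empty dict if the sequence has no valid k-mers."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}
```

As a usage example, `kmer_frequencies("ACGTN", 2)` yields a feature vector over the canonical 2-mers of a made-up sequence. In a full workflow, vectors like this would be computed for ChIP-bound regions and for background regions and then passed to a classifier; that step depends on the tool used and is left out here.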
The results suggest that Hh enhancer regions share common sequence features. The kmer-SVM machine learning approach identifies those features and can successfully predict functional Hh regulatory regions in genomic DNA surrounding Hh pathway molecules and, likely, other Hh targets. Additionally, the library of enriched GLI binding motifs that we have identified may allow improved identification of functional GLI binding sites.