Philippakis Anthony A, Busser Brian W, Gisselbrecht Stephen S, He Fangxue Sherry, Estrada Beatriz, Michelson Alan M, Bulyk Martha L
Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA.
PLoS Comput Biol. 2006 May;2(5):e53. doi: 10.1371/journal.pcbi.0020053. Epub 2006 May 26.
While combinatorial models of transcriptional regulation can be inferred for metazoan systems from a priori biological knowledge, validation requires extensive and time-consuming experimental work. Thus, there is a need for computational methods that can evaluate hypothesized cis regulatory codes before the difficult task of experimental verification is undertaken. We have developed a novel computational framework (termed "CodeFinder") that integrates transcription factor binding site and gene expression information to evaluate whether a hypothesized transcriptional regulatory model (TRM; i.e., a set of co-regulating transcription factors) is likely to target a given set of co-expressed genes. Our basic approach is to simultaneously predict cis regulatory modules (CRMs) associated with a given gene set and quantify the enrichment for combinatorial subsets of transcription factor binding site motifs comprising the hypothesized TRM within these predicted CRMs. As a model system, we have examined a TRM experimentally demonstrated to drive the expression of two genes in a sub-population of cells in the developing Drosophila mesoderm, the somatic muscle founder cells. This TRM was previously hypothesized to be a general mode of regulation for genes expressed in this cell population. In contrast, the present analyses suggest that a modified form of this cis regulatory code applies to only a subset of founder cell genes, those whose gene expression responds to specific genetic perturbations in a similar manner to the gene on which the original model was based. We have confirmed this hypothesis by experimentally discovering six (out of 12 tested) new CRMs driving expression in the embryonic mesoderm, four of which drive expression in founder cells.
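The enrichment idea described in the abstract can be illustrated with a small sketch. This is not the CodeFinder implementation: it assumes a simple sliding-window definition of a candidate CRM and substitutes a hypergeometric tail probability for whatever statistic the authors actually use. The motif names (`A`, `B`, `C`), gene names, window size, and all function names are hypothetical placeholders.

```python
from collections import Counter
from itertools import combinations
from math import comb


def has_cluster(hits, motifs, window):
    """True if some span of at most `window` bp contains every motif in `motifs`.

    `hits` is a list of (position, motif_name) pairs for one gene's
    noncoding sequence (an assumed input format).
    """
    needed = set(motifs)
    relevant = sorted((pos, m) for pos, m in hits if m in needed)
    counts = Counter()
    left = 0
    for pos, motif in relevant:
        counts[motif] += 1
        # Shrink the window from the left until it spans <= `window` bp.
        while pos - relevant[left][0] > window:
            dropped = relevant[left][1]
            counts[dropped] -= 1
            if counts[dropped] == 0:
                del counts[dropped]
            left += 1
        if len(counts) == len(needed):
            return True
    return False


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def subset_enrichment(hits_by_gene, foreground, motifs, window=500):
    """Score every non-empty motif subset for enrichment in the foreground set.

    Returns {subset: (foreground genes with a cluster,
                      all genes with a cluster,
                      hypergeometric p-value)}.
    """
    genes = list(hits_by_gene)
    fg = set(foreground) & set(genes)
    results = {}
    for size in range(1, len(motifs) + 1):
        for subset in combinations(sorted(motifs), size):
            positives = {
                g for g in genes if has_cluster(hits_by_gene[g], subset, window)
            }
            k = len(positives & fg)
            p = hypergeom_sf(k, len(genes), len(positives), len(fg))
            results[subset] = (k, len(positives), p)
    return results


# Toy data: motif hits per gene; g1 and g2 play the co-expressed gene set.
hits_by_gene = {
    "g1": [(100, "A"), (180, "B"), (260, "C")],
    "g2": [(50, "A"), (90, "B")],
    "g3": [(10, "A"), (2000, "B")],
    "g4": [(400, "C")],
    "g5": [],
    "g6": [(300, "A"), (350, "B"), (5000, "C")],
}
foreground = ["g1", "g2"]
results = subset_enrichment(hits_by_gene, foreground, ["A", "B", "C"], window=500)

for subset, (k, total, p) in sorted(results.items(), key=lambda kv: kv[1][2]):
    print(subset, k, total, round(p, 3))
```

Ranking subsets by p-value shows which motif combinations, rather than the full hypothesized set, are most over-represented near the co-expressed genes. A real analysis would also need a background model and a correction for testing many subsets, both of which this sketch leaves out.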