Mortazavi Ali, Leeper Thompson Evonne Chen, Garcia Sarah T, Myers Richard M, Wold Barbara
Division of Biology, California Institute of Technology, Pasadena, California 91125, USA.
Genome Res. 2006 Oct;16(10):1208-21. doi: 10.1101/gr.4997306. Epub 2006 Sep 8.
We constructed and applied an open-source informatics framework called Cistematic in an effort to predict the target gene repertoire for transcription factors with large binding sites. Cistematic uses two different evolutionary conservation-filtering algorithms in conjunction with several analysis modules. Beginning with a single conserved and biologically tested site for the neuronal repressor NRSF/REST, Cistematic generated a refined PSFM (position-specific frequency matrix) based on conserved site occurrences in the mouse, human, and dog genomes. Predictions from this model were validated by chromatin immunoprecipitation (ChIP) followed by quantitative PCR. Together, transfection assays and ChIP enrichment data provided an objective basis for setting a membership threshold and for rank-ordering a final gene cohort model consisting of 842 high-confidence sites in the human genome associated with 733 genes. Genes associated with NRSEs (neuron-restrictive silencer elements, the NRSF/REST binding sites) were significantly enriched for neuron-specific Gene Ontology (GO) terms and neuronal mRNA expression profiles. A more extensive evolutionary survey showed that NRSE sites matching the PSFM model exist in roughly similar numbers in all fully sequenced vertebrate genomes but are notably absent from invertebrate and protochordate genomes, as is NRSF itself. Some NRSF/REST sites reside in repeats, which suggests a mechanism for both ancient and modern dispersal of NRSEs through vertebrate genomes. Multiple predicted sites are located near neuronal microRNA and splicing-factor genes, and these sites tested positive for NRSF/REST occupancy in vivo. The resulting network model integrates post-transcriptional and translational controllers, including candidate feedback loops on NRSF and its corepressor, CoREST.
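To make the PSFM idea concrete, the following is a minimal sketch, not Cistematic's implementation. It builds a position-specific frequency matrix from equal-length aligned site occurrences, scores windows by log-odds against a uniform background, keeps hits above a score threshold ranked by score, and applies one deliberately simple conservation filter: a region is kept only if every species' orthologous sequence contains a hit. The function names (`build_psfm`, `log_odds_score`, `scan`, `is_conserved`), the pseudocount of 0.5, the uniform 0.25 background, and the toy 5-bp sites are all hypothetical choices made for illustration. The abstract does not describe Cistematic's two conservation-filtering algorithms, so this filter stands in for them only loosely.

```python
import math
from collections import Counter

BASES = "ACGT"

def build_psfm(sites, pseudocount=0.5):
    """Hypothetical helper: one {base: frequency} dict per column of aligned, equal-length sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    psfm = []
    for i in range(width):
        counts = Counter(s[i] for s in sites)
        total = len(sites) + 4 * pseudocount  # pseudocounts avoid zero frequencies
        psfm.append({b: (counts.get(b, 0) + pseudocount) / total for b in BASES})
    return psfm

def log_odds_score(psfm, window, background=0.25):
    """Sum of per-column log2(frequency / background), assuming a uniform background."""
    return sum(math.log2(col[base] / background) for col, base in zip(psfm, window))

def scan(psfm, seq, threshold):
    """Score every forward-strand window (a real scan would also check the reverse complement).

    Returns (start, window, score) tuples at or above threshold, best first.
    """
    width = len(psfm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) <= set(BASES):  # skip windows containing N or other symbols
            score = log_odds_score(psfm, window)
            if score >= threshold:
                hits.append((start, window, score))
    return sorted(hits, key=lambda h: h[2], reverse=True)

def is_conserved(psfm, orthologs, threshold):
    """Hypothetical conservation filter: every species' ortholog must contain at least one hit."""
    return all(scan(psfm, seq, threshold) for seq in orthologs.values())

if __name__ == "__main__":
    toy_sites = ["ACGTA", "ACGTT", "ACGTA", "TCGTA"]  # made-up sites, not real NRSEs
    model = build_psfm(toy_sites)
    print(scan(model, "GGGACGTAGGG", threshold=5.0))
```

With these toy inputs, the only window in `GGGACGTAGGG` that passes a threshold of 5.0 is `ACGTA` at offset 3. In this sketch, the threshold plays the role that the abstract assigns to transfection and ChIP data, which is deciding cohort membership. Here it is simply a fixed number.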