Wang Hao, Zhang Ying, Cheng Yong, Zhou Yuepin, King David C, Taylor James, Chiaromonte Francesca, Kasturi Jyotsna, Petrykowska Hanna, Gibb Brian, Dorman Christine, Miller Webb, Dore Louis C, Welch John, Weiss Mitchell J, Hardison Ross C
Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.
Genome Res. 2006 Dec;16(12):1480-92. doi: 10.1101/gr.5353806. Epub 2006 Oct 12.
Multiple alignments of genome sequences are helpful guides to functional analysis, but predicting cis-regulatory modules (CRMs) accurately from such alignments remains an elusive goal. We predict CRMs for mammalian genes expressed in red blood cells by combining two properties gleaned from aligned, noncoding genome sequences: a positive regulatory potential (RP) score, which detects similarity to patterns in alignments distinctive for regulatory regions, and conservation of a binding site motif for the essential erythroid transcription factor GATA-1. Within eight target loci, we tested 75 noncoding segments by reporter gene assays in transiently transfected human K562 cells and/or after site-directed integration into murine erythroleukemia cells. Segments with a high RP score and a conserved exact match to the binding site consensus are validated at a good rate (50%-100%, with rates increasing at higher RP), whereas segments with lower RP scores or nonconsensus binding motifs tend to be inactive. Active DNA segments were shown to be occupied by GATA-1 protein by chromatin immunoprecipitation, whereas sites predicted to be inactive were not occupied. We verify four previously known erythroid CRMs and identify 28 novel ones. Thus, high RP in combination with another feature of a CRM, such as a conserved transcription factor binding site, is a good predictor of functional CRMs. Genome-wide predictions based on RP and a large set of well-defined transcription factor binding sites are available through servers at http://www.bx.psu.edu/.
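The prediction rule summarized in the abstract (a positive RP score combined with a conserved exact match to the GATA-1 binding site consensus) can be illustrated with a minimal sketch. It is not the authors' pipeline. The consensus WGATAR, the `Segment` type, the function names, and the toy alignments are all assumptions introduced here for illustration; the abstract does not give the motif definition or the way conservation was scored. Here "conserved" is taken to mean that every aligned species shows an exact consensus match, on either strand, in the same six alignment columns.

```python
import re
from dataclasses import dataclass
from typing import Dict

# Assumed GATA-1 consensus WGATAR (W = A/T, R = A/G) and its reverse
# complement YTATCW (Y = C/T). The abstract does not state the motif.
MOTIF = re.compile(r"[AT]GATA[AG]|[CT]TATC[AT]")
MOTIF_LEN = 6


@dataclass
class Segment:
    """Hypothetical container for one noncoding segment."""
    name: str
    rp_score: float            # regulatory potential score for the segment
    alignment: Dict[str, str]  # species -> gapped, aligned sequence


def has_conserved_motif(alignment: Dict[str, str]) -> bool:
    """True if some 6-column window is an exact consensus match in every species."""
    rows = [seq.upper() for seq in alignment.values()]
    if not rows:
        return False
    length = min(len(row) for row in rows)
    for start in range(length - MOTIF_LEN + 1):
        window_end = start + MOTIF_LEN
        # Gap characters ('-') never match, so gapped windows are rejected.
        if all(MOTIF.fullmatch(row[start:window_end]) for row in rows):
            return True
    return False


def predict_crm(segment: Segment, rp_threshold: float = 0.0) -> bool:
    """Predict a CRM when RP exceeds the threshold and the motif is conserved."""
    return segment.rp_score > rp_threshold and has_conserved_motif(segment.alignment)


# Toy data, invented for illustration only.
segments = [
    Segment("seg_a", 0.10, {"human": "CCAGATAAGG", "mouse": "CCTGATAGGG"}),
    Segment("seg_b", 0.10, {"human": "CCAGATAAGG", "mouse": "CCAGACAAGG"}),
    Segment("seg_c", -0.05, {"human": "CCAGATAAGG", "mouse": "CCAGATAAGG"}),
]
predictions = {seg.name: predict_crm(seg) for seg in segments}
```

In this toy set only `seg_a` satisfies both criteria: `seg_b` loses the motif in mouse and `seg_c` has a negative RP score. The default threshold of 0 follows the abstract's "positive RP score"; since validation rates are reported to rise with RP, a stricter threshold would be a reasonable variation.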