Trinklein Nathan D, Karaöz Ulaş, Wu Jiaqian, Halees Anason, Force Aldred Shelley, Collins Patrick J, Zheng Deyou, Zhang Zhengdong D, Gerstein Mark B, Snyder Michael, Myers Richard M, Weng Zhiping
Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA.
Genome Res. 2007 Jun;17(6):720-31. doi: 10.1101/gr.5716607.
The regulation of transcriptional initiation in the human genome is a critical component of global gene regulation, but a complete catalog of human promoters currently does not exist. In order to identify regulatory regions, we developed four computational methods to integrate 129 sets of ENCODE-wide chromatin immunoprecipitation data. They collectively predicted 1393 regions. Roughly 47% of the regions were unique to one method, as each method makes different assumptions about the data. Overall, predicted regions tend to localize to highly conserved, DNase I hypersensitive, and actively transcribed regions in the genome. Interestingly, a significant portion of the regions overlaps with annotated 3'-UTRs, suggesting that some of them might regulate anti-sense transcription. The majority of the predicted regions are >2 kb away from the 5'-ends of previously annotated human cDNAs and hence are novel. These novel regions may regulate unannotated transcripts or may represent new alternative transcription start sites of known genes. We tested 163 such regions for promoter activity in four cell lines using transient transfection assays, and 25% of them showed transcriptional activity above background in at least one cell line. We also performed 5'-RACE experiments on 62 novel regions, and 76% of the regions were associated with the 5'-ends of at least two RACE products. Our results suggest that there are at least 35% more functional promoters in the human genome than currently annotated.
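The novelty criterion stated in the abstract (a predicted region counts as novel when it lies more than 2 kb from the 5'-end of any previously annotated cDNA) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the single-chromosome coordinate model, the closed-interval convention, and the choice to ignore strand are all assumptions made for illustration.

```python
from bisect import bisect_left

def distance_to_nearest_5p_end(region, five_prime_ends):
    """Distance in bp from a region to the nearest annotated 5'-end.

    Hypothetical helper: `region` is a (start, end) pair with start <= end,
    and `five_prime_ends` is a sorted list of coordinates on the same
    chromosome. Returns 0 if a 5'-end falls inside the region and
    float('inf') if no 5'-ends are annotated.
    """
    start, end = region
    i = bisect_left(five_prime_ends, start)  # first 5'-end at or after start
    best = float("inf")
    if i > 0:
        # Closest 5'-end strictly upstream of the region start.
        best = min(best, start - five_prime_ends[i - 1])
    if i < len(five_prime_ends):
        # Closest 5'-end at or after the region start; it is either
        # inside the region (distance 0) or downstream of its end.
        best = min(best, max(0, five_prime_ends[i] - end))
    return best

def is_novel(region, five_prime_ends, min_distance=2000):
    """Assumed rule: novel means strictly more than `min_distance` bp away."""
    return distance_to_nearest_5p_end(region, five_prime_ends) > min_distance

# Toy coordinates, invented for illustration only.
annotated = [1000, 10000]
far_region = is_novel((3500, 4000), annotated)    # 2500 bp from 1000
near_region = is_novel((2500, 2900), annotated)   # 1500 bp from 1000
overlapping = is_novel((900, 1100), annotated)    # contains a 5'-end
```

A sorted list with binary search keeps each lookup logarithmic in the number of annotated 5'-ends. A real analysis would also need per-chromosome indexing and, possibly, strand handling.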