Wang Xiaowo, Xuan Zhenyu, Zhao Xiaoyue, Li Yanda, Zhang Michael Q
MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China.
Genome Res. 2009 Feb;19(2):266-75. doi: 10.1101/gr.081638.108. Epub 2008 Nov 7.
Correctly locating the transcription start site and the core-promoter of a gene is important for understanding the mechanisms of transcriptional regulation. Here we integrate specific genome-wide histone modification profiles with DNA sequence features to predict RNA polymerase II core-promoters in the human genome. Our new predictor, CoreBoost_HM, outperforms existing promoter prediction algorithms, providing significantly higher sensitivity and specificity at high resolution. We demonstrate that even though the histone modification data used in this study come from a specific cell type (CD4+ T cells), our method can identify both active and repressed promoters. We applied it to search the upstream regions of microRNA genes and show that CoreBoost_HM accurately identifies the known promoters of intergenic microRNAs. We also identified a few intronic microRNAs that may have their own promoters. These results suggest that our new method can help to identify and characterize the core-promoters of both coding and noncoding genes.
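To make "sensitivity and specificity at high resolution" concrete, the sketch below scores a set of predicted transcription start sites against annotated ones within a distance tolerance. It is an illustration only, not the evaluation protocol of the paper: the function name `evaluate_predictions`, the 50 bp default tolerance, and the use of positive predictive value as the "specificity" measure are all assumptions made for the example.

```python
from bisect import bisect_left


def evaluate_predictions(predicted, annotated, tolerance=50):
    """Score predicted TSS positions against annotated TSS positions.

    Both inputs are lists of integer coordinates on the same chromosome
    and strand. `tolerance` (in bp) is a hypothetical resolution cutoff:
    a prediction counts as correct if an annotated TSS lies within it.
    Returns (sensitivity, positive_predictive_value).
    """
    annotated_sorted = sorted(annotated)
    predicted_sorted = sorted(predicted)

    def has_neighbor(pos, sorted_positions):
        # First position >= pos - tolerance; it matches if it is also
        # <= pos + tolerance.
        i = bisect_left(sorted_positions, pos - tolerance)
        return i < len(sorted_positions) and sorted_positions[i] <= pos + tolerance

    # Annotated TSSs that were recovered by at least one prediction.
    found = sum(1 for tss in annotated_sorted if has_neighbor(tss, predicted_sorted))
    # Predictions that fall near at least one annotated TSS.
    correct = sum(1 for p in predicted_sorted if has_neighbor(p, annotated_sorted))

    sensitivity = found / len(annotated_sorted) if annotated_sorted else 0.0
    ppv = correct / len(predicted_sorted) if predicted_sorted else 0.0
    return sensitivity, ppv
```

Tightening `tolerance` corresponds to demanding higher resolution: a predictor that only places promoters within a few hundred base pairs of the true start site loses both sensitivity and positive predictive value as the cutoff shrinks.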