Gupta Ravi, Bhattacharyya Anirban, Agosto-Perez Francisco J, Wickramasinghe Priyankara, Davuluri Ramana V
Center for Systems and Computational Biology, Molecular and Cellular Oncogenesis Program, The Wistar Institute, Philadelphia, PA, USA.
Nucleic Acids Res. 2011 Jan;39(Database issue):D92-7. doi: 10.1093/nar/gkq1171. Epub 2010 Nov 21.
MPromDb (Mammalian Promoter Database) is a curated database that annotates gene promoters identified from ChIP-seq results, with the goal of providing an integrated resource for mammalian transcriptional regulation and epigenetics. We analyzed 507 million uniquely aligned RNAP-II ChIP-seq reads from 26 different data sets that include six human cell types and 10 distinct mouse cell/tissue types. The updated MPromDb version consists of computationally predicted (novel) and known active RNAP-II promoters (42,893 human and 48,366 mouse promoters) derived from various data sets freely available in the NCBI GEO database. We found that 36% and 40% of protein-coding genes have alternative promoters in the human and mouse genomes, respectively, and that ∼40% of promoters are tissue/cell specific. The identified RNAP-II promoters were annotated using various known and novel gene models. Additionally, for novel promoters we examined other lines of evidence: GenBank mRNAs, spliced ESTs, CAGE promoter tags and mRNA-seq reads. Users can search the database by gene ID/symbol or by specific tissue/cell type, and can filter results by any combination of tissue/cell specificity, Known/Novel, CpG/NonCpG, and protein-coding/non-coding gene promoters. We have also integrated the GBrowse genome browser with MPromDb to visualize ChIP-seq profiles and display the annotations. The current release of MPromDb can be accessed at http://bioinformatics.wistar.upenn.edu/MPromDb/.