Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory for Biocontrol, Sun Yat-sen University, Guangzhou 510275, PR China.
Nucleic Acids Res. 2018 Jan 4;46(D1):D85-D91. doi: 10.1093/nar/gkx972.
Although thousands of pseudogenes have been annotated in the human genome, their transcriptional regulation, expression profiles and functional mechanisms are largely unknown. In this study, we developed dreamBase (http://rna.sysu.edu.cn/dreamBase) to facilitate the investigation of DNA modification, RNA regulation and protein binding of potentially expressed pseudogenes from multidimensional high-throughput sequencing data. Based on ∼5500 ChIP-seq and DNase-seq datasets, we identified genome-wide binding profiles of various transcription-associated factors around pseudogene loci. By integrating ∼18 000 RNA-seq datasets, we analysed the expression profiles of pseudogenes and explored their co-expression patterns with their parent genes in 32 cancers and 31 normal tissues. By incorporating microRNA binding sites, we demonstrated complex post-transcriptional regulation networks involving 275 microRNAs and 1201 pseudogenes. We generated ceRNA networks to illustrate the crosstalk between pseudogenes and their parent genes through competitive binding of microRNAs. In addition, we studied transcriptome-wide interactions between RNA binding proteins (RBPs) and pseudogenes based on 458 CLIP-seq datasets. In conjunction with epitranscriptome sequencing data, we also mapped 1039 RNA modification sites onto 635 pseudogenes. This database will provide insights into the transcriptional regulation, expression, functions and mechanisms of pseudogenes as well as their roles in biological processes and diseases.