Guo Xingyi, Lin Mingyan, Rockowitz Shira, Lachman Herbert M, Zheng Deyou
The Saul R. Korey Department of Neurology, Albert Einstein College of Medicine, New York, New York, United States of America.
Department of Genetics, Albert Einstein College of Medicine, New York, New York, United States of America.
PLoS One. 2014 Apr 3;9(4):e93972. doi: 10.1371/journal.pone.0093972. eCollection 2014.
Thousands of pseudogenes exist in the human genome and many are transcribed, but their functional potential remains elusive and understudied. To explore these issues systematically, we first developed a computational pipeline to identify transcribed pseudogenes from RNA-Seq data. Applying the pipeline to datasets from 16 distinct normal human tissues identified ~3,000 pseudogenes that could produce non-coding RNAs at low abundance but with high tissue specificity under normal physiological conditions. Cross-tissue comparison revealed that the transcriptional profiles of pseudogenes and their parent genes were mostly positively correlated, suggesting that pseudogene transcription could have a positive effect on the expression of their parent genes, perhaps by functioning as competing endogenous RNAs (ceRNAs), as previously suggested and demonstrated with the PTEN pseudogene, PTENP1. Our analysis of the ENCODE project data also found many transcriptionally active pseudogenes in the GM12878 and K562 cell lines; moreover, it showed that many human pseudogenes produced small RNAs (sRNAs) and that some pseudogene-derived sRNAs, especially those from antisense strands, exhibited evidence of interfering with gene expression. Further integrated analysis of transcriptomics and epigenomics data, however, demonstrated that trimethylation of histone H3 at lysine 9 (H3K9me3), a posttranslational modification typically associated with gene repression and heterochromatin, was enriched at many transcribed pseudogenes in a transcription level-dependent manner in the two cell lines. The H3K9me3 enrichment at pseudogene loci and their adjacent regions was more prominent for pseudogenes that produced sRNAs, an observation further supported by the co-enrichment of SETDB1 (an H3K9 methyltransferase), suggesting that pseudogene sRNAs may have a role in regional chromatin repression.
Taken together, our comprehensive and systematic characterization of pseudogene transcription uncovers a complex picture of how pseudogene ncRNAs could influence gene and pseudogene expression, at both epigenetic and post-transcriptional levels.
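The cross-tissue comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state which correlation statistic was used, so Pearson correlation is an assumption here. The function names (`pearson`, `correlate_pairs`), the gene identifiers (`GENE_A`, `PG_A`, and so on), and all expression values are hypothetical placeholders, not data or code from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A flat profile has no defined correlation.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def correlate_pairs(expression, parent_of):
    """Correlate each pseudogene's cross-tissue profile with its parent gene's.

    expression: dict mapping a gene id to its expression values, one per tissue
                (same tissue order for every gene).
    parent_of:  dict mapping a pseudogene id to its parent gene id.
    Pairs with a missing profile are skipped.
    """
    return {
        pseudogene: pearson(expression[pseudogene], expression[parent])
        for pseudogene, parent in parent_of.items()
        if pseudogene in expression and parent in expression
    }


# Hypothetical expression values across four tissues (illustration only).
expression = {
    "GENE_A": [10.0, 20.0, 30.0, 40.0],
    "PG_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [5.0, 1.0, 8.0, 2.0],
    "PG_B": [0.2, 0.9, 0.1, 0.8],
}
parent_of = {"PG_A": "GENE_A", "PG_B": "GENE_B"}

correlations = correlate_pairs(expression, parent_of)
for pseudogene, r in sorted(correlations.items()):
    print(f"{pseudogene} vs {parent_of[pseudogene]}: r = {r:.3f}")
```

In this toy input, `PG_A` tracks its parent across tissues while `PG_B` runs opposite to its parent; a finding of "mostly positive correlations" would correspond to most pairs resembling the first case. A rank-based statistic such as Spearman correlation would be a reasonable alternative for low-abundance transcripts.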