EPIGENE: genome-wide transcription unit annotation using a multivariate probabilistic model of histone modifications.

Author information

Institute for Medical Bioinformatics and Biostatistics, Philipps University of Marburg, 35037, Marburg, Germany.

Otto-Warburg-Laboratory, Max Planck Institute for Molecular Genetics, 14195, Berlin, Germany.

Publication information

Epigenetics Chromatin. 2020 Apr 7;13(1):20. doi: 10.1186/s13072-020-00341-z.

Abstract

BACKGROUND

Understanding the transcriptome is critical for explaining the functional and regulatory roles of genomic regions. Current methods for identifying transcription units (TUs) rely on RNA-seq, which requires large quantities of mRNA and therefore makes inherently unstable TUs, e.g. miRNA precursors, difficult to identify. Chromatin-based approaches can alleviate this problem because histone modifications correlate with transcription.

RESULTS

Here, we introduce EPIGENE, a novel chromatin segmentation method that identifies active TUs using transcription-associated histone modifications. Unlike existing chromatin segmentation approaches, EPIGENE uses a constrained, semi-supervised multivariate hidden Markov model (HMM) that models the observed combination of histone modifications as a product of independent Bernoulli random variables. Our results show that EPIGENE can identify genome-wide TUs in an unbiased manner. EPIGENE-predicted TUs show an enrichment of RNA Polymerase II at the transcription start site and in the gene body, indicating that they are indeed transcribed. Comprehensive validation using existing annotations revealed that 93% of EPIGENE TUs can be explained by existing gene annotations and 5% of EPIGENE TUs in HepG2 can be explained by microRNA annotations. EPIGENE outperformed existing RNA-seq-based approaches in TU prediction precision across human cell lines. Finally, we identified 232 novel TUs in K562 and 43 novel cell-specific TUs, all of which were supported by RNA Polymerase II ChIP-seq and nascent RNA-seq data.
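
As a rough illustration of the emission model described above (not EPIGENE's actual implementation), the following Python sketch scores a presence/absence vector of histone marks under a product of independent Bernoulli variables and decodes the most likely state path with the Viterbi algorithm. The two states, the three marks, and every probability value are hypothetical placeholders chosen for illustration. The constraints and semi-supervised training of the published model are not represented here.

```python
import math


def bernoulli_emission_logprob(observation, probs):
    """Log-probability of a binary vector under independent Bernoulli variables.

    observation: list of 0/1 values, one per histone mark.
    probs: per-mark probability of observing a 1 in the given state.
    """
    logp = 0.0
    for x, p in zip(observation, probs):
        logp += math.log(p if x == 1 else 1.0 - p)
    return logp


def viterbi(observations, log_start, log_trans, emission_probs):
    """Return the most likely hidden-state path for a sequence of binary vectors."""
    n_states = len(log_start)
    # score[s] = best log-probability of any path that ends in state s
    score = [
        log_start[s] + bernoulli_emission_logprob(observations[0], emission_probs[s])
        for s in range(n_states)
    ]
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            new_score.append(
                score[best_prev]
                + log_trans[best_prev][s]
                + bernoulli_emission_logprob(obs, emission_probs[s])
            )
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical toy model: state 0 = background, state 1 = active TU.
# Each row gives P(mark present) for three illustrative histone marks.
emission_probs = [[0.1, 0.1, 0.1], [0.9, 0.8, 0.7]]
log_start = [math.log(0.5), math.log(0.5)]
log_trans = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.1), math.log(0.9)],
]
# One binary vector per genomic bin (illustrative data only)
bins = [[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1, 0], [1, 1, 1], [0, 0, 0]]

path = viterbi(bins, log_start, log_trans, emission_probs)
print(path)  # [0, 0, 1, 1, 1, 0]
```

In this toy run, the three consecutive bins carrying marks are assigned to the "active TU" state, which shows how a segmentation model can turn per-bin mark combinations into contiguous candidate regions.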

CONCLUSION

We demonstrate the applicability of EPIGENE to identify genome-wide active TUs and to provide valuable information about unannotated TUs. EPIGENE is an open-source method and is freely available at: https://github.com/imbbLab/EPIGENE.
