IEEE/ACM Trans Comput Biol Bioinform. 2022 Jul-Aug;19(4):2313-2323. doi: 10.1109/TCBB.2021.3084147. Epub 2022 Aug 8.
The availability of thousands of assays of epigenetic activity necessitates compressed representations of these data sets that summarize the epigenetic landscape of the genome. Until recently, most such representations were cell type-specific, applying to a single tissue or cell state. More recently, neural networks have made it possible to summarize data across tissues to produce a pan-cell-type representation. In this work, we propose Epi-LSTM, a deep long short-term memory (LSTM) recurrent neural network autoencoder that captures long-term dependencies in epigenomic data. The latent representations from Epi-LSTM capture a variety of genomic phenomena, including gene expression, promoter-enhancer interactions, replication timing, frequently interacting regions, and evolutionary conservation. These representations outperform existing methods in a majority of cell types and, because of their sequential nature, are smoother along the genomic axis.
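The abstract names the model family (an LSTM autoencoder over epigenomic signal) but gives no architectural details. The sketch below is therefore only a generic illustration of the idea, not the authors' implementation. It assumes the input is a sequence of genomic bins, each carrying one value per assay, and that the encoder's hidden state at each bin serves as that bin's latent representation. All names (`make_lstm`, `lstm_step`, `run_lstm`, `autoencode`), the dimensions, and the toy values are hypothetical, and the weights are random and untrained.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_lstm(input_size, hidden_size, rng):
    """Random, untrained parameters for one LSTM layer (hypothetical helper)."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (g).
    return {gate: (matrix(hidden_size, input_size + hidden_size), [0.0] * hidden_size)
            for gate in "ifog"}

def lstm_step(params, x, h, c):
    """One LSTM time step: returns the new hidden state and cell state."""
    z = x + h  # concatenate the input with the previous hidden state
    def affine(gate):
        W, b = params[gate]
        return [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(a) for a in affine("i")]
    f = [sigmoid(a) for a in affine("f")]
    o = [sigmoid(a) for a in affine("o")]
    g = [math.tanh(a) for a in affine("g")]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def run_lstm(params, sequence, hidden_size):
    """Run an LSTM over a sequence and return the hidden state at every position."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    states = []
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
        states.append(h)
    return states

def autoencode(signal, latent_size, seed=0):
    """Encode per-bin assay vectors into latents, then decode them back."""
    rng = random.Random(seed)
    n_assays = len(signal[0])
    encoder = make_lstm(n_assays, latent_size, rng)
    decoder = make_lstm(latent_size, n_assays, rng)
    latents = run_lstm(encoder, signal, latent_size)       # one latent vector per genomic bin
    reconstruction = run_lstm(decoder, latents, n_assays)  # same shape as the input
    return latents, reconstruction

def mse(a, b):
    """Mean squared reconstruction error, a common autoencoder training loss."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

# Toy input: 6 genomic bins x 4 assays (made-up values, not real data).
toy_signal = [[0.1 * (pos + assay) for assay in range(4)] for pos in range(6)]
latents, recon = autoencode(toy_signal, latent_size=3)
print(len(latents), len(latents[0]), len(recon), len(recon[0]))  # 6 3 6 4
```

Because each latent vector depends on the hidden state carried over from the previous bin, neighbouring bins tend to receive related representations, which is one plausible reading of the smoothness claim in the abstract. In practice the weights would be fitted by minimising a reconstruction loss such as `mse` with a deep learning framework rather than by hand.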