Department of Computational Medicine and Bioinformatics, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA.
Department of Computer Science and Engineering, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA.
Nucleic Acids Res. 2023 Jul 7;51(12):5931-5947. doi: 10.1093/nar/gkad436.
Many deep learning approaches have been proposed to predict epigenetic profiles, chromatin organization, and transcription activity. While these approaches achieve satisfactory performance in predicting one modality from another, the learned representations do not generalize across predictive tasks or across cell types. In this paper, we propose EPCOT, a deep learning approach that employs a pre-training and fine-tuning framework to accurately and comprehensively predict multiple modalities, including epigenome, chromatin organization, transcriptome, and enhancer activity, for new cell types, requiring only cell-type specific chromatin accessibility profiles. Many of these modalities, such as Micro-C and ChIA-PET, are expensive to obtain experimentally, so in silico predictions from EPCOT should be helpful. Furthermore, this pre-training and fine-tuning framework allows EPCOT to identify generic representations that generalize across different predictive tasks. Interpreting EPCOT models also provides biological insights, including mapping between different genomic modalities, identifying TF sequence binding patterns, and analyzing cell-type specific TF impacts on enhancer activity.