Center for Theoretical Biological Physics, Rice University, Houston, TX, USA. Electronic address: https://twitter.com/edoderoroja.
Center for Theoretical Biological Physics, Rice University, Houston, TX, USA. Electronic address: https://twitter.com/_matheusfmello.
J Mol Biol. 2023 Aug 1;435(15):168180. doi: 10.1016/j.jmb.2023.168180. Epub 2023 Jun 9.
The folding patterns of interphase genomes in higher eukaryotes, as obtained from DNA-proximity-ligation (Hi-C) experiments, are used to classify loci into structural classes called compartments and subcompartments. These structurally annotated (sub)compartments are known to exhibit specific epigenomic characteristics and cell-type-specific variations. To explore the relationship between genome structure and the epigenome, we present PyMEGABASE (PYMB), a maximum-entropy-based neural network model that predicts the (sub)compartment annotation of a locus based solely on the local epigenome, such as ChIP-Seq of histone post-translational modifications. PYMB builds upon our previous model while improving its robustness, its ability to handle diverse inputs, and its ease of use. We employed PYMB to predict subcompartments for over a hundred human cell types available in ENCODE, shedding light on the links between subcompartments, cell identity, and epigenomic signals. The fact that PYMB, trained on data for human cells, can accurately predict compartments in mice suggests that the model is learning underlying physicochemical principles transferable across cell types and species. Because PYMB remains reliable at higher resolutions (up to 5 kbp), we use it to investigate compartment-specific gene expression. Not only can PYMB generate (sub)compartment information without Hi-C experiments, but its predictions are also interpretable: by analyzing PYMB's trained parameters, we explore the importance of various epigenomic marks in each subcompartment prediction. Furthermore, the predictions of the model can be used as input for the OpenMiChroM software, which has been calibrated to generate three-dimensional structures of the genome. Detailed documentation of PYMB is available at https://pymegabase.readthedocs.io, including an installation guide using pip or conda, and Jupyter/Colab notebook tutorials.
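As an illustration of the idea described in the abstract (predicting a structural class for each locus from local epigenomic signals, then reading the trained parameters to see which marks matter), the following is a minimal sketch using a softmax (multinomial logistic) classifier, a standard maximum-entropy model. It is not PyMEGABASE's architecture or API: the data are synthetic, the three classes and four marks are invented for the example, and every function and variable name here is hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_softmax(X, y, n_classes, lr=0.2, n_steps=1000):
    """Fit a multinomial logistic classifier by plain gradient descent."""
    n_samples, n_features = X.shape
    W = np.zeros((n_features, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot encoding of the labels
    for _ in range(n_steps):
        P = softmax(X @ W + b)
        # Gradient of the mean cross-entropy loss
        W -= lr * (X.T @ (P - Y)) / n_samples
        b -= lr * (P - Y).mean(axis=0)
    return W, b


def predict(X, W, b):
    """Return the most probable class index for each row of X."""
    return softmax(X @ W + b).argmax(axis=1)


# Synthetic stand-in for epigenomic tracks (ASSUMPTION: not real data).
# Each of 3 hypothetical classes is enriched in one of the first 3 marks;
# the 4th mark carries no information about the class.
rng = np.random.default_rng(0)
n_loci, n_classes = 600, 3
class_profiles = np.array([
    [3.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 0.0],
])
labels = rng.integers(0, n_classes, size=n_loci)
signals = class_profiles[labels] + rng.normal(size=(n_loci, 4))

W, b = fit_softmax(signals, labels, n_classes)
accuracy = (predict(signals, W, b) == labels).mean()

# Interpretation: for each class, the mark with the largest weight.
top_mark_per_class = W.argmax(axis=0)
print("training accuracy:", accuracy)
print("most informative mark per class:", top_mark_per_class)
```

In this toy setting each column of `W` plays the role of a per-class weight profile over marks, which is the kind of parameter inspection the abstract refers to; the actual PYMB model and its inputs are described in the documentation linked above.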