Lamberti William Franz, Zang Chongzhi
Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA.
Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA.
Comput Struct Biotechnol J. 2022 Jun 20;20:3387-3398. doi: 10.1016/j.csbj.2022.06.018. eCollection 2022.
Higher-order chromatin structures have functional impacts on gene regulation and cell identity determination. Using high-throughput sequencing (HTS)-based methods such as Hi-C, active or inactive compartments and open or closed topologically associating domain (TAD) structures can be identified at the cell population level. Recently developed high-resolution three-dimensional (3D) molecular imaging techniques, such as 3D electron microscopy with in situ hybridization (3D-EMISH) and 3D structured illumination microscopy (3D-SIM), enable direct detection of physical representations of chromatin structures in a single cell. However, explainable and interpretable computational analysis of 3D image data with respect to the functional characteristics of chromatin structures remains challenging. We developed Extracting Physical-Characteristics from Images of Chromatin Structures (EPICS), a machine learning-based computational method for processing high-resolution chromatin 3D image data. Using EPICS on images produced by 3D-EMISH or 3D-SIM techniques, we generated more direct 3D representations of higher-order chromatin structures, identified major chromatin domains, and determined the open or closed status of each domain. We identified several high-contributing features from the model as the major physical characteristics that define open or closed chromatin domains, demonstrating the explainability and interpretability of EPICS. EPICS can be applied to the analysis of other high-resolution 3D molecular imaging data for spatial genomics studies. The R and Python code for EPICS is available at https://github.com/zang-lab/epics.