Laboratory for Developmental Epigenetics, RIKEN Center for Biosystems Dynamics Research, Kobe, Japan.
Nonequilibrium Physics of Living Matter RIKEN Hakubi Research Team, RIKEN Center for Biosystems Dynamics Research, Kobe, Japan.
Methods Mol Biol. 2025;2856:79-117. doi: 10.1007/978-1-0716-4136-1_6.
Over a decade has passed since the development of the Hi-C method for genome-wide analysis of 3D genome organization. Hi-C uses next-generation sequencing (NGS) technology to generate large-scale chromatin interaction data, which have accumulated across a diverse range of species and cell types, particularly in eukaryotes. There is thus a growing need to streamline Hi-C data analysis so that these data sets can be used effectively. Hi-C generates much larger data than other NGS techniques such as chromatin immunoprecipitation sequencing (ChIP-seq) or RNA-seq, which makes reanalysis computationally expensive. To bridge this resource gap, the 4D Nucleome (4DN) Data Portal has reanalyzed approximately 600 Hi-C data sets and allows users to access and use the analyzed data. In this chapter, we provide detailed instructions for implementing the Common Workflow Language (CWL)-based Hi-C analysis pipeline adopted by the 4DN Data Portal ecosystem. This reproducible and portable pipeline generates standard Hi-C contact matrices in formats such as .hic or .mcool from FASTQ files. It enables users to output their own Hi-C data in the same format as the data registered in the 4DN Data Portal, which facilitates comparative analysis with the portal's data. Our custom-made scripts are available on GitHub at https://github.com/kuzobuta/4dn_cwl_pipeline .
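The core idea behind a Hi-C contact matrix can be illustrated with a toy sketch. This is not the 4DN pipeline, which relies on dedicated tools (for example, cooler for .mcool output). It assumes a hypothetical, already-parsed input of read pairs given as `(chrom1, pos1, chrom2, pos2)` tuples. Each end is assigned to a fixed-size genomic bin, and counts are accumulated in an upper-triangular sparse matrix. The hypothetical helper `coarsen` then merges adjacent bins by an integer factor, which loosely mirrors how a multi-resolution .mcool file stores the same contacts at several bin sizes.

```python
from collections import defaultdict


def bin_contacts(pairs, resolution):
    """Aggregate read pairs into a sparse, upper-triangular contact matrix.

    Hypothetical input: iterable of (chrom1, pos1, chrom2, pos2) tuples.
    resolution: bin size in base pairs.
    Returns {((chrom_a, bin_a), (chrom_b, bin_b)): count}, where the two
    anchors are ordered so each contact is counted exactly once.
    """
    counts = defaultdict(int)
    for c1, p1, c2, p2 in pairs:
        a = (c1, p1 // resolution)
        b = (c2, p2 // resolution)
        if a > b:  # keep only the upper triangle
            a, b = b, a
        counts[(a, b)] += 1
    return dict(counts)


def coarsen(counts, factor):
    """Merge adjacent bins by an integer factor (a lower-resolution view).

    Integer division preserves anchor order, so the result stays
    upper-triangular without re-sorting.
    """
    out = defaultdict(int)
    for ((c1, b1), (c2, b2)), n in counts.items():
        out[((c1, b1 // factor), (c2, b2 // factor))] += n
    return dict(out)


if __name__ == "__main__":
    pairs = [
        ("chr1", 1500, "chr1", 9500),
        ("chr1", 9900, "chr1", 1200),  # same contact, reversed order
        ("chr1", 500, "chr2", 300),    # inter-chromosomal contact
    ]
    fine = bin_contacts(pairs, 1000)   # 1 kb bins
    print(fine)
    print(coarsen(fine, 5))            # 5 kb bins
```

In a real workflow, the pairs would come from aligned and filtered reads, and the matrix would be written to .hic or .mcool rather than kept in a Python dictionary.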