snHiC: a complete and simplified snakemake pipeline for grouped Hi-C data analysis.

Authors

Gregoricchio Sebastian, Zwart Wilbert

Affiliations

Division of Oncogenomics, Netherlands Cancer Institute, Oncode Institute, 1066CX Amsterdam, The Netherlands.

Publication

Bioinform Adv. 2023 Jun 21;3(1):vbad080. doi: 10.1093/bioadv/vbad080. eCollection 2023.

DOI:10.1093/bioadv/vbad080

PMID:37397353

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10307938/

Abstract

SUMMARY

Genome-wide chromosome conformation capture (Hi-C) is a technique that allows the study of 3D genome organization. Despite being widely used, analysis of Hi-C data is technically challenging and involves several time-consuming steps that often require manual intervention, making it error-prone and potentially affecting data reproducibility. In order to facilitate and simplify these analyses, we implemented snHiC, a snakemake-based pipeline that allows for the generation of contact matrices at multiple resolutions in one single run, aggregation of individual samples into user-specified groups, detection of domains, compartments, loops and stripes, and differential compartment and chromatin interaction analyses.
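As a rough illustration of two of the steps listed above (aggregating samples into groups and producing matrices at multiple resolutions), the following is a minimal NumPy sketch, not snHiC's actual implementation. It assumes dense, square, same-binned raw count matrices; the function names (`aggregate_groups`, `coarsen`, `multi_resolution`), the sample and group names, and the toy counts are all hypothetical. Real pipelines operate on sparse genome-wide matrices and normalize (e.g. balance) them after merging.

```python
import numpy as np


def aggregate_groups(samples, groups):
    """Sum the raw contact matrices of every sample assigned to each group.

    samples: dict mapping sample name -> square count matrix (same binning)
    groups:  dict mapping group name -> list of sample names (hypothetical layout)
    """
    merged = {}
    for group, members in groups.items():
        # Pooling replicates: raw counts simply add up bin by bin.
        merged[group] = np.sum([samples[s] for s in members], axis=0)
    return merged


def coarsen(matrix, factor):
    """Merge factor x factor blocks of adjacent bins by summing their counts."""
    n = matrix.shape[0]
    m = -(-n // factor)  # number of coarse bins, rounding up
    # Zero-pad so the last, partial block is still a full factor x factor block.
    padded = np.zeros((m * factor, m * factor), dtype=matrix.dtype)
    padded[:n, :n] = matrix
    return padded.reshape(m, factor, m, factor).sum(axis=(1, 3))


def multi_resolution(matrix, base_bin, resolutions):
    """Build one matrix per requested resolution (bp) from a base-resolution matrix."""
    out = {}
    for res in resolutions:
        if res % base_bin != 0:
            raise ValueError(f"{res} bp is not a multiple of the {base_bin} bp base bin")
        out[res] = coarsen(matrix, res // base_bin)
    return out


# Toy example (hypothetical data): two replicates of a 4-bin, 5 kb-resolution matrix.
toy = np.arange(16).reshape(4, 4)
samples = {
    "ctrl_rep1": toy + toy.T,  # symmetric, like a Hi-C contact matrix
    "ctrl_rep2": np.ones((4, 4), dtype=int),
}
groups = {"ctrl": ["ctrl_rep1", "ctrl_rep2"]}

merged = aggregate_groups(samples, groups)
matrices = multi_resolution(merged["ctrl"], base_bin=5000, resolutions=[5000, 10000, 20000])
for res, mat in matrices.items():
    print(res, mat.shape, int(mat.sum()))
```

The loop prints each resolution with its matrix shape and total count. The total is identical at every resolution because coarsening only regroups counts into larger bins, which is why it is applied to raw counts before any normalization.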

AVAILABILITY AND IMPLEMENTATION

Source code is freely available at https://github.com/sebastian-gregoricchio/snHiC. A yaml-formatted file (snHiC/workflow/envs/snHiC_conda_env_stable.yaml) is available to build a compatible conda environment.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Advances online.
