Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA.
BMC Bioinformatics. 2020 Jul 20;21(1):319. doi: 10.1186/s12859-020-03652-w.
The three-dimensional (3D) structure of the genome plays a crucial role in gene expression regulation. Chromatin conformation capture technologies (Hi-C) have revealed that the genome is organized in a hierarchy of topologically associated domains (TADs), sub-TADs, and chromatin loops. Identifying such hierarchical structures is a critical step in understanding genome regulation. Existing tools for TAD calling are frequently sensitive to biases in Hi-C data, depend on tunable parameters, and are computationally inefficient.
To address these challenges, we developed a novel sliding window-based spectral clustering framework that uses gaps between consecutive eigenvectors for TAD boundary identification.
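The abstract names the eigenvector-gap idea without spelling it out. The following is a minimal, hypothetical sketch of how such a boundary call could work inside a single window of a Hi-C contact matrix. It is not SpectralTAD's implementation, and every detail is an assumption: the window is normalized as D^{-1/2} A D^{-1/2}, the top k eigenvectors are row-normalized, the number of domains k is supplied by hand (SpectralTAD selects parameters automatically; this sketch does not attempt that), boundaries are placed at the k-1 largest gaps, and sliding the window along the diagonal is omitted. The names `jacobi_eigh`, `eigenvector_gaps`, `call_boundaries` and `toy_contacts` are illustrative only. A small pure-Python Jacobi eigensolver keeps the example dependency-free; a library eigensolver would be the practical choice.

```python
import math


def jacobi_eigh(a, tol=1e-18, max_sweeps=100):
    """Eigen-decompose a small symmetric matrix with the cyclic Jacobi method.

    Returns (values, vectors), where vectors[k] is the eigenvector for values[k].
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes a[p][q]
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J and V <- V J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    values = [a[i][i] for i in range(n)]
    vectors = [[v[i][k] for i in range(n)] for k in range(n)]  # column k of V
    return values, vectors


def eigenvector_gaps(contact, k=2):
    """Distance between consecutive loci in the space of the top-k eigenvectors."""
    n = len(contact)
    deg = [max(sum(row), 1e-12) for row in contact]  # guard against empty bins
    # Assumed normalization: D^{-1/2} A D^{-1/2}
    m = [[contact[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    values, vectors = jacobi_eigh(m)
    top = sorted(range(n), key=lambda idx: values[idx], reverse=True)[:k]
    # Row i = coordinates of locus i in the top-k eigenvectors
    rows = [[vectors[idx][i] for idx in top] for i in range(n)]
    for r in rows:  # row-normalize so loci are compared by direction only
        norm = math.sqrt(sum(x * x for x in r)) or 1.0
        r[:] = [x / norm for x in r]
    return [math.dist(rows[i], rows[i + 1]) for i in range(n - 1)]


def call_boundaries(contact, k=2):
    """Hypothetical rule: place k-1 boundaries at the largest eigenvector gaps."""
    gaps = eigenvector_gaps(contact, k)
    largest = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)[: k - 1]
    return sorted(i + 1 for i in largest)  # index of the first locus of each new domain


def toy_contacts(domain_sizes, inside=10.0, outside=1.0):
    """Hypothetical toy window: strong within-domain contacts with distance decay."""
    labels = [d for d, size in enumerate(domain_sizes) for _ in range(size)]
    n = len(labels)
    return [[(inside if labels[i] == labels[j] else outside) / (1 + abs(i - j))
             for j in range(n)] for i in range(n)]


print(call_boundaries(toy_contacts([4, 4]), k=2))  # expected: [4]
```

Because the gaps are measured between row-normalized embeddings, they do not depend on which orthonormal basis the eigensolver returns for the top-k subspace, so eigenvector sign flips or rotations among near-equal eigenvalues leave the boundary call unchanged.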
Our method, implemented in an R package, SpectralTAD, detects hierarchical, biologically relevant TADs, has automatic parameter selection, and is robust to sequencing depth, resolution, and sparsity of Hi-C data. SpectralTAD outperforms four state-of-the-art TAD callers in simulated and experimental settings. We demonstrate that TAD boundaries shared among multiple levels of the TAD hierarchy were more enriched in classical boundary marks and more conserved across cell lines and tissues. In contrast, boundaries of TADs that cannot be split into sub-TADs showed less enrichment and conservation, suggesting that they play a more dynamic role in genome regulation.
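One way to make "boundaries shared among multiple levels of the TAD hierarchy" concrete is the small hypothetical helper below. Given boundary calls from each hierarchy level, it records the levels at which each boundary appears and splits boundaries into shared and single-level sets. It assumes boundaries match only at identical coordinates; a real comparison may need to tolerate offsets of one or more bins, which this sketch omits. The names `boundary_levels` and `split_shared` are illustrative and not part of SpectralTAD.

```python
from collections import defaultdict


def boundary_levels(levels):
    """Map each boundary coordinate to the hierarchy levels that call it.

    `levels` is a list of boundary-coordinate lists, one per hierarchy level
    (level 1 = top-level TADs, deeper levels = sub-TADs).
    """
    seen = defaultdict(set)
    for level, boundaries in enumerate(levels, start=1):
        for pos in boundaries:
            seen[pos].add(level)
    return {pos: sorted(lv) for pos, lv in sorted(seen.items())}


def split_shared(levels):
    """Separate boundaries called at two or more levels from single-level ones."""
    table = boundary_levels(levels)
    shared = [pos for pos, lv in table.items() if len(lv) > 1]
    single = [pos for pos, lv in table.items() if len(lv) == 1]
    return shared, single


# Toy coordinates (hypothetical): level 1 TADs, then level 2 sub-TADs
shared, single = split_shared([[0, 400_000, 1_000_000],
                               [0, 200_000, 400_000, 700_000, 1_000_000]])
print(shared, single)  # expected: [0, 400000, 1000000] [200000, 700000]
```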
SpectralTAD is available on Bioconductor, http://bioconductor.org/packages/SpectralTAD/ .