City University of Hong Kong Shenzhen Research Institute, Shenzhen, Guangdong, China.
Department of Computer Science, City University of Hong Kong, Hong Kong, China.
J Comput Biol. 2024 Sep;31(9):784-796. doi: 10.1089/cmb.2024.0490. Epub 2024 Jul 24.
High-throughput chromosome conformation capture (Hi-C) technology captures spatial interactions between DNA sequences as contact matrices, and software tools have been developed to identify topologically associating domains (TADs) from these matrices. Building on structural information theory, SuperTAD uses dynamic programming to find the TAD hierarchy with minimal structural entropy. However, the algorithm suffers from high time complexity. To accelerate it, we design an approximation algorithm with a theoretical performance guarantee and implement it as a package, SuperTAD-Fast. On Hi-C matrices and simulated data, SuperTAD-Fast achieved a substantial runtime improvement over SuperTAD. On Hi-C data from human cell lines, SuperTAD-Fast shows high consistency and significant enrichment of structural proteins compared with six existing hierarchical TAD detection methods.
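To make the objective concrete, the following is a minimal sketch of the quantity being minimized, restricted to a single flat level of contiguous domains. It is not the SuperTAD or SuperTAD-Fast implementation: it only evaluates the standard two-dimensional structural entropy of a given partition, treating the Hi-C matrix as a weighted graph over bins. The function name `structural_entropy`, the `boundaries` argument, the choice to ignore self-contacts, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np


def structural_entropy(contacts, boundaries):
    """Two-dimensional structural entropy of a contiguous partition.

    contacts   : symmetric, non-negative contact matrix (bins x bins)
    boundaries : start index of each domain, beginning with 0
    (hypothetical interface, for illustration only)
    """
    w = np.array(contacts, dtype=float)
    np.fill_diagonal(w, 0.0)      # assumption: self-contacts are ignored
    degree = w.sum(axis=1)        # weighted degree of each bin
    total = degree.sum()          # graph volume (twice the total edge weight)
    edges = list(boundaries) + [len(w)]

    entropy = 0.0
    for start, end in zip(edges[:-1], edges[1:]):
        d = degree[start:end]
        volume = d.sum()          # volume of this domain
        if volume == 0:
            continue
        inside = w[start:end, start:end].sum()  # counts each internal edge twice
        cut = volume - inside     # weight of edges leaving the domain
        nz = d[d > 0]
        # cost of locating a bin inside its domain
        entropy -= np.sum(nz / total * np.log2(nz / volume))
        # cost of entering the domain from the rest of the graph
        entropy -= cut / total * np.log2(volume / total)
    return entropy


# Toy matrix with two obvious domains: bins {0, 1} and bins {2, 3}.
toy = [
    [0, 10, 1, 1],
    [10, 0, 1, 1],
    [1, 1, 0, 10],
    [1, 1, 10, 0],
]
good = structural_entropy(toy, [0, 2])     # split at the true boundary
bad = structural_entropy(toy, [0, 1, 3])   # split through both domains
print(good < bad)
```

On this toy matrix the partition that follows the block structure has lower entropy than the one that cuts through the blocks, which is the property a minimal-entropy search relies on. A hierarchical method would apply the same idea across nested levels of domains rather than a single flat partition.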