School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Shaanxi, 710049, China.
Guangdong Artificial Intelligence and Digital Economy Laboratory, Guangdong, 510335, China.
Commun Biol. 2022 Jun 20;5(1):608. doi: 10.1038/s42003-022-03546-y.
Topologically associating domains (TADs) are fundamental building blocks of the three-dimensional genome and are organized into complex hierarchies. Identifying hierarchical TADs from Hi-C data helps to clarify the relationship between genome architecture and gene regulation. Herein we propose TADfit, a multivariate linear regression model for profiling hierarchical chromatin domains. TADfit fits the interaction frequencies in a Hi-C contact matrix, with or without replicates, using all possible hierarchical TADs; the significant TADs are then determined from the regression coefficients, which are obtained with an online learning solver called Follow-The-Regularized-Leader (FTRL). Going beyond existing methods, TADfit can handle multiple contact matrix replicates and find partially overlapping TADs on them, which helps to identify the comprehensive underlying TADs across replicates from different experiments. Comparative results show that TADfit achieves better accuracy and reproducibility, and that the hierarchical TADs it calls exhibit reasonable biological relevance.
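The abstract frames TADfit as a regression in which every possible TAD is a candidate regressor and FTRL produces the coefficients used to pick significant domains. The sketch below illustrates that idea only under loudly stated assumptions, and is not TADfit's own code: the indicator design (a contact (p, q) is explained by every candidate interval [i, j] with i ≤ p ≤ q ≤ j), the squared loss, pooling replicates as extra samples that share one coefficient vector, and all names (`candidate_tads`, `features`, `FTRLProximal`, `fit`) are hypothetical. The solver is the generic per-coordinate FTRL-Proximal update described by McMahan et al. (2013), applied here to squared loss, and the toy ignores the distance-dependent decay of real Hi-C contacts.

```python
import math
import random


def candidate_tads(n, min_size=2):
    """Enumerate every bin interval [i, j] (inclusive, >= min_size bins) as a candidate TAD."""
    return [(i, j) for i in range(n) for j in range(i + min_size - 1, n)]


def features(p, q, tads):
    """Indices of candidate TADs that contain contact (p, q); assumes p <= q.

    Hypothetical design: each enclosing candidate contributes an indicator feature of value 1.
    """
    return [k for k, (i, j) in enumerate(tads) if i <= p and q <= j]


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for squared loss with L1/L2 penalties."""

    def __init__(self, dim, alpha=0.5, beta=1.0, l1=0.1, l2=0.01):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim  # accumulated (adjusted) gradients
        self.n = [0.0] * dim  # accumulated squared gradients

    def weight(self, k):
        """Closed-form coefficient for coordinate k; exactly 0 while |z_k| <= l1."""
        z = self.z[k]
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n[k])) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def update(self, active, y):
        """One online step on a sample whose active (value-1) features are `active`."""
        w = {k: self.weight(k) for k in active}
        pred = sum(w.values())
        g = pred - y  # d/dw_k of 0.5 * (pred - y)^2 for each active indicator
        for k in active:
            sigma = (math.sqrt(self.n[k] + g * g) - math.sqrt(self.n[k])) / self.alpha
            self.z[k] += g - sigma * w[k]
            self.n[k] += g * g
        return pred


def fit(replicates, tads, epochs=100, seed=0, **ftrl_kwargs):
    """Fit one coefficient per candidate TAD to the upper triangle of all replicate matrices.

    Assumption: replicates are pooled as extra samples sharing one coefficient vector.
    """
    n = len(replicates[0])
    feats = {(p, q): features(p, q, tads) for p in range(n) for q in range(p, n)}
    samples = [(p, q, m[p][q]) for m in replicates for (p, q) in feats]
    model = FTRLProximal(len(tads), **ftrl_kwargs)
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(samples)
        for p, q, y in samples:
            model.update(feats[(p, q)], y)
    return [model.weight(k) for k in range(len(tads))]


if __name__ == "__main__":
    # Hypothetical toy: one 8-bin domain containing two nested 4-bin sub-domains.
    n = 8
    truth = {(0, 7): 1.0, (0, 3): 2.0, (4, 7): 2.0}
    toy = [[sum(v for (i, j), v in truth.items() if i <= min(p, q) and max(p, q) <= j)
            for q in range(n)] for p in range(n)]
    tads = candidate_tads(n)
    weights = fit([toy], tads, epochs=200)
    ranked = sorted(range(len(tads)), key=lambda k: -weights[k])
    for k in ranked[:3]:
        print(tads[k], round(weights[k], 3))
```

With an L1 penalty (`l1 > 0`), any candidate whose accumulated `z` stays within the threshold gets a coefficient of exactly zero, which is why ranking or thresholding the coefficients is a natural way to keep only "significant" TADs in this sketch; the penalty values and the top-k cut-off in the demo are illustrative choices, not TADfit's settings.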