Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America.
Department of Cell Biology, Yale School of Medicine, New Haven, Connecticut, United States of America.
PLoS Comput Biol. 2024 Jul 15;20(7):e1012221. doi: 10.1371/journal.pcbi.1012221. eCollection 2024 Jul.
Chromatin is a polymer complex of DNA and proteins that regulates gene expression. The three-dimensional (3D) structure and organization of chromatin control DNA transcription and replication. High-throughput chromatin conformation capture techniques generate Hi-C maps that can provide insight into the 3D structure of chromatin. A Hi-C map can be represented as a symmetric matrix in which element (i, j) gives the average contact probability or number of contacts between chromatin loci i and j. Previous studies have detected topologically associating domains (TADs), self-interacting regions of this matrix within which the contact probability is greater than that outside the region. Many algorithms have been developed to identify TADs within Hi-C maps. However, most TAD identification algorithms are unable to identify nested or overlapping TADs, and for a given Hi-C map the location and number of TADs identified by different methods vary significantly. We develop a novel method to identify TADs, KerTAD, which uses a kernel-based technique from computer vision and image processing and is able to accurately identify nested and overlapping TADs. We benchmark this method against state-of-the-art TAD identification methods on both synthetic and experimental data sets. We find that the new method consistently has higher true positive rates (TPR) and lower false discovery rates (FDR) than all tested methods for both synthetic and manually annotated experimental Hi-C maps. The TPR for KerTAD is also largely insensitive to increasing noise and sparsity, in contrast to the other methods. We also find that KerTAD is consistent in the number and size of TADs identified across replicate experimental Hi-C maps for several organisms. Thus, KerTAD will improve automated TAD identification and enable researchers to better correlate changes in TADs with biological phenomena, such as enhancer-promoter interactions and disease states.
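To make the benchmark quantities concrete, the following minimal sketch builds a small synthetic symmetric contact map containing a nested TAD and scores a set of predicted TADs by TPR and FDR. It is not the KerTAD algorithm and does not reproduce the benchmark described above. The `(start, end)` bin representation, the function names `synthetic_map`, `matches`, and `tpr_fdr`, the contact values, and the boundary tolerance `tol` used to decide whether a prediction matches a ground-truth TAD are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a TAD is an inclusive (start_bin, end_bin) pair.
TAD = Tuple[int, int]


def synthetic_map(n: int, tads: List[TAD], inside: float = 1.0,
                  background: float = 0.1) -> List[List[float]]:
    """Symmetric n x n map; each TAD adds `inside` to its square block,
    so a nested TAD shows up as a brighter block inside its parent."""
    m = [[background] * n for _ in range(n)]
    for start, end in tads:
        for i in range(start, end + 1):
            for j in range(start, end + 1):
                m[i][j] += inside
    return m


def matches(pred: TAD, truth: TAD, tol: int) -> bool:
    """Assumed matching rule: both boundaries agree to within `tol` bins."""
    return abs(pred[0] - truth[0]) <= tol and abs(pred[1] - truth[1]) <= tol


def tpr_fdr(predicted: List[TAD], truth: List[TAD],
            tol: int = 1) -> Tuple[float, float]:
    """TPR: fraction of ground-truth TADs matched by some prediction.
    FDR: fraction of predictions that match no ground-truth TAD."""
    found = sum(any(matches(p, t, tol) for p in predicted) for t in truth)
    false_pos = sum(not any(matches(p, t, tol) for t in truth)
                    for p in predicted)
    tpr = found / len(truth) if truth else 0.0
    fdr = false_pos / len(predicted) if predicted else 0.0
    return tpr, fdr


# Ground truth with a nested TAD: (2, 5) lies inside (0, 9).
truth = [(0, 9), (2, 5), (10, 19)]
contact_map = synthetic_map(20, truth)

# Made-up predictions: two fall within tolerance, one misses both boundaries.
predicted = [(0, 9), (3, 5), (12, 17)]
tpr, fdr = tpr_fdr(predicted, truth, tol=1)
print(f"TPR = {tpr:.3f}, FDR = {fdr:.3f}")
```

A stricter or looser `tol` changes both rates, so any comparison between methods should fix the matching rule before the scores are computed.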