Computer Science Department, University of California, Irvine, CA 92697, USA.
Mathematical, Computational & Systems Biology, University of California, Irvine, CA 92697, USA.
Genes (Basel). 2022 Mar 30;13(4):621. doi: 10.3390/genes13040621.
Mapping chromatin insulator loops is crucial to investigating genome evolution, elucidating critical biological functions, and ultimately quantifying variant impact in diseases. However, chromatin conformation profiling assays are usually expensive and time-consuming, and may report fuzzy, low-resolution insulator annotations. Therefore, we propose a weakly supervised deep learning method, InsuLock, to address these challenges. Specifically, InsuLock first utilizes a Siamese neural network to predict the existence of insulators within a given region (up to 2000 bp). Then, it uses an object detection module for precise insulator boundary localization via gradient-weighted class activation mapping (~40 bp resolution). Finally, it quantifies variant impacts by comparing the insulator scores of the wild-type and mutant alleles. We applied InsuLock to various bulk and single-cell datasets for performance testing and benchmarking. We showed that it outperformed existing methods with an AUROC of ~0.96 and condensed insulator annotations to ~2.5% of their original size while still demonstrating higher conservation scores and stronger motif enrichment. Finally, we utilized InsuLock to predict cell-type-specific variant impacts from brain scATAC-seq data and identified a schizophrenia GWAS variant disrupting an insulator loop proximal to a known risk gene, indicating a possible new mechanism of action for the disease.
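The variant-scoring step (comparing insulator scores between the wild-type and mutant alleles) can be illustrated with a minimal sketch. Everything in it is hypothetical rather than InsuLock's actual code: `toy_insulator_score` is a stand-in for the trained network that simply counts a CTCF-like `CCCTC` core motif, the names `apply_snv` and `variant_impact` are illustrative, and the sign convention (mutant minus wild-type) is an assumption.

```python
import math

# Assumption: a toy feature only. CTCF is named for binding CCCTC,
# but a real model would learn its own sequence features.
TOY_MOTIF = "CCCTC"


def toy_insulator_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: DNA sequence -> score in (0, 1)."""
    seq = seq.upper()
    width = len(TOY_MOTIF)
    hits = sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == TOY_MOTIF)
    # Logistic squashing of the motif count, centred on one hit.
    return 1.0 / (1.0 + math.exp(-(hits - 1)))


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return the sequence with the base at 0-based position `pos` replaced by `alt`."""
    if not 0 <= pos < len(seq):
        raise ValueError("variant position outside the sequence")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_impact(seq: str, pos: int, alt: str, score_fn=toy_insulator_score) -> float:
    """Mutant score minus wild-type score (assumed sign convention).

    Negative values suggest the variant weakens the predicted insulator.
    """
    return score_fn(apply_snv(seq, pos, alt)) - score_fn(seq)


# Toy wild-type region containing a single motif at positions 4-8.
wild_type = "AAGT" + "CCCTC" + "TTGA"
# A substitution in the middle of the motif lowers the toy score.
impact = variant_impact(wild_type, pos=6, alt="A")
```

In this sketch a variant outside the motif leaves the score unchanged, while one inside it produces a negative impact; with a real model, `score_fn` would be replaced by the network's predicted insulator probability for the region.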