Wang Fuzhou, Gao Tingxiao, Lin Jiecong, Zheng Zetian, Huang Lei, Toseef Muhammad, Li Xiangtao, Wong Ka-Chun
Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China.
Department of Medical Biophysics, Faculty of Medicine, University of Toronto, Toronto, ON M5G1L7, Canada.
iScience. 2022 Nov 10;25(12):105535. doi: 10.1016/j.isci.2022.105535. eCollection 2022 Dec 22.
Graphs and images are two common representations of Hi-C contact maps. Existing computational tools model Hi-C data as a single data structure and neglect the potential advantages of combining information from different views. Here we propose GILoop, a dual-branch neural network that learns from both representations to identify genome-wide CTCF-mediated loops. With GILoop, we explore the combined strength of integrating the two view representations of Hi-C data and corroborate the complementary relationship between the views. In particular, the model outperforms the state-of-the-art loop-calling framework and is more robust to low-quality Hi-C libraries. We also find that graph-based and image-based models prefer different matrix densities, which offers insight into the interpretation of Hi-C data. Finally, through multiple transfer-learning case studies, we demonstrate that GILoop can accurately model the organizational and functional patterns of CTCF-mediated looping across different cell lines.
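To make the two "views" in the abstract concrete, the following minimal sketch derives an image-like array and a graph-like edge list from the same small contact matrix. It is an illustration only, not GILoop's actual preprocessing: the function name `two_views`, the `threshold` parameter, the log scaling, and the toy matrix are all assumptions introduced here.

```python
import numpy as np

def two_views(contacts, threshold=0.0):
    """Return (image, edges, weights) for a symmetric contact patch.

    Hypothetical helper for illustration; not taken from GILoop.
    """
    contacts = np.asarray(contacts, dtype=float)

    # Image view: log-scale the counts, then min-max normalize to [0, 1].
    logged = np.log1p(contacts)
    span = logged.max() - logged.min()
    if span > 0:
        image = (logged - logged.min()) / span
    else:
        image = np.zeros_like(logged)

    # Graph view: genomic bins are nodes; each upper-triangle contact
    # above the threshold becomes one weighted, undirected edge.
    rows, cols = np.triu_indices_from(contacts, k=1)
    keep = contacts[rows, cols] > threshold
    edges = np.stack([rows[keep], cols[keep]], axis=1)
    weights = contacts[rows[keep], cols[keep]]
    return image, edges, weights

# Toy 3-bin contact patch (made-up counts).
patch = np.array([[10, 4, 0],
                  [4, 8, 2],
                  [0, 2, 9]])
image, edges, weights = two_views(patch)
```

In this sketch the image view keeps every pixel, including zeros, while the graph view keeps only the non-zero off-diagonal contacts, which is one simple way the two representations can react differently to matrix density.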