Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.
Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.
Genome Res. 2024 Sep 20;34(8):1174-1184. doi: 10.1101/gr.279274.124.
Chromatin loops are fundamental to transcription and gene regulation, so identifying them plays an important role in molecular biology and 3D genomics research. These fine-scale chromatin structures can be identified in genome-wide interaction matrices through Hi-C data analysis, which is essential for unraveling the intricacies of transcriptional regulation. Given the growing number of genome-wide contact maps derived from both in situ Hi-C and single-cell Hi-C experiments, there is a pressing need for efficient and robust algorithms that can process data from diverse experiments rapidly and adaptively. Here, we propose YOLOOP, a novel detection-based framework that departs from the conventional paradigm. YOLOOP stands out for its speed, surpassing the performance of previous state-of-the-art (SOTA) chromatin loop detection methods. It achieves a 30-fold acceleration compared with classification-based methods, up to a 20-fold acceleration compared with the SOTA kernel-based framework, and a fivefold acceleration compared with statistical algorithms. Furthermore, the proposed framework generalizes across various cell types, multiresolution Hi-C maps, and diverse experimental protocols. Compared with the existing paradigms, YOLOOP shows up to a 10% increase in recall and a 15% increase in F1-score, with the gains particularly noteworthy in the GM12878 cell line. YOLOOP also adapts quickly through straightforward fine-tuning, making it readily applicable to extremely sparse single-cell Hi-C contact maps. It retains its speed in this setting, completing genome-wide detection at 10 kb resolution within 1 min for a single-cell contact map and within 3 min for a contact map superimposed from 900 cells, enabling fast analysis of large-scale single-cell data.
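To make the recall and F1-score figures above concrete, the following is a minimal sketch of how predicted loop calls might be scored against a reference set. It is not the evaluation protocol used for YOLOOP: the function names, the representation of a loop as a pair of bin indices, the tolerance of one bin, and the greedy one-to-one matching are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a loop is a pair of anchor bin indices
# at a fixed resolution (e.g. 10 kb bins).
Loop = Tuple[int, int]


def count_matches(predicted: List[Loop], reference: List[Loop], tol: int = 1) -> int:
    """Greedily match each predicted loop to at most one unmatched reference
    loop whose two anchors both lie within `tol` bins (an assumed criterion)."""
    unmatched = set(reference)
    hits = 0
    for pi, pj in predicted:
        # Iterate over a sorted copy so removal from the set is safe.
        for ri, rj in sorted(unmatched):
            if abs(pi - ri) <= tol and abs(pj - rj) <= tol:
                unmatched.remove((ri, rj))
                hits += 1
                break
    return hits


def precision_recall_f1(
    predicted: List[Loop], reference: List[Loop], tol: int = 1
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for predicted loops against a reference set."""
    predicted = list(dict.fromkeys(predicted))  # drop duplicate calls, keep order
    reference = list(dict.fromkeys(reference))
    tp = count_matches(predicted, reference, tol)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy coordinates, not real data.
    predicted = [(10, 50), (20, 80), (100, 300)]
    reference = [(10, 51), (20, 80), (40, 90), (200, 400)]
    p, r, f = precision_recall_f1(predicted, reference, tol=1)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy case two of the three predictions fall within one bin of a reference loop, so precision is 2/3, recall is 2/4, and F1 is their harmonic mean, 4/7. A stricter or looser tolerance changes all three values, which is why the matching criterion should be stated alongside any reported recall or F1-score.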