Zhai Haixia, Dong Chengyao, Wang Tao, Luo Junwei
School of Software, Henan Polytechnic University, Jiaozuo, 454003, China.
Interdiscip Sci. 2024 Dec 23. doi: 10.1007/s12539-024-00677-0.
Structural variation (SV) is an important component of the diversity of the human genome. Many studies have shown that SVs have a significant impact on human disease and are strongly associated with the development of cancer. In recent years, Hi-C sequencing has been shown to be useful for detecting large-scale SVs, and several methods have been proposed for identifying SVs from Hi-C data. However, because of the complexity of the 3D genome structure, accurately identifying SVs from the Hi-C contact matrix remains a challenging task. Here, we present HiSVision, a method for identifying large-scale SVs from Hi-C data using a detection transformer framework. Inspired by object detection networks, we transform the Hi-C contact matrix into images, identify candidate SV regions in each image with a detection transformer, and finally filter the candidate SVs based on features around their breakpoints. Experimental results show that HiSVision outperforms existing methods in terms of precision and F1 score on cancer cell lines and simulated datasets. The source code and data are available from https://github.com/dcy99/HiSVision.
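The first step of the pipeline, turning a Hi-C contact matrix into an image, can be illustrated with a minimal sketch. This is not the HiSVision preprocessing itself, which the abstract does not specify. It assumes one common approach: log-transform the counts, clip at a high percentile, and rescale to 8-bit grayscale. The function name `contact_matrix_to_image`, the `clip_percentile` parameter, and the toy matrix are all hypothetical.

```python
import numpy as np


def contact_matrix_to_image(contacts, clip_percentile=99.0):
    """Convert a 2D contact matrix to an 8-bit grayscale image.

    Hypothetical helper: log scaling plus percentile clipping is an
    assumption here, not the method described by the authors.
    """
    contacts = np.nan_to_num(np.asarray(contacts, dtype=np.float64), nan=0.0)
    if contacts.ndim != 2:
        raise ValueError("expected a 2D contact matrix")

    # Compress the dynamic range: counts near the diagonal dominate raw Hi-C maps.
    logged = np.log1p(np.clip(contacts, 0.0, None))

    # Clip at a high percentile so a few extreme bins do not wash out the rest.
    upper = np.percentile(logged, clip_percentile)
    if upper <= 0:
        return np.zeros(contacts.shape, dtype=np.uint8)

    scaled = np.clip(logged / upper, 0.0, 1.0)
    return np.round(scaled * 255).astype(np.uint8)


# Toy symmetric matrix: contact frequency decays with genomic distance.
rng = np.random.default_rng(0)
n_bins = 64
idx = np.arange(n_bins)
distance = np.abs(idx[:, None] - idx[None, :])
toy = rng.poisson(200.0 / (1.0 + distance)).astype(float)
toy = np.triu(toy) + np.triu(toy, 1).T

# Simulated SV-like signal: an unexpected off-diagonal block of contacts.
toy[5:15, 45:55] += 40
toy[45:55, 5:15] += 40

image = contact_matrix_to_image(toy)
```

In this toy image the off-diagonal block is brighter than the background at a comparable distance from the diagonal, which is the kind of localized pattern an object detector could be trained to box as a candidate SV region.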