College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China.
College of Computer Science and Electronic Engineering, Hunan University, Hunan 410082, P. R. China.
Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae539.
Structural variants (SVs) play an important role in genetic research and precision medicine. Because existing SV detection methods usually produce a substantial number of false positive calls, approaches for filtering the detection results are needed.
We developed CSV-Filter, a novel deep learning-based SV filtering tool for both short and long reads. CSV-Filter uses a novel multi-level grayscale image encoding method based on the CIGAR strings of the alignment results, and employs image augmentation techniques to improve SV feature extraction. CSV-Filter also transfers self-supervised learning networks as classification models and uses mixed-precision operations to accelerate training. Experiments showed that integrating CSV-Filter with popular SV detection tools could considerably reduce false positive SVs for both short and long reads, while leaving true positive SVs almost unchanged. Compared with DeepSVFilter, an SV filtering tool for short reads, CSV-Filter could recognize more false positive calls and additionally supports long reads.
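To make the idea of turning alignments into images more concrete, here is a minimal Python sketch under stated assumptions. Each read becomes one row of pixels over a fixed reference window, and each CIGAR operation is mapped to a gray level. The `GRAY_LEVELS` table, the functions `parse_cigar` and `encode_reads`, and the choice to mark insertions and soft clips as single pixels are hypothetical illustrations, not CSV-Filter's actual multi-level encoding, which the abstract does not specify.

```python
import re

# Hypothetical gray level for each CIGAR operation. These values are an
# illustrative assumption, NOT the levels used by CSV-Filter.
GRAY_LEVELS = {"M": 64, "=": 64, "X": 96, "S": 128, "I": 160, "D": 224}

_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a SAM CIGAR string into (length, op) pairs."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed or unavailable CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]


def encode_reads(reads, win_start, width):
    """Encode reads overlapping a reference window as a grayscale image.

    reads: list of (ref_start, cigar) with 0-based reference start positions.
    Returns one row of `width` pixel values (0-255) per read; 0 = no coverage.
    """
    image = []
    for ref_start, cigar in reads:
        row = [0] * width
        markers = []  # single-pixel events (I, S), applied after the spans
        pos = ref_start
        for length, op in parse_cigar(cigar):
            if op in "M=XD":
                # Reference-consuming ops paint a span, clipped to the window.
                lo = max(pos, win_start) - win_start
                hi = min(pos + length, win_start + width) - win_start
                for col in range(lo, hi):
                    row[col] = GRAY_LEVELS[op]
                pos += length
            elif op == "N":
                pos += length  # skipped reference region: left blank
            elif op in "IS":
                # Insertions and soft clips consume no reference bases, so
                # they are marked at the current reference column.
                markers.append((pos - win_start, GRAY_LEVELS[op]))
            # H and P consume neither reference nor stored query: ignored.
        for col, level in markers:
            if 0 <= col < width:
                row[col] = level
        image.append(row)
    return image
```

For example, `encode_reads([(100, "5M2I3M2D4M"), (102, "3S6M")], win_start=100, width=16)` yields two 16-pixel rows in which the deletion appears as a bright two-pixel run and the insertion and soft clip as single-pixel markers. Markers are applied after the spans so that a following aligned block does not overwrite them; a real pipeline would read alignments from a BAM file rather than hand-written tuples.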