BreakNet: detecting deletions using long reads and a deep learning approach.

Affiliations

College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454003, China.

School of Computer Science and Information Engineering, Henan University, Kaifeng, 475001, China.

Publication information

BMC Bioinformatics. 2021 Dec 2;22(1):577. doi: 10.1186/s12859-021-04499-5.

DOI: 10.1186/s12859-021-04499-5

PMID: 34856923

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8641175/

Abstract

BACKGROUND

Structural variations (SVs) occupy a prominent position in human genetic diversity, and deletions form an important type of SV that has been suggested to be associated with genetic diseases. Although various deletion calling methods based on long reads have been proposed, a new approach is still needed to mine features in long-read alignment information. Recently, deep learning has attracted much attention in genome analysis, and it is a promising technique for calling SVs.

RESULTS

We propose BreakNet, a deep learning method that detects deletions from long reads. BreakNet first extracts feature matrices from long-read alignments. Second, it uses a time-distributed convolutional neural network (CNN) to integrate and map the feature matrices to feature vectors. Third, a bidirectional long short-term memory (BLSTM) model analyses the resulting sequence of consecutive feature vectors in both the forward and backward directions. Finally, a classification module determines whether a region contains a deletion. On real long-read sequencing datasets, BreakNet achieves higher F1 scores than Sniffles, SVIM and cuteSV. The source code is available on GitHub at https://github.com/luojunwei/BreakNet.
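
To make the first step of the pipeline more concrete, the sketch below shows one plausible way to turn long-read alignments into fixed-size windows that a time-distributed CNN followed by a BLSTM could consume as a sequence. The abstract does not specify BreakNet's actual feature layout, so everything here is an assumption for illustration: the helper names `deletion_depth` and `to_windows`, the single deletion-count feature, the window size, and the toy reads are all hypothetical.

```python
import re

# Hypothetical sketch: BreakNet's real features are not described in the abstract.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def deletion_depth(reads, region_start, region_len):
    """Count, per reference position, how many reads carry a deletion there.

    reads: iterable of (alignment_start, cigar_string) with 0-based starts.
    """
    depth = [0] * region_len
    region_end = region_start + region_len
    for start, cigar in reads:
        pos = start
        for length, op in CIGAR_OP.findall(cigar):
            length = int(length)
            if op == "D":
                # Clip the deleted interval to the region of interest.
                lo = max(pos, region_start)
                hi = min(pos + length, region_end)
                for p in range(lo, hi):
                    depth[p - region_start] += 1
            if op in REF_CONSUMING:
                pos += length
    return depth


def to_windows(depth, window):
    """Split the per-position track into consecutive windows (one per time step)."""
    return [depth[i:i + window] for i in range(0, len(depth) - window + 1, window)]


# Toy alignments (assumed data): two reads support a 5 bp deletion at 10..14.
reads = [(0, "10M5D10M"), (2, "8M5D12M"), (5, "20M")]
depth = deletion_depth(reads, region_start=0, region_len=20)
windows = to_windows(depth, window=5)
print(windows)
```

In a full model, each window would be expanded into a matrix with more than one feature row, and the ordered list of windows would form the sequence that the recurrent layer reads in both directions.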

CONCLUSIONS

Our work shows that deep learning can be combined with long reads to call deletions more effectively than existing methods.
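
The comparison with Sniffles, SVIM and cuteSV is reported in terms of F1 score, the harmonic mean of precision and recall. A minimal sketch of that metric follows; the call counts are made up for illustration and are not results from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 90 correct deletion calls, 10 false calls, 30 missed.
score = f1_score(tp=90, fp=10, fn=30)
print(round(score, 3))
```

With these assumed counts, precision is 0.9 and recall is 0.75, so a caller is only rewarded when it keeps both false calls and missed deletions low.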

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c72/8641175/33476c1481c1/12859_2021_4499_Fig1_HTML.jpg

Similar articles

MAMnet: detecting and genotyping deletions and insertions based on long reads and a deep learning approach.

Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac195.

LSnet: detecting and genotyping deletions using deep learning network.

Front Genet. 2023 Jun 14;14:1189775. doi: 10.3389/fgene.2023.1189775. eCollection 2023.

Automated filtering of genome-wide large deletions through an ensemble deep learning framework.

Methods. 2022 Oct;206:77-86. doi: 10.1016/j.ymeth.2022.08.001. Epub 2022 Aug 28.

DeepSV: accurate calling of genomic deletions from high-throughput sequencing data using deep convolutional neural network.

BMC Bioinformatics. 2019 Dec 12;20(1):665. doi: 10.1186/s12859-019-3299-y.

INSnet: a method for detecting insertions based on deep learning network.

BMC Bioinformatics. 2023 Mar 6;24(1):80. doi: 10.1186/s12859-023-05216-0.

SVsearcher: A more accurate structural variation detection method in long read data.

Comput Biol Med. 2023 May;158:106843. doi: 10.1016/j.compbiomed.2023.106843. Epub 2023 Mar 31.

Long-read-based human genomic structural variation detection with cuteSV.

Genome Biol. 2020 Aug 3;21(1):189. doi: 10.1186/s13059-020-02107-y.

CSV-Filter: a deep learning-based comprehensive structural variant filtering method for both short and long reads.

Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae539.

Detecting genomic deletions from high-throughput sequence data with unsupervised learning.

BMC Bioinformatics. 2023 Jan 27;23(Suppl 8):568. doi: 10.1186/s12859-023-05139-w.

Cited by

GKNnet: an relational graph convolutional network-based method with knowledge-augmented activation layer for microbial structural variation detection.

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf200.

A Hitchhiker's Guide to long-read genomic analysis.

Genome Res. 2025 Apr 14;35(4):545-558. doi: 10.1101/gr.279975.124.

DconnLoop: a deep learning model for predicting chromatin loops based on multi-source data integration.

BMC Bioinformatics. 2025 Apr 1;26(1):96. doi: 10.1186/s12859-025-06092-6.

SVEA: an accurate model for structural variation detection using multi-channel image encoding and enhanced AlexNet architecture.

J Transl Med. 2025 Feb 22;23(1):221. doi: 10.1186/s12967-025-06213-y.

HiSVision: A Method for Detecting Large-Scale Structural Variations Based on Hi-C Data and Detection Transformer.

Interdiscip Sci. 2024 Dec 23. doi: 10.1007/s12539-024-00677-0.

GTasm: a genome assembly method using graph transformers and HiFi reads.

Front Genet. 2024 Oct 25;15:1495657. doi: 10.3389/fgene.2024.1495657. eCollection 2024.

Long-read sequencing of extrachromosomal circular DNA and genome assembly of a Solanum lycopersicum breeding line revealed active LTR retrotransposons originating from S. Peruvianum L. introgressions.

BMC Genomics. 2024 Apr 24;25(1):404. doi: 10.1186/s12864-024-10314-1.

LSnet: detecting and genotyping deletions using deep learning network.

Front Genet. 2023 Jun 14;14:1189775. doi: 10.3389/fgene.2023.1189775. eCollection 2023.

A survey of algorithms for the detection of genomic structural variants from long-read sequencing data.

Nat Methods. 2023 Aug;20(8):1143-1158. doi: 10.1038/s41592-023-01932-w. Epub 2023 Jun 29.

cnnLSV: detecting structural variants by encoding long-read alignment information and convolutional neural network.

BMC Bioinformatics. 2023 Mar 28;24(1):119. doi: 10.1186/s12859-023-05243-x.

References

LDICDL: LncRNA-Disease Association Identification Based on Collaborative Deep Learning.

IEEE/ACM Trans Comput Biol Bioinform. 2022 May-Jun;19(3):1715-1723. doi: 10.1109/TCBB.2020.3034910. Epub 2022 Jun 3.

Long-read-based human genomic structural variation detection with cuteSV.

Genome Biol. 2020 Aug 3;21(1):189. doi: 10.1186/s13059-020-02107-y.

DeepSV: accurate calling of genomic deletions from high-throughput sequencing data using deep convolutional neural network.

BMC Bioinformatics. 2019 Dec 12;20(1):665. doi: 10.1186/s12859-019-3299-y.

Structural variant calling: the long and the short of it.

Genome Biol. 2019 Nov 20;20(1):246. doi: 10.1186/s13059-019-1828-7.

Multi-platform discovery of haplotype-resolved structural variation in human genomes.

Nat Commun. 2019 Apr 16;10(1):1784. doi: 10.1038/s41467-018-08148-z.

SVIM: structural variant identification using mapped long reads.

Bioinformatics. 2019 Sep 1;35(17):2907-2915. doi: 10.1093/bioinformatics/btz041.

A universal SNP and small-indel variant caller using deep neural networks.

Nat Biotechnol. 2018 Nov;36(10):983-987. doi: 10.1038/nbt.4235. Epub 2018 Sep 24.

Accurate detection of complex structural variations using single-molecule sequencing.

Nat Methods. 2018 Jun;15(6):461-468. doi: 10.1038/s41592-018-0001-7. Epub 2018 Apr 30.

Long-read genome sequencing identifies causal structural variation in a Mendelian disease.

Genet Med. 2018 Jan;20(1):159-163. doi: 10.1038/gim.2017.86. Epub 2017 Jun 22.

An integrated map of structural variation in 2,504 human genomes.

Nature. 2015 Oct 1;526(7571):75-81. doi: 10.1038/nature15394.
