BreakNet: detecting deletions using long reads and a deep learning approach.

Affiliations

College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454003, China.

School of Computer Science and Information Engineering, Henan University, Kaifeng, 475001, China.

Publication information

BMC Bioinformatics. 2021 Dec 2;22(1):577. doi: 10.1186/s12859-021-04499-5.

Abstract

BACKGROUND

Structural variations (SVs) occupy a prominent position in human genetic diversity, and deletions form an important type of SV that has been suggested to be associated with genetic diseases. Although various deletion calling methods based on long reads have been proposed, a new approach is still needed to mine features in long-read alignment information. Recently, deep learning has attracted much attention in genome analysis, and it is a promising technique for calling SVs.

RESULTS

In this paper, we propose BreakNet, a deep learning method that detects deletions from long reads. BreakNet first extracts feature matrices from long-read alignments. Second, it uses a time-distributed convolutional neural network (CNN) to integrate the feature matrices and map them to feature vectors. Third, BreakNet employs a bidirectional long short-term memory (BLSTM) model to analyse the resulting sequence of consecutive feature vectors in both the forward and backward directions. Finally, a classification module determines whether a region contains a deletion. On real long-read sequencing datasets, we demonstrate that BreakNet achieves higher F1 scores than Sniffles, SVIM and cuteSV. The source code for the proposed method is available from GitHub at https://github.com/luojunwei/BreakNet.
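
To make the first step more concrete, the following is a minimal sketch of how per-position features might be derived from long-read alignments and cut into an ordered sequence of windows. It is an illustration under stated assumptions, not BreakNet's actual implementation: the abstract does not say which features are used, so the two columns here (aligned-base coverage and deletion count), the function names `deletion_features` and `split_windows`, and the `min_del_len` parameter are all hypothetical. Only the CIGAR operation semantics follow the SAM format.

```python
import re

# CIGAR operations that consume reference bases (per the SAM format).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def deletion_features(alignments, region_start, region_end, min_del_len=1):
    """Return one [coverage, deletion_count] row per reference position.

    `alignments` is an iterable of (ref_start, cigar) pairs with 0-based
    reference start coordinates; the region is half-open [start, end).
    The choice of these two features is an assumption for illustration.
    """
    width = region_end - region_start
    coverage = [0] * width
    deleted = [0] * width
    for ref_start, cigar in alignments:
        pos = ref_start
        for length, op in CIGAR_RE.findall(cigar):
            length = int(length)
            if op not in REF_CONSUMING:
                continue  # I, S, H, P do not advance along the reference
            lo = max(pos, region_start)
            hi = min(pos + length, region_end)
            for p in range(lo, hi):
                if op == "D" and length >= min_del_len:
                    deleted[p - region_start] += 1
                elif op in "M=X":
                    coverage[p - region_start] += 1
            pos += length
    return [[c, d] for c, d in zip(coverage, deleted)]


def split_windows(matrix, window):
    """Cut the per-position matrix into consecutive fixed-size windows.

    Each window is one small feature matrix; the ordered list of windows is
    the kind of input a time-distributed CNN followed by a BLSTM consumes.
    """
    return [matrix[i:i + window]
            for i in range(0, len(matrix) - window + 1, window)]


if __name__ == "__main__":
    # Two toy reads: one fully matching, one carrying a 4 bp deletion.
    reads = [(0, "10M"), (2, "3M4D3M")]
    feats = deletion_features(reads, 0, 10)
    for window in split_windows(feats, 5):
        print(window)
```

In a design like this, each window would be passed through the same CNN to produce one feature vector, and the BLSTM would then read those vectors in order, so the window size sets the resolution at which a deletion can be localized.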

CONCLUSIONS

Our work shows that deep learning can be combined with long reads to call deletions more effectively than existing methods.
