School of Software, Henan Polytechnic University, Jiaozuo, 454003, China.
BMC Bioinformatics. 2023 Mar 6;24(1):80. doi: 10.1186/s12859-023-05216-0.
Many studies have shown that structural variations (SVs) strongly affect human disease. Insertions, a common type of SV, are often associated with genetic diseases, so detecting them accurately is important. Although many insertion-detection methods have been proposed, they often produce erroneous calls and miss true variants. Accurate insertion detection therefore remains a challenging task.
In this paper, we propose INSnet, a method that detects insertions using a deep learning network. First, INSnet divides the reference genome into contiguous sub-regions and extracts five features for each locus from the alignments between long reads and the reference genome. Next, INSnet applies a depthwise separable convolutional network, whose convolution operations extract informative features from both spatial and channel information. INSnet uses two attention mechanisms, the convolutional block attention module (CBAM) and efficient channel attention (ECA), to extract key alignment features in each sub-region. To capture the relationship between adjacent sub-regions, INSnet uses a gated recurrent unit (GRU) network to further extract the more important SV signatures. After predicting whether a sub-region contains an insertion, INSnet determines the precise site and length of the insertion. The source code is available on GitHub at https://github.com/eioyuou/INSnet.
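The first step, per-locus features from long-read alignments grouped into sub-regions, can be illustrated with a minimal sketch. It is not taken from the INSnet source code. The abstract does not name the five features, so the sketch computes only two illustrative signals (read coverage and inserted bases) from CIGAR strings. The function names, the window size of 4, and the toy alignments are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def per_locus_signals(
    alignments: List[Tuple[int, str]], region_start: int, region_end: int
) -> Dict[str, List[int]]:
    """Compute two illustrative per-locus signals over [region_start, region_end).

    alignments: (0-based reference start, CIGAR string) pairs (hypothetical input).
    Returns read coverage and the number of inserted bases anchored at each locus.
    """
    size = region_end - region_start
    coverage = [0] * size
    ins_bases = [0] * size
    for ref_start, cigar in alignments:
        pos = ref_start
        for length_str, op in CIGAR_OP.findall(cigar):
            length = int(length_str)
            if op in "M=X":
                # Aligned bases consume the reference and add coverage.
                lo = max(pos, region_start)
                hi = min(pos + length, region_end)
                for p in range(lo, hi):
                    coverage[p - region_start] += 1
                pos += length
            elif op == "I":
                # An insertion consumes no reference; anchor it at the
                # reference base immediately before it.
                anchor = pos - 1
                if region_start <= anchor < region_end:
                    ins_bases[anchor - region_start] += length
            elif op in "DN":
                # Deletions and skips consume the reference without coverage.
                pos += length
            # S, H and P consume no reference bases, so pos is unchanged.
    return {"coverage": coverage, "ins_bases": ins_bases}


def split_into_subregions(values: List[int], window: int) -> List[List[int]]:
    """Cut a per-locus signal into contiguous sub-regions of `window` loci."""
    return [values[i:i + window] for i in range(0, len(values), window)]


# Toy example: two reads that both carry an insertion after reference base 4.
toy_alignments = [(0, "5M3I5M"), (2, "3M4I6M")]
signals = per_locus_signals(toy_alignments, region_start=0, region_end=12)
ins_windows = split_into_subregions(signals["ins_bases"], window=4)
```

In this toy input only the second sub-region carries an insertion signal. A model like the one described above would take such per-sub-region feature arrays as input.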
Experimental results show that INSnet achieves a higher F1 score than other methods on real datasets.