

Halcyon: an accurate basecaller exploiting an encoder-decoder model with monotonic attention.

Affiliations

Health Intelligence Center.

Human Genome Center.

Publication information

Bioinformatics. 2021 Jun 9;37(9):1211-1217. doi: 10.1093/bioinformatics/btaa953.

Abstract

MOTIVATION

In recent years, nanopore sequencing technology has enabled inexpensive long-read sequencing, which promises reads longer than a few thousand bases. Such long-read sequences contribute to the precise detection of structural variations and accurate haplotype phasing. However, deciphering precise DNA sequences from noisy and complicated nanopore raw signals remains a crucial demand for downstream analyses based on higher-quality nanopore sequencing, although various basecallers have been introduced to date.

RESULTS

To address this need, we developed a novel basecaller, Halcyon, that incorporates neural-network techniques frequently used in the field of machine translation. Our model employs monotonic-attention mechanisms to learn semantic correspondences between nucleotides and signal levels without any pre-segmentation against input signals. We evaluated performance with a human whole-genome sequencing dataset and demonstrated that Halcyon outperformed existing third-party basecallers and achieved competitive performance against the latest Oxford Nanopore Technologies' basecallers.
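As a rough illustration of what monotonic attention means at decoding time, the sketch below implements a generic hard monotonic attention scan. It is not taken from the Halcyon source code: the function names, the dot-product energy, and the toy vectors are all assumptions made for this example. A real model would use a learned energy function over signal encodings. The point it shows is that each output step resumes scanning from the previously attended input position and only moves forward, so no pre-segmentation of the input is required.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def monotonic_attend(encoder_states, query, start, threshold=0.5):
    """Scan encoder states left to right from `start` and return the first
    index whose selection probability reaches `threshold`, or None."""
    for j in range(start, len(encoder_states)):
        # Hypothetical energy: a plain dot product (learned in a real model).
        p_select = sigmoid(dot(query, encoder_states[j]))
        if p_select >= threshold:
            return j
    return None


def decode_alignment(encoder_states, queries):
    """Return one attended encoder index per decoder step (None = no match)."""
    alignment = []
    position = 0
    for query in queries:
        j = monotonic_attend(encoder_states, query, position)
        alignment.append(j)
        if j is not None:
            # The next step may stay here or move right, never left.
            position = j
    return alignment


# Toy encoder outputs (stand-ins for encoded signal chunks) and decoder queries.
toy_states = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 0]]
toy_queries = [[2, -2], [-2, 2], [2, -2]]
toy_alignment = decode_alignment(toy_states, toy_queries)
print(toy_alignment)
```

With these toy inputs the attended positions never decrease from one output step to the next, which is the property that distinguishes monotonic attention from unconstrained soft attention.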

AVAILABILITY AND IMPLEMENTATION

The source code (halcyon) can be found at https://github.com/relastle/halcyon.


