Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD 4072, Australia.
Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia.
Gigascience. 2018 May 1;7(5). doi: 10.1093/gigascience/giy037.
Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology that offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling, directly translating the raw signal to DNA sequence without the error-prone segmentation step. We show that our model, trained on only a small set of 4,000 reads, provides state-of-the-art basecalling accuracy, even on previously unseen species. Chiron achieves basecalling speeds of more than 2,000 bases per second using desktop computer graphics processing units.
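The abstract does not say how a model can emit bases without first segmenting the signal. One common mechanism, and the one the full Chiron paper describes, is a connectionist temporal classification (CTC) output layer: the network scores every signal frame, and a decoder collapses repeated labels and drops a special "blank" label. The sketch below illustrates only that collapsing step, using a simple greedy rule. The names `BLANK`, `ALPHABET`, `greedy_ctc_decode`, and `one_hot`, and the label ordering, are assumptions made for illustration and are not taken from Chiron's code.

```python
# Hypothetical illustration of greedy CTC-style decoding; not Chiron's implementation.
ALPHABET = "ACGT"  # assumed label order: 0=A, 1=C, 2=G, 3=T
BLANK = 4          # assumed index of the CTC "blank" label


def greedy_ctc_decode(frame_scores):
    """Collapse per-frame label scores into a base sequence.

    frame_scores: list of per-frame score lists, each of length 5
    (four bases plus blank). No segment boundaries are needed.
    """
    bases = []
    previous = BLANK
    for scores in frame_scores:
        # Pick the highest-scoring label for this frame.
        label = max(range(len(scores)), key=scores.__getitem__)
        # Emit a base only when the label changes and is not blank,
        # so a run of identical labels yields a single base.
        if label != previous and label != BLANK:
            bases.append(ALPHABET[label])
        previous = label
    return "".join(bases)


def one_hot(index, size=5):
    """Build a toy score vector that puts all weight on one label."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy frame labels: A A blank A C C blank G
toy_frames = [one_hot(i) for i in [0, 0, 4, 0, 1, 1, 4, 2]]
decoded = greedy_ctc_decode(toy_frames)
```

In this toy input the blank between the second and third `A` frames is what lets the decoder report two consecutive `A` bases instead of merging them into one, which is how homopolymer runs can be represented without explicit event boundaries.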