Faculty of Medicine and Edmond J Safra Center for Bioinformatics, Tel Aviv University, Tel Aviv 69978, Israel.
Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac251.
Nanopore sequencing is an emerging technology that reads nucleic acid sequences directly and identifies the various chemical modifications they carry. Deep learning has become a popular technique for solving complex computational tasks. 'Adaptive sequencing' is an implementation of selective sequencing intended for use on the nanopore sequencing platform. In this study, we demonstrate an alternative, software-based method of selective sequencing that runs in real time by combining nanopore sequencing with deep learning. Our results show that deep learning can feasibly classify reads from the raw nanopore signal corresponding to only the first 200 nucleotides. We further demonstrate this by comparing the accuracy of our deep learning classification model across data from several human cell lines and other eukaryotic organisms. Using custom deep learning models and a script built on a 'Read Until' framework, we targeted mitochondrial molecules in real time in a human cell line sample, achieving significant separation and a 2.3-fold enrichment. In a series of very short sequencing experiments (10, 30 and 120 min), we identified genomic and mitochondrial reads with accuracy above 90%, even though mitochondrial DNA comprised only 0.1% of the total input material. What distinguishes our method is its ability to separate two groups of DNA even without a labeled reference. This contrasts with studies that required a well-defined reference, whether a DNA sequence or another type of representation. Additionally, our method showed a higher correlation with the theoretically possible enrichment factor than other published methods. We believe that our results will lay the foundation for rapid and selective sequencing using nanopore technology and will pave the way for clinical applications that use nanopore sequencing data.
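The selective sequencing decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `decide` and `fold_enrichment`, the `classify` callback (a stand-in for a trained deep learning model), the 0.5 threshold and the figure of about 10 raw samples per nucleotide are all assumptions made for illustration. The fold-enrichment formula is one common definition (target fraction with selection divided by target fraction without) and may differ from the one used in the study.

```python
from typing import Callable, Sequence

# Assumption: about 10 raw signal samples per nucleotide (illustrative only).
SAMPLES_PER_BASE = 10
# From the abstract: classification uses only the first 200 nucleotides.
PREFIX_BASES = 200


def decide(signal: Sequence[float],
           classify: Callable[[Sequence[float]], float],
           threshold: float = 0.5) -> str:
    """Return 'wait', 'keep' or 'eject' for one partially sequenced read.

    `classify` is a hypothetical stand-in for a trained model that maps a
    raw signal prefix to the probability that the read is a target
    (for example, mitochondrial) molecule.
    """
    needed = PREFIX_BASES * SAMPLES_PER_BASE
    if len(signal) < needed:
        return "wait"  # not enough signal collected yet
    score = classify(signal[:needed])
    # Keep sequencing target reads; eject everything else from the pore.
    return "keep" if score >= threshold else "eject"


def fold_enrichment(target_sel: int, total_sel: int,
                    target_ctl: int, total_ctl: int) -> float:
    """Target fraction with selection divided by target fraction without."""
    return (target_sel / total_sel) / (target_ctl / total_ctl)
```

In a real run, the 'eject' decision would be forwarded to the sequencer through a Read Until client, while 'keep' lets the molecule continue through the pore. The read counts passed to `fold_enrichment` would come from selected and control channels of the same experiment.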