Department of Computer Science and Engineering, University of Michigan, Ann Arbor, 48109, MI, USA.
Department of Electrical and Computer Engineering, University of Michigan, Ann Arbor, 48109, MI, USA.
Genome Biol. 2021 Oct 27;22(1):298. doi: 10.1186/s13059-021-02511-y.
We present SquiggleNet, the first deep-learning model that can classify nanopore reads directly from their electrical signals. SquiggleNet operates faster than DNA passes through the pore, allowing real-time classification and read ejection. Using 1 s of sequencing data, the classifier achieves significantly higher accuracy than base calling followed by sequence alignment. Our approach is also faster and requires an order of magnitude less memory than alignment-based approaches. SquiggleNet distinguished human from bacterial DNA with over 90% accuracy, generalized to unseen bacterial species in a human respiratory metagenome sample, and accurately classified sequences containing human long interspersed repeat elements.
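To make the idea of classifying from 1 s of raw signal and then ejecting or keeping the read more concrete, here is a minimal sketch of such a decision loop. It is not the SquiggleNet implementation. The 4 kHz sampling rate, the median/MAD normalization, the 0.5 threshold, and the names `first_window`, `mad_normalize`, `decide`, and `score_fn` are all assumptions made for illustration. `score_fn` stands in for a trained classifier.

```python
from statistics import median
from typing import Callable, List, Sequence

SAMPLE_RATE_HZ = 4000   # assumption: the abstract does not state a sampling rate
WINDOW_SECONDS = 1.0    # from the abstract: 1 s of sequencing data


def first_window(signal: Sequence[float]) -> List[float]:
    """Return the first WINDOW_SECONDS of raw signal samples."""
    n = int(SAMPLE_RATE_HZ * WINDOW_SECONDS)
    return list(signal[:n])


def mad_normalize(window: Sequence[float]) -> List[float]:
    """Center on the median and scale by the median absolute deviation.

    This normalization is an illustrative choice, not taken from the abstract.
    """
    med = median(window)
    mad = median([abs(x - med) for x in window])
    scale = mad if mad > 0 else 1.0  # avoid division by zero on flat signals
    return [(x - med) / scale for x in window]


def decide(
    signal: Sequence[float],
    score_fn: Callable[[List[float]], float],  # hypothetical stand-in for a trained model
    threshold: float = 0.5,                    # assumed decision threshold
) -> str:
    """Return 'wait', 'keep', or 'eject' for one read."""
    window = first_window(signal)
    if len(window) < int(SAMPLE_RATE_HZ * WINDOW_SECONDS):
        return "wait"  # not enough signal collected yet
    score = score_fn(mad_normalize(window))
    return "keep" if score >= threshold else "eject"
```

In a real-time setting, a function like `decide` would have to return before the read finishes passing through the pore, which is why the abstract emphasizes classification speed and memory use.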