Institute for Computational Physics, Universität Stuttgart, Allmandring 3, 70569 Stuttgart, Germany.
J Chem Phys. 2021 Jan 28;154(4):044111. doi: 10.1063/5.0037938.
DNA molecules can be driven electrophoretically through a nanoscale opening in a material, giving rise to rich and measurable ionic current blockades. In this work, we train machine learning models on experimental ionic blockade data from DNA nucleotide translocation through 2D pores of different diameters. The aim of the resulting classification is to improve how efficiently nucleotide identity is read out, opening pathways toward error-free sequencing. We propose a novel method that simultaneously reduces the current traces to a few physical descriptors and trains low-complexity models, thereby reducing the dimensionality of the data. We describe each translocation event by four features, including the height of the ionic current blockade. By training deep neural networks and convolutional neural networks on these lower-dimensional data, we reach an average accuracy of up to 94%. Our model outperforms more complex baseline models trained on the full ionic current traces. Our findings show that using the ionic blockade height as a feature, together with a suitable combination of neural networks, feature extraction, and data representation, strongly enhances detection. Our work points to a possible step toward guiding experiments to the number of events needed to sequence an unknown biopolymer, with a view to improving the biosensitivity of novel nanopore sequencers.
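To make the feature-reduction step more concrete, here is a minimal Python sketch. It splits a sampled ionic current trace into blockade events and reduces each event to four descriptors. Only the blockade height comes from the abstract. The other three descriptors (dwell time, in-event current standard deviation, minimum current), the median-based open-pore baseline, and the fixed 20% drop threshold are hypothetical choices made for illustration. They are not the authors' feature set or pipeline. The sketch also assumes a positive open-pore current that dominates the trace.

```python
from dataclasses import dataclass
from statistics import mean, median, pstdev


@dataclass
class EventFeatures:
    blockade_height: float  # open-pore baseline minus mean blocked current (from the abstract)
    dwell_time: float       # event duration in seconds (hypothetical feature)
    blocked_std: float      # current fluctuation inside the event (hypothetical feature)
    min_current: float      # deepest current level reached in the event (hypothetical feature)

    def as_vector(self):
        """Return the four descriptors as a flat list, e.g. as classifier input."""
        return [self.blockade_height, self.dwell_time, self.blocked_std, self.min_current]


def extract_event_features(trace, dt, drop_frac=0.2):
    """Segment a current trace into blockade events and describe each by four features.

    trace: sampled ionic current, assumed positive and mostly at the open-pore level.
    dt: sampling interval in seconds.
    drop_frac: hypothetical threshold; samples below baseline * (1 - drop_frac) count as blocked.
    """
    baseline = median(trace)  # assumption: open-pore samples dominate, so the median approximates them
    cutoff = baseline * (1.0 - drop_frac)

    # Collect contiguous runs of below-cutoff samples as individual events.
    events, run = [], []
    for sample in trace:
        if sample < cutoff:
            run.append(sample)
        elif run:
            events.append(run)
            run = []
    if run:  # trace ended while still blocked
        events.append(run)

    features = [
        EventFeatures(
            blockade_height=baseline - mean(ev),
            dwell_time=len(ev) * dt,
            blocked_std=pstdev(ev),
            min_current=min(ev),
        )
        for ev in events
    ]
    return baseline, features


if __name__ == "__main__":
    # Synthetic trace: open pore at 10.0, then two blockades of different depth and length.
    trace = [10.0] * 20 + [6.0, 6.2, 5.8, 6.0] + [10.0] * 20 + [4.0, 4.0] + [10.0] * 10
    baseline, feats = extract_event_features(trace, dt=1e-5)
    for f in feats:
        print(f.as_vector())
```

Each event, however many samples it spans, becomes a fixed-length vector of four numbers. This is the kind of low-dimensional input the abstract describes feeding to the neural networks in place of the full trace.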