School of Information and Engineering, East China University of Science and Technology, Shanghai 200237, People's Republic of China.
School of Chemistry and Molecule Engineering, East China University of Science and Technology, Shanghai 200237, People's Republic of China.
Anal Chem. 2019 Aug 6;91(15):10033-10039. doi: 10.1021/acs.analchem.9b01896. Epub 2019 Jul 18.
The nanopore technique employs a nanoscale cavity to electrochemically confine individual molecules, achieving ultrasensitive single-molecule analysis by evaluating the amplitude and duration of the ionic current. However, each nanopore sensing interface has its own intrinsic sensing ability, which does not always efficiently generate distinctive blockade currents for multiple analytes. As a result, analytes that differ at only a single site often exhibit similar blockade currents or durations in nanopore experiments, producing serious overlap in the resulting statistical graphs. To improve the sensing ability of nanopores, herein we propose a novel shapelet-based machine learning approach to discriminate mixed analytes that exhibit nearly identical blockade current amplitudes and durations. DNA oligomers with a single-nucleotide difference, 5'-AAAA-3' and 5'-GAAA-3', are employed as model analytes that are difficult to identify in aerolysin nanopores at 100 mV. First, a set of the most informative and discriminative segments is learned from the time-series data set of blockade current signals using the learning time-series shapelets (LTS) algorithm. Then, the shapelet-transformed representation of the signals is obtained by calculating the minimum distance between the shapelets and the original signals. A simple logistic classifier is used to identify the two types of DNA oligomers in accordance with the corresponding shapelet-transformed representation. Finally, an evaluation is performed on the validation data set to show that our approach can achieve a high score of 0.933. In comparison with the conventional statistical methods for the analysis of duration and residual current, the shapelet-transformed representation provides clearly discriminated distributions for multiple analytes.
Taking advantage of the robust LTS algorithm, one could anticipate the real-time analysis of nanopore events for the direct identification and quantification of multiple biomolecules in a complex real sample (e.g., serum) without labels or time-consuming mutagenesis.
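The two steps described in the abstract, a shapelet transform based on minimum distances followed by a simple logistic classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the shapelets here are fixed by hand rather than learned with LTS, the distance is assumed to be the mean squared difference over a sliding window, the signals are synthetic toy traces rather than blockade currents, and all function names (`min_distance`, `shapelet_transform`, `fit_logistic`, `predict`) are hypothetical.

```python
import math


def min_distance(signal, shapelet):
    """Smallest mean squared difference between the shapelet and any
    equal-length window of the signal (assumed distance measure)."""
    m = len(shapelet)
    if m > len(signal):
        raise ValueError("shapelet must not be longer than the signal")
    best = math.inf
    for start in range(len(signal) - m + 1):
        d = sum((signal[start + j] - shapelet[j]) ** 2 for j in range(m)) / m
        best = min(best, d)
    return best


def shapelet_transform(signals, shapelets):
    """Map each signal to a vector with one minimum distance per shapelet."""
    return [[min_distance(s, sh) for sh in shapelets] for s in signals]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit a logistic classifier with plain batch gradient descent."""
    n, k = len(features), len(features[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for i in range(k):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(features, w, b):
    """Return class 1 where the linear score is positive, else class 0."""
    return [1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            for x in features]


# Toy data (synthetic): class 0 contains a short spike, class 1 a plateau.
signals = [
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
]
labels = [0, 0, 1, 1]

# Hand-picked shapelets; LTS would learn these from the data instead.
shapelets = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]

features = shapelet_transform(signals, shapelets)
w, b = fit_logistic(features, labels)
predicted = predict(features, w, b)
print(features)
print(predicted)
```

In this sketch each signal is reduced to one number per shapelet, so two signals with similar overall amplitude and duration can still separate if a local segment of one matches a shapelet more closely than the other. In LTS the shapelets and classifier weights are optimized jointly, which this fixed-shapelet version does not attempt.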