Department of Chemistry, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada.
Analyst. 2021 Jul 26;146(15):4737-4743. doi: 10.1039/d1an00557j.
Although there has been a surge in popularity of differential mobility spectrometry (DMS) within analytical workflows, determining separation conditions within the DMS parameter space still requires manual optimization. A means of accurately predicting differential ion mobility would benefit practitioners by significantly reducing the time associated with method development. Here, we report a machine learning (ML) approach that predicts dispersion curves in an N2 environment; a dispersion curve comprises the compensation voltages (CVs) required for optimal ion transmission across a range of separation voltages (SVs) from 1500 to 4000 V. After training a random forest-based model using the DMS information of 409 cationic analytes, dispersion curves were reproduced with a mean absolute error (MAE) of ≤ 2.4 V, approaching typical experimental peak FWHMs of ±1.5 V. The predictive ML model was trained using only m/z and ion-neutral collision cross section (CCS) as inputs, both of which can be obtained from experimental databases, and was then extensively validated. By updating the model via inclusion of two CV datapoints at lower SVs (1500 V and 2000 V), accuracy was further improved to MAE ≤ 1.2 V. This improvement stems from the ability of the "guided" ML routine to accurately capture Type A and B behaviour, which was exhibited by only 2% and 17% of ions, respectively, within the dataset. Dispersion curve predictions for the database's most common Type C ions (81%) using the unguided and guided approaches exhibited average errors of 0.6 V and 0.1 V, respectively.
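As a rough illustration of the difference between the unguided and guided inputs described above, and of how a per-curve MAE could be computed, the following Python sketch may help. It is not the authors' code: the function names, the 500 V spacing of the SV grid, and all numerical values are hypothetical placeholders chosen for illustration. Only the 1500 to 4000 V range, the two guide SVs (1500 V and 2000 V), and the use of m/z and CCS as inputs come from the abstract.

```python
from statistics import fmean

# Hypothetical SV grid (V). The abstract gives only the 1500-4000 V range,
# so the 500 V spacing is an assumption made for this sketch.
SV_GRID = [1500, 2000, 2500, 3000, 3500, 4000]
GUIDE_SVS = (1500, 2000)  # the two low-SV points named in the abstract


def unguided_features(mz, ccs):
    """Inputs for the unguided model: m/z and CCS only."""
    return [mz, ccs]


def guided_features(mz, ccs, measured_cv):
    """Inputs for the guided model: m/z, CCS, plus CVs measured at the guide SVs."""
    return [mz, ccs] + [measured_cv[sv] for sv in GUIDE_SVS]


def curve_mae(predicted_cv, measured_cv):
    """Mean absolute error (V) between two dispersion curves over SV_GRID."""
    return fmean(abs(predicted_cv[sv] - measured_cv[sv]) for sv in SV_GRID)


# Made-up placeholder values for a single ion, NOT data from the study.
measured = {1500: -0.5, 2000: -1.0, 2500: -1.2, 3000: -0.8, 3500: 0.4, 4000: 2.0}
predicted = {1500: -0.3, 2000: -0.8, 2500: -1.5, 3000: -1.0, 3500: 0.9, 4000: 2.6}

print(unguided_features(300.2, 170.5))
print(guided_features(300.2, 170.5, measured))
print(round(curve_mae(predicted, measured), 2))
```

In a full workflow, feature rows of this kind would be passed to a random forest regressor with the CVs at each SV as targets. That step is omitted here because the abstract does not specify the model configuration.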