Fine Jonathan, Kuan-Yu Liu Judy, Beck Armen, Alzarieni Kawthar Z, Ma Xin, Boulos Victoria M, Kenttämaa Hilkka I, Chopra Gaurav
Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, IN, USA.
Purdue Institute for Drug Discovery, Integrative Data Science Institute, Purdue Center for Cancer Research, Purdue Institute for Inflammation, Immunology and Infectious Disease, Purdue Institute for Integrative Neuroscience, West Lafayette, IN, USA.
Chem Sci. 2020 Oct 5;11(43):11849-11858. doi: 10.1039/d0sc02530e.
Unlike the widely used collision-activated dissociation methodology, diagnostic ion-molecule reactions in tandem mass spectrometry experiments can frequently differentiate isomeric compounds. Selected neutral reagents, such as 2-methoxypropene (MOP), are introduced into an ion trap mass spectrometer, where they react with protonated analytes to yield product ions that are diagnostic for the functional groups present in the analytes. However, interpreting the resulting mass spectra can be challenging and time-consuming. Here, we introduce the first bootstrapped decision tree model trained on 36 known ion-molecule reactions with MOP. The model takes the graph-based connectivity of an analyte's functional groups as input and predicts whether the protonated analyte will undergo a diagnostic reaction with MOP. It achieved a Cohen kappa statistic of 0.70 on a blind test set, suggesting substantial inter-model reliability despite the limited training data. Prospective diagnostic product predictions were tested experimentally for 13 previously unpublished analytes. We also introduce chemical reactivity flowcharts, which facilitate chemical interpretation of the decisions made by the machine learning method and should help in understanding and interpreting mass spectra in terms of chemical reactivity.
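For readers unfamiliar with the Cohen kappa statistic reported above, the following is a minimal sketch of how it is computed: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two sets of labels and p_e is the agreement expected by chance from their label frequencies. On the commonly used Landis and Koch scale, values from 0.61 to 0.80 are described as "substantial". The function name and the ten example labels are hypothetical illustrations, not data or code from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each class, the product of the two marginal
    # frequencies, summed over all classes seen in either sequence.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    classes = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in classes) / (n * n)

    if p_e == 1.0:
        # Both sequences contain one and the same class only;
        # kappa is undefined in that case.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels for ten analytes (1 = diagnostic product observed or
# predicted, 0 = not). These values are made up for illustration only.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

kappa = cohen_kappa(observed, predicted)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the two label sets agree on 8 of 10 analytes (p_o = 0.8) and chance agreement is p_e = 0.5, giving a kappa of about 0.6. The statistic therefore discounts agreement that balanced classes would produce by chance alone.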