Ngan Hiu-Lok, Turkina Viktoriia, van Herwerden Denice, Yan Hong, Cai Zongwei, Samanipour Saer
State Key Laboratory of Environmental and Biological Analysis, Department of Chemistry, Hong Kong Baptist University, Kowloon, Hong Kong 999077 P. R. China.
Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1098 XH, The Netherlands.
Anal Chem. 2025 Aug 26;97(33):18028-18035. doi: 10.1021/acs.analchem.5c01873. Epub 2025 Aug 12.
In HRMS-based nontargeted analysis (NTA), spectral matching is crucial for chemical identification, particularly in the absence of retention information. This study introduces the class probability of true positives (P(TP)) as an approach that combines data from MS/MS spectra with calibrant-free predicted retention time indices (RTIs) through three machine learning (ML) models to enhance identification probability (IP). The first model is a molecular fingerprint (MF)-to-RTI model trained on 4713 calibrants. The second model, a cumulative neutral loss (CNL)-to-RTI model, was trained on 485,577 experimental spectra. The final model, a binary classification model, was trained using 1,686,319 true positive (TP) and semisynthetic true negative (TN) spectral matches. High correlations between MF-derived and CNL-derived RTI values (0.96 for training; 0.88 for testing) suggest reduced RTI errors in spectral matches. Incorporating reference spectral library searches and RTI errors, the k-nearest neighbors algorithm achieved a weighted F1 score of 0.65 and a Matthews correlation coefficient of 0.30 for pesticides at concentrations of 1 to 1000 ppb in blank samples, with a recall of 0.60 in black tea matrices. Compared with library matching alone, the average IPs for pesticides increased by 54.5, 52.1, and 46.7% when spiked in blank, 10× diluted, and 100× diluted tea matrices, respectively. This work demonstrates the effectiveness of ML in enhancing the chemical IPs of annotated compounds within complex matrices.
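The idea of turning a library match score and an RTI error into a true-positive class probability with k-nearest neighbors can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-feature layout, the RTI span of 1000 used for scaling, the choice of k, and the toy training matches are all assumptions made for illustration only.

```python
import math

def match_features(match_score, rti_mf, rti_cnl, rti_span=1000.0):
    """Build a feature pair: (spectral match score, scaled RTI error).

    rti_mf and rti_cnl stand for the MF-derived and CNL-derived RTI values.
    rti_span is an assumed scaling constant, not a value from the abstract.
    """
    return (match_score, abs(rti_mf - rti_cnl) / rti_span)

def knn_true_positive_probability(train, query, k=3):
    """Return the fraction of the k nearest training matches labelled TP.

    train: list of (features, label) pairs, label 1 = TP, 0 = TN.
    query: feature pair produced by match_features.
    """
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest = ranked[:k]
    return sum(label for _, label in nearest) / k

# Hypothetical labelled spectral matches (made-up numbers, for illustration).
train = [
    ((0.95, 0.010), 1),
    ((0.90, 0.025), 1),
    ((0.85, 0.040), 1),
    ((0.60, 0.300), 0),
    ((0.70, 0.250), 0),
    ((0.55, 0.400), 0),
]

# A candidate with a high match score and a small RTI disagreement.
candidate = match_features(0.92, rti_mf=500.0, rti_cnl=520.0)
p_tp = knn_true_positive_probability(train, candidate, k=3)
print(f"Estimated class probability of a true positive: {p_tp:.2f}")
```

In this sketch a candidate whose neighbors are mostly true positive matches receives a high probability, while one with a large RTI disagreement falls among true negatives and receives a low one. A practical version would need properly scaled features and a far larger training set.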