Machine Learning for Enhanced Identification Probability in RPLC/HRMS Nontargeted Workflows.

Authors

Ngan Hiu-Lok, Turkina Viktoriia, van Herwerden Denice, Yan Hong, Cai Zongwei, Samanipour Saer

Affiliations

State Key Laboratory of Environmental and Biological Analysis, Department of Chemistry, Hong Kong Baptist University, Kowloon, Hong Kong 999077 P. R. China.

Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1098 XH, The Netherlands.

Publication information

Anal Chem. 2025 Aug 26;97(33):18028-18035. doi: 10.1021/acs.analchem.5c01873. Epub 2025 Aug 12.

DOI:10.1021/acs.analchem.5c01873

PMID:40791078

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12392258/

Abstract

In HRMS-based nontargeted analysis (NTA), spectral matching is crucial for chemical identification, particularly in the absence of retention information. This study introduces class probability of true positives (P(TP)) as an innovative approach, leveraging data from MS/MS spectra and calibrant-free predicted retention time indices (RTIs) through 3 machine learning (ML) models to enhance identification probability (IP). The first model is a molecular fingerprint (MF)-to-RTI model trained on 4713 calibrants. The second model, a cumulative neutral loss (CNL)-to-RTI model, utilized 485,577 experimental spectra. The final model, a binary classification model, was trained using 1,686,319 TP and semisynthetic true negative (TN) spectral matches. High correlations between MF-derived and CNL-derived RTI values (R² = 0.96 for training; 0.88 for testing) suggest reduced RTI errors in spectral matches. Incorporating reference spectral library searches and RTI errors, the k-nearest neighbors algorithm achieved a weighted F1 score of 0.65 and a Matthews correlation coefficient of 0.30 for pesticides at concentrations of 1 to 1000 ppb in blank samples, with a recall of 0.60 in black tea matrices. Compared to solely library matching, the average IPs for pesticides increased by 54.5, 52.1, and 46.7% when spiked in blank, 10× diluted, and 100× diluted tea matrices, respectively. This work demonstrates the effectiveness of ML in enhancing the chemical IPs of annotated compounds within complex matrices.
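The classification step and the two reported metrics can be illustrated with a small sketch. This is not the authors' implementation: it assumes each candidate spectral match is described by two hypothetical features (a library match score and a normalized RTI error, both scaled to 0-1), uses invented toy values, and takes the fraction of true-positive neighbors as a stand-in for the class probability. The weighted F1 score and the Matthews correlation coefficient follow their standard definitions.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: (library match score, normalized RTI error).
# Label 1 = true positive match, label 0 = true negative match.
train_X = [(0.90, 0.05), (0.80, 0.10), (0.85, 0.02),
           (0.40, 0.60), (0.50, 0.50), (0.30, 0.70)]
train_y = [1, 1, 1, 0, 0, 0]


def knn_predict_proba(X, y, x, k=3):
    """Fraction of the k nearest training points that are labelled 1."""
    dists = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(row, x))), label)
        for row, label in zip(X, y)
    )
    nearest = [label for _, label in dists[:k]]
    return sum(nearest) / k


def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    n = len(y_true)
    return sum(f1_for_class(y_true, y_pred, c) * cnt / n
               for c, cnt in support.items())


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Score two hypothetical candidate matches.
p_good = knn_predict_proba(train_X, train_y, (0.88, 0.04))
p_poor = knn_predict_proba(train_X, train_y, (0.35, 0.65))

# Evaluate an illustrative set of predictions against known labels.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(p_good, p_poor, weighted_f1(y_true, y_pred), mcc(y_true, y_pred))
```

The Matthews correlation coefficient uses all four cells of the confusion matrix, so it is a stricter summary than F1 when the two classes are imbalanced. That may explain why the abstract reports both metrics.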

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2031/12392258/f4da261c937e/ac5c01873_0001.jpg
