Predicting RP-LC retention indices of structurally unknown chemicals from mass spectrometry data.

Authors

Boelrijk Jim, van Herwerden Denice, Ensing Bernd, Forré Patrick, Samanipour Saer

Affiliations

AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands.

Institute for Informatics, University of Amsterdam, Amsterdam, The Netherlands.

Publication information

J Cheminform. 2023 Feb 24;15(1):28. doi: 10.1186/s13321-023-00699-8.

DOI:10.1186/s13321-023-00699-8

PMID:36829215

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9960388/

Abstract

Non-target analysis combined with liquid chromatography high-resolution mass spectrometry is considered one of the most comprehensive strategies for detecting and identifying known and unknown chemicals in complex samples. However, many compounds remain unidentified because of data complexity and the limited number of structures in chemical databases. In this work, we developed and validated a novel machine learning algorithm that predicts retention index (r_i) values for structurally (un)known chemicals from their measured fragmentation patterns. For the first time, the model enabled the prediction of r_i values without requiring the exact structure of the chemicals, achieving an R² of 0.91 and 0.77 and a root mean squared error (RMSE) of 47 and 67 r_i units for the NORMAN ([Formula: see text]) and amide ([Formula: see text]) test sets, respectively. This fragment-based model predicted r_i with accuracy comparable to conventional descriptor-based models that rely on a known chemical structure, which obtained an R² of 0.85 with an RMSE of 67.
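The two accuracy figures quoted in the abstract, the coefficient of determination and the RMSE, can be illustrated with a minimal sketch. The abstract does not say which definition of the coefficient of determination the authors used, so the common 1 - SS_res/SS_tot form is assumed here. The function names and the four retention index values are hypothetical and are not taken from the paper.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the inputs."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical retention index values, for illustration only.
observed_ri = [200.0, 400.0, 600.0, 800.0]
predicted_ri = [220.0, 380.0, 620.0, 780.0]

example_rmse = rmse(observed_ri, predicted_ri)
example_r2 = r_squared(observed_ri, predicted_ri)
print(f"RMSE = {example_rmse:.1f} retention index units, R^2 = {example_r2:.3f}")
```

Because RMSE carries the units of the predicted quantity, the reported values of 47 and 67 are expressed in retention index units, whereas the coefficient of determination is dimensionless.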

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76f4/9960388/c853de8e870f/13321_2023_699_Fig1_HTML.jpg
