A comparison of three liquid chromatography (LC) retention time prediction models.

Authors

McEachran Andrew D, Mansouri Kamel, Newton Seth R, Beverly Brandiese E J, Sobus Jon R, Williams Antony J

Affiliations

Oak Ridge Institute for Science and Education (ORISE) Research Participation Program, US Environmental Protection Agency, 109 T.W. Alexander Drive, Research Triangle Park, NC 27711, USA; National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency, 109 T.W. Alexander Drive, Research Triangle Park, NC 27711, USA.

National Exposure Research Laboratory, Office of Research and Development, US Environmental Protection Agency, 109 T.W. Alexander Drive, Research Triangle Park, NC 27711, USA.

Publication information

Talanta. 2018 May 15;182:371-379. doi: 10.1016/j.talanta.2018.01.022. Epub 2018 Jan 11.

DOI:10.1016/j.talanta.2018.01.022

PMID:29501166

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6066181/

Abstract

High-resolution mass spectrometry (HRMS) data has revolutionized the identification of environmental contaminants through non-targeted analysis (NTA). However, chemical identification remains challenging due to the vast number of unknown molecular features typically observed in environmental samples. Advanced data processing techniques are required to improve chemical identification workflows. The ideal workflow brings together a variety of data and tools to increase the certainty of identification. One such tool is chromatographic retention time (RT) prediction, which can be used to reduce the number of possible suspect chemicals within an observed RT window. This paper compares the relative predictive ability and applicability to NTA workflows of three RT prediction models: (1) a logP (octanol-water partition coefficient)-based model using EPI Suite™ logP predictions; (2) a commercially available ACD/ChromGenius model; and (3) a newly developed Quantitative Structure Retention Relationship model called OPERA-RT. Models were developed using the same training set of 78 compounds with experimental RT data and evaluated for external predictivity on an identical test set of 19 compounds. Both the ACD/ChromGenius and OPERA-RT models outperformed the EPI Suite™ logP-based RT model (R² = 0.81-0.92, 0.86-0.83, and 0.66-0.69 for training-test sets, respectively). Further, both OPERA-RT and ACD/ChromGenius predicted 95% of RTs within a ±15% chromatographic time window of experimental RTs. Based on these results, we simulated an NTA workflow with a ten-fold larger list of candidate structures generated for formulae of the known test set chemicals using the U.S. EPA's CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard). RTs for all candidates were predicted using both ACD/ChromGenius and OPERA-RT, and RT screening windows were assessed for their ability to filter out unlikely candidate chemicals and enhance potential identification. Compared to ACD/ChromGenius, OPERA-RT screened out a greater percentage of candidate structures within a 3-min RT window (60% vs. 40%) but retained fewer of the known chemicals (42% vs. 83%). By several metrics, the OPERA-RT model, generated as a proof-of-concept using a limited set of open source data, performed as well as the commercial tool ACD/ChromGenius when constrained to the same small training and test sets. As the availability of RT data increases, we expect the OPERA-RT model's predictive ability will increase.
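The RT screening idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a simple least-squares line relating logP to RT (in the spirit of the logP-based model) and treats the screening window as a symmetric tolerance around the observed RT. All function names, the tolerance interpretation, and the numeric values are hypothetical and chosen only for illustration.

```python
def fit_linear_rt_model(logp_values, rt_values):
    """Fit RT = slope * logP + intercept by ordinary least squares.

    Assumption: a single straight line is an adequate stand-in for a
    logP-based RT model; the paper does not specify this form here.
    """
    n = len(logp_values)
    mean_x = sum(logp_values) / n
    mean_y = sum(rt_values) / n
    sxx = sum((x - mean_x) ** 2 for x in logp_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(logp_values, rt_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_rt(logp, slope, intercept):
    """Predicted retention time (minutes) for one logP value."""
    return slope * logp + intercept


def screen_candidates(candidate_logp, slope, intercept,
                      observed_rt, tolerance_min):
    """Return names of candidates whose predicted RT lies within
    +/- tolerance_min of the observed RT.

    Assumption: the screening window is symmetric around observed_rt.
    """
    kept = []
    for name, logp in candidate_logp.items():
        if abs(predict_rt(logp, slope, intercept) - observed_rt) <= tolerance_min:
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical training data (logP, RT in minutes), not from the paper.
    train_logp = [0.0, 1.0, 2.0, 3.0]
    train_rt = [2.0, 4.0, 6.0, 8.0]
    slope, intercept = fit_linear_rt_model(train_logp, train_rt)

    # Hypothetical candidate structures sharing one molecular formula.
    candidates = {"candidate_A": 1.0, "candidate_B": 4.0}
    kept = screen_candidates(candidates, slope, intercept,
                             observed_rt=5.0, tolerance_min=3.0)
    print("Candidates retained by the RT window:", kept)
```

A narrower tolerance screens out more candidates but raises the risk of discarding the true structure, which mirrors the trade-off between candidates filtered and known chemicals retained that the abstract reports.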
