Renai Lapo, Turkina Viktoriia, Hulleman Tobias, Nikolopoulos Alexandros, Gargano Andrea F G, Amato Elvio D, Del Bubba Massimo, Samanipour Saer
Van't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, 1090 GD Amsterdam, The Netherlands.
Queensland Alliance for Environmental Health Sciences (QAEHS), 20 Cornwall Street, Woolloongabba, QLD 4102, Australia.
Environ Sci Technol Lett. 2025 Aug 18;12(9):1162-1168. doi: 10.1021/acs.estlett.5c00759. eCollection 2025 Sep 9.
The virtual chemical space of substances, including emerging contaminants relevant to the environment and the exposome, is rapidly expanding. Non-targeted analysis (NTA) by liquid chromatography-high-resolution mass spectrometry (LC-HRMS) is useful for measuring broad regions of chemical space. Internal standards are typically used to optimize the selectivity and sensitivity of NTA LC-HRMS methods, under the assumption of a linear relationship between structure and behavior across all analytes. However, this assumption fails for large, heterogeneous chemical spaces, narrowing measurable coverage to compounds structurally similar to the internal standards. We present a data-driven strategy for the unbiased sampling of candidate structures from extensive chemical spaces, such as the U.S. EPA's CompTox (>1 million chemicals), for NTA LC-HRMS method development. The workflow maximizes physicochemical/structural diversity using precomputed PubChem descriptors (e.g., molecular weight, XLogP) and ensures LC-HRMS compatibility through mobility and ionization efficiency predicted from molecular fingerprints. The resulting measurable compound lists (MCLs) provide broad, heterogeneous coverage for NTA method development, validation, and boundary assessment. Applied to the CompTox space, the approach yielded MCLs with greater chemical coverage and broader predicted LC-HRMS applicability than conventional "watch list" contaminants, offering a robust framework for expanding NTA's measurable chemical space while preserving diversity.
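The abstract combines two ingredients: a compatibility screen based on predicted mobility and ionization efficiency, and diversity maximization over descriptors such as molecular weight and XLogP. The Python sketch below shows one way these could fit together. It assumes a greedy MaxMin (farthest-point) selection on z-scored descriptors, which is a generic diversity-sampling technique and not necessarily the authors' algorithm. The record fields (`pred_log_ie`, `pred_lc_mobile`), the threshold `min_log_ie`, the helper names, and the toy values are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical candidate record: field names are illustrative, not from the paper.
Candidate = Dict[str, object]


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each descriptor column so MW (hundreds of Da) does not swamp XLogP."""
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        for i, x in enumerate(col):
            out[i][j] = (x - mean) / sd
    return out


def maxmin_select(points: List[List[float]], k: int, start: int = 0) -> List[int]:
    """Greedy MaxMin selection: repeatedly add the point farthest from the chosen set."""
    k = min(k, len(points))
    if k == 0:
        return []
    chosen, chosen_set = [start], {start}
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max((i for i in range(len(points)) if i not in chosen_set),
                  key=lambda i: nearest[i])
        chosen.append(nxt)
        chosen_set.add(nxt)
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return chosen


def build_mcl(candidates: Sequence[Candidate], k: int, min_log_ie: float,
              descriptors: Sequence[str] = ("mw", "xlogp")) -> List[str]:
    """Hypothetical MCL builder: compatibility screen, then diversity sampling."""
    # Step 1: keep candidates predicted to be LC-HRMS compatible (assumed fields).
    ok = [c for c in candidates
          if c["pred_lc_mobile"] and c["pred_log_ie"] >= min_log_ie]
    if not ok:
        return []
    # Step 2: pick k structurally/physicochemically diverse compatible candidates.
    pts = standardize([[float(c[d]) for d in descriptors] for c in ok])
    return [str(ok[i]["name"]) for i in maxmin_select(pts, k)]


# Toy pool (invented values): E fails the ionization screen, F the mobility screen.
pool = [
    {"name": "A", "mw": 150.0, "xlogp": 1.0, "pred_log_ie": 2.0, "pred_lc_mobile": True},
    {"name": "B", "mw": 152.0, "xlogp": 1.1, "pred_log_ie": 2.1, "pred_lc_mobile": True},
    {"name": "C", "mw": 450.0, "xlogp": 5.0, "pred_log_ie": 1.5, "pred_lc_mobile": True},
    {"name": "D", "mw": 300.0, "xlogp": -2.0, "pred_log_ie": 3.0, "pred_lc_mobile": True},
    {"name": "E", "mw": 600.0, "xlogp": 6.0, "pred_log_ie": -1.0, "pred_lc_mobile": True},
    {"name": "F", "mw": 250.0, "xlogp": 3.0, "pred_log_ie": 2.5, "pred_lc_mobile": False},
]

print(build_mcl(pool, k=3, min_log_ie=1.0))  # ['A', 'C', 'D']
```

In this toy run, B is a near-duplicate of A and is skipped in favour of C and D, which spread the selection across the descriptor space. Starting from index 0 is an arbitrary choice. A common alternative is to start from the point farthest from the centroid, and a real pipeline over CompTox-scale data would use vectorized or indexed distance computations rather than this O(nk) pure-Python loop.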