Turkina Viktoriia, Gringhuis Jelle T, Boot Sanne, Petrignani Annemieke, Corthals Garry, Praetorius Antonia, O'Brien Jake W, Samanipour Saer
Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, Netherlands.
Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, 1090 GE, Amsterdam, Netherlands.
Environ Sci Technol. 2025 Apr 29;59(16):8004-8015. doi: 10.1021/acs.est.4c13026. Epub 2025 Apr 20.
Complex environmental samples contain a diverse array of known and unknown constituents. While liquid chromatography coupled with high-resolution mass spectrometry (LC-HRMS) nontargeted analysis (NTA) has emerged as an essential tool for the comprehensive study of such samples, the identification of individual constituents remains a significant challenge, primarily due to the vast number of features detected in each sample. To address this, prioritization strategies are frequently employed to narrow the focus to the most relevant features for further analysis. In this study, we developed a novel prioritization strategy that directly links fragmentation and chromatographic data to aquatic toxicity categories, bypassing the need to identify individual compounds. Because features are not always well characterized through fragmentation, we created two models: (1) a Random Forest Classification (RFC) model, which classifies fish toxicity categories based on MS1, retention, and fragmentation data (expressed as cumulative neutral losses, CNLs) when fragmentation information is available, and (2) a Kernel Density Estimation (KDE) model that relies solely on retention time and MS1 data when fragmentation is absent. Both models demonstrated accuracy comparable to that of structure-based prediction methods. We further tested the models on a pesticide mixture in a tea extract measured by LC-HRMS, where the CNL-based RFC model achieved an accuracy of 0.76 and the KDE model reached 0.61, demonstrating their applicability to real-world samples.
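The fallback path described above, classifying a feature from retention and MS1 data alone, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the helper names (`cumulative_neutral_losses`, `KDEClassifier`), the reading of a CNL as the mass difference between the precursor and each fragment, the fixed Gaussian bandwidths, the category labels, and the toy numbers, which are invented and not taken from the study.

```python
import math
from collections import defaultdict


def cumulative_neutral_losses(precursor_mz, fragment_mzs, decimals=2):
    """Assumed CNL definition: precursor m/z minus each lighter fragment m/z."""
    losses = {round(precursor_mz - f, decimals)
              for f in fragment_mzs if f < precursor_mz}
    return sorted(losses)


class KDEClassifier:
    """Hypothetical class-conditional Gaussian KDE over (retention, m/z) pairs."""

    def __init__(self, bandwidths):
        # One kernel bandwidth per feature dimension (assumed, not tuned).
        self.bandwidths = bandwidths
        self.points = defaultdict(list)
        self.n_total = 0

    def fit(self, X, y):
        # Store the training points separately for each toxicity category.
        for x, label in zip(X, y):
            self.points[label].append(x)
        self.n_total = len(X)
        return self

    def _density(self, x, pts):
        # Average of product Gaussian kernels centred on each training point.
        total = 0.0
        for p in pts:
            k = 1.0
            for xi, pi, h in zip(x, p, self.bandwidths):
                z = (xi - pi) / h
                k *= math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))
            total += k
        return total / len(pts)

    def predict(self, x):
        # Pick the category maximising prior * estimated density.
        scores = {label: (len(pts) / self.n_total) * self._density(x, pts)
                  for label, pts in self.points.items()}
        return max(scores, key=scores.get)


# Toy, invented training features: (retention value, precursor m/z).
X_train = [(2.0, 150.0), (2.5, 160.0), (8.0, 400.0), (8.5, 420.0)]
y_train = ["low", "low", "high", "high"]
clf = KDEClassifier(bandwidths=(1.0, 30.0)).fit(X_train, y_train)

# A feature without usable fragments goes to the KDE path.
category = clf.predict((2.2, 155.0))
# A feature with fragments would first be turned into CNLs for the RFC path.
cnls = cumulative_neutral_losses(300.0, [282.0, 254.0, 300.0])
```

In a real workflow the CNLs would be encoded as a fixed-length feature vector and passed, together with MS1 and retention data, to a random forest classifier; that step is omitted here because the abstract does not specify the encoding.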