Ferland Troy M, Whitehead Heather D, Buckley Timothy J, Chao Alex, Minucci Jeffrey M, Carr E Tyler, Janesch Greg, Rizwan Safia, Charest Nathaniel, Williams Antony J, McCord James P, Sobus Jon R
United States Environmental Protection Agency, Office of Research and Development, Center for Computational Toxicology and Exposure, 109 TW Alexander Dr., Research Triangle Park, NC, 27711, USA.
Oak Ridge Institute for Science and Education (ORISE) Participant, Oak Ridge, TN, 37831, USA.
Anal Bioanal Chem. 2025 Jun 14. doi: 10.1007/s00216-025-05940-x.
Non-targeted analysis (NTA) methods are integral to environmental monitoring given their ability to expand measurable chemical space beyond that of traditional targeted methods. Such vast quantities of NTA data are generated that exhaustive manual review is generally unfeasible. Computational tools facilitate automated data processing, but cannot always distinguish real signals (i.e., originating from a chemical in a sample) from artifacts. Replicate analysis is recommended to aid data review, but as NTA studies become larger, the cost of analytical replication becomes untenable. A need therefore exists for examination of information penalties associated with reduced replication. To investigate this issue, using an existing NTA dataset, we performed over 70,000 simulations of variable replication designs and calculated false discovery rates (FDRs) and false negative rates (FNRs) for NTA features and occurrences. We used regression models to explore associations between replication percentage and FDR/FNR, and to test whether rates were affected by NTA feature attributes. Inverse relationships were generally observed between replication percentage and FDR/FNR, such that lower replication yielded higher information penalties. Significant increases in FDR/FNR were observed for suspected per- and polyfluoroalkyl substances (PFAS) compared to non-PFAS, highlighting the potential for differences in information penalties across feature groups. Specific quantitative information penalties are expected to be unique for each NTA study based on sample type and workflow. The methods presented here can support future pilot-scale investigations that will inform the required level of replication in full-scale studies.
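The simulation idea described above can be illustrated with a minimal sketch. It is not the authors' workflow, and every definition in it is an assumption made for illustration: the reference call for a feature occurrence is "detected in at least `min_hits` replicate injections"; samples left unreplicated in a reduced design keep one randomly chosen injection; and FDR and FNR are computed as FP/(FP+TP) and FN/(FN+TP) over occurrences only. The names `simulate_design`, `mean_rates`, and the toy data are hypothetical.

```python
import random


def simulate_design(detections, replication_fraction, min_hits=2, rng=None):
    """Return (FDR, FNR) for one simulated reduced-replication design.

    detections maps (sample, feature) -> list of 0/1 flags, one per replicate
    injection. All decision rules here are illustrative assumptions.
    """
    rng = rng or random.Random()
    samples = sorted({sample for sample, _ in detections})
    n_replicated = round(replication_fraction * len(samples))
    replicated = set(rng.sample(samples, n_replicated))

    tp = fp = fn = 0
    for (sample, _feature), hits in detections.items():
        # Assumed reference call: occurrence is real if seen in >= min_hits injections.
        truth = sum(hits) >= min_hits
        if sample in replicated:
            call = truth  # a fully replicated sample reproduces the reference call
        else:
            call = bool(rng.choice(hits))  # only one injection is kept
        if call and truth:
            tp += 1
        elif call and not truth:
            fp += 1
        elif truth and not call:
            fn += 1

    fdr = fp / (fp + tp) if (fp + tp) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fdr, fnr


def mean_rates(detections, replication_fraction, n_sim=500, seed=0):
    """Average FDR and FNR over n_sim random designs at one replication level."""
    rng = random.Random(seed)
    runs = [simulate_design(detections, replication_fraction, rng=rng)
            for _ in range(n_sim)]
    mean_fdr = sum(r[0] for r in runs) / n_sim
    mean_fnr = sum(r[1] for r in runs) / n_sim
    return mean_fdr, mean_fnr


# Hypothetical toy data: 10 samples, 3 features, triplicate injections.
toy = {}
for i in range(10):
    toy[(f"s{i}", "real")] = [1, 1, 1]      # detected in every injection
    toy[(f"s{i}", "weak")] = [1, 1, 0]      # real by the 2-of-3 rule, sometimes missed
    toy[(f"s{i}", "artifact")] = [1, 0, 0]  # sporadic signal, not real by the rule

for frac in (1.0, 0.5, 0.0):
    fdr, fnr = mean_rates(toy, frac)
    print(f"replication={frac:.0%}  mean FDR={fdr:.3f}  mean FNR={fnr:.3f}")
```

Under these assumptions, full replication returns zero for both rates by construction, and the rates rise as fewer samples are replicated. A real pilot study would replace the toy dictionary with observed per-injection detections and could fit a regression of FDR/FNR on replication percentage, optionally with a feature attribute such as suspected PFAS status as a covariate.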