Improving predictions of compound amenability for liquid chromatography-mass spectrometry to enhance non-targeted analysis.

Authors

Charest Nathaniel, Lowe Charles N, Ramsland Christian, Meyer Brian, Samano Vicente, Williams Antony J

Affiliations

Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA.

Oak Ridge Associated Universities, Oak Ridge, Tennessee, 37830, USA.

Publication information

Anal Bioanal Chem. 2024 Apr;416(10):2565-2579. doi: 10.1007/s00216-024-05229-5. Epub 2024 Mar 26.

Abstract

Mass-spectrometry-based non-targeted analysis (NTA), in which mass spectrometric signals are assigned chemical identities based on a systematic collation of evidence, is a growing area of interest for toxicological risk assessment. Successful NTA results in better identification of potentially hazardous pollutants within the environment, facilitating the development of targeted analytical strategies to best characterize risks to human and ecological health. A supporting component of the NTA process involves assessing whether suspected chemicals are amenable to the mass spectrometric method, which is necessary in order to assign an observed signal to the chemical structure. Prior work from this group involved the development of a random forest model for predicting the amenability of 5517 unique chemical structures to liquid chromatography-mass spectrometry (LC-MS). This work improves the interpretability of the group's prior model of the same endpoint, as well as integrating 1348 more data points across negative and positive ionization modes. We enhance interpretability by feature engineering, a machine learning practice that reduces the input dimensionality while attempting to preserve performance statistics. We emphasize the importance of interpretable machine learning models within the context of building confidence in NTA identification. The novel data were curated by the labeling of compounds as amenable or unamenable by expert curators, resulting in an enhanced set of chemical compounds to expand the applicability domain of the prior model. The balanced accuracy benchmark of the newly developed model is comparable to performance previously reported (mean CV BA is 0.84 vs. 0.82 in positive mode, and 0.85 vs. 0.82 in negative mode), while on a novel external set, derived from this work's data, the Matthews correlation coefficients (MCC) for the novel models are 0.66 and 0.68 for positive and negative mode, respectively. Our group's prior published models scored MCC of 0.55 and 0.54 on the same external sets. This demonstrates appreciable improvement over the chemical space captured by the expanded dataset. This work forms part of our ongoing efforts to develop models with higher interpretability and higher performance to support NTA efforts.
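
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how balanced accuracy (BA) and the Matthews correlation coefficient (MCC) are computed from binary confusion-matrix counts. The function names and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper and do not reproduce its results.

```python
import math


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity (recall on amenable) and specificity (recall on unamenable)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC for a binary classifier; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (NOT from the paper): 100 amenable and 100 unamenable compounds.
tp, fp, tn, fn = 80, 10, 90, 20

ba = balanced_accuracy(tp, fp, tn, fn)
mcc = matthews_corrcoef(tp, fp, tn, fn)
print(f"BA = {ba:.2f}, MCC = {mcc:.2f}")
```

BA ranges from 0 to 1 while MCC ranges from -1 to 1, so the two figures reported in the abstract are not directly comparable with each other.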
