MLinvitroTox reloaded for high-throughput hazard-based prioritization of high-resolution mass spectrometry data.

Author information

Arturi Katarzyna, Harris Eliza J, Gasser Lilian, Escher Beate I, Braun Georg, Bosshard Robin, Hollender Juliane

Affiliations

Department of Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Überlandstrasse 133, 8600, Dübendorf, Switzerland.

Swiss Data Science Center (SDSC), Andreasstrasse 5, 8092, Zürich, Switzerland.

Publication information

J Cheminform. 2025 Jan 31;17(1):14. doi: 10.1186/s13321-025-00950-4.

DOI:10.1186/s13321-025-00950-4

PMID:39891244

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11786476/

Abstract

MLinvitroTox is an automated Python pipeline developed for high-throughput hazard-driven prioritization of toxicologically relevant signals detected in complex environmental samples through high-resolution tandem mass spectrometry (HRMS/MS). MLinvitroTox is a machine learning (ML) framework comprising 490 independent XGBoost classifiers trained on molecular fingerprints from chemical structures and target-specific endpoints from the ToxCast/Tox21 invitroDBv4.1 database. For each analyzed HRMS feature, MLinvitroTox generates a 490-bit bioactivity fingerprint used as a basis for prioritization, focusing the time-consuming molecular identification efforts on features most likely to cause adverse effects. The practical advantages of MLinvitroTox are demonstrated for groundwater HRMS data. Among the 874 features for which molecular fingerprints were derived from spectra, including 630 nontargets, 185 spectral matches, and 59 targets, around 4% of the feature/endpoint relationship pairs were predicted to be active. Cross-checking the predictions for targets and spectral matches with invitroDB data confirmed the bioactivity of 120 active and 6791 nonactive pairs while mislabeling 88 active and 56 nonactive relationships. By filtering according to bioactivity probability, endpoint scores, and similarity to the training data, the number of potentially toxic features was reduced by at least one order of magnitude. This refinement makes the analytical confirmation of the toxicologically most relevant features feasible, offering significant benefits for cost-efficient chemical risk assessment.

Scientific Contribution: In contrast to the classical ML-based approaches for toxicity prediction, MLinvitroTox predicts bioactivity for HRMS features (i.e., distinct m/z signals) based on MS2 fragmentation spectra rather than the chemical structures from the identified features. While the original proof-of-concept study was accompanied by the release of an MLinvitroTox v1 KNIME workflow, in this study, we release a Python MLinvitroTox v2 package, which, in addition to automation, expands functionality to include predicting toxicity from structures, cleaning up and generating chemical fingerprints, customizing models, and retraining on custom data. Furthermore, as a result of improvements in bioactivity data processing, realized in the concurrently released pytcpl Python package for the custom processing of invitroDBv4.1 input data used for training MLinvitroTox, the current release introduces enhancements in model accuracy, coverage of biological mechanistic targets, and overall interpretability.
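The cross-check counts quoted in the abstract (120 and 6791 confirmed pairs, 88 and 56 mislabeled pairs) can be turned into summary metrics. The sketch below is illustrative only and is not part of MLinvitroTox: the class name `ConfusionCounts` is hypothetical, and it assumes that "mislabeling 88 active" means 88 truly active pairs predicted as nonactive (false negatives) and that the 56 nonactive pairs were predicted as active (false positives). The abstract does not state this direction explicitly. Under the opposite reading, precision and recall swap, while the pair total and accuracy stay the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Feature/endpoint pair counts from a cross-check against measured data."""

    tp: int  # active pairs confirmed as active
    tn: int  # nonactive pairs confirmed as nonactive
    fp: int  # nonactive pairs predicted as active (assumed reading)
    fn: int  # active pairs predicted as nonactive (assumed reading)

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn

    @property
    def accuracy(self) -> float:
        # Fraction of all checked pairs that were labeled correctly.
        return (self.tp + self.tn) / self.total

    @property
    def precision(self) -> float:
        # Fraction of predicted-active pairs that are truly active.
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        # Fraction of truly active pairs that were predicted active.
        return self.tp / (self.tp + self.fn)


# Counts quoted in the abstract; the fp/fn assignment is an assumption.
counts = ConfusionCounts(tp=120, tn=6791, fp=56, fn=88)

if __name__ == "__main__":
    print(f"pairs checked: {counts.total}")
    print(f"accuracy:      {counts.accuracy:.3f}")
    print(f"precision:     {counts.precision:.3f}")
    print(f"recall:        {counts.recall:.3f}")
```

Because nonactive pairs dominate the cross-checked set, accuracy alone says little about how well active pairs are recovered, which is why precision and recall are reported alongside it here.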

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab39/11786476/e8a743b7a4e2/13321_2025_950_Fig1_HTML.jpg
