A benchmark dataset for machine learning in ecotoxicology.

Affiliations

Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland.

Swiss Data Science Center (SDSC), Zürich, Switzerland.

Publication information

Sci Data. 2023 Oct 18;10(1):718. doi: 10.1038/s41597-023-02612-2.

DOI:10.1038/s41597-023-02612-2
PMID:37853023
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10584858/
Abstract

The use of machine learning for predicting ecotoxicological outcomes is promising but underutilized. The curation of data with informative features requires both expertise in machine learning and a strong biological and ecotoxicological background, which we consider a barrier to entry for this kind of research. Additionally, model performances can only be compared across studies when the same dataset, cleaning, and splits were used. Therefore, we provide ADORE, an extensive and well-described dataset on acute aquatic toxicity in three relevant taxonomic groups (fish, crustaceans, and algae). The core dataset describes ecotoxicological experiments and is expanded with phylogenetic and species-specific data on the species as well as chemical properties and molecular representations. Apart from challenging other researchers to achieve the best model performances across the whole dataset, we propose specific relevant challenges on subsets of the data and include datasets and splits corresponding to each of these challenges, as well as an in-depth characterization and discussion of train-test splitting approaches.
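
The abstract's point about train-test splitting can be illustrated with a small sketch. It assumes, purely for illustration, that each record is one toxicity test carrying a `chemical` field, and that all tests of a chemical should land on the same side of the split so that near-identical repeated experiments do not leak from training into testing. The function name `split_by_group`, the field names, and the toy values are hypothetical; this is not the authors' splitting procedure and does not use ADORE's actual column names.

```python
import random
from collections import defaultdict


def split_by_group(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that every group ends up entirely in train or in test."""
    # Collect the records belonging to each group (here: each chemical).
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Shuffle the group identifiers reproducibly, then reserve a share for testing.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])

    train = [r for k in keys if k not in test_keys for r in groups[k]]
    test = [r for k in keys if k in test_keys for r in groups[k]]
    return train, test


# Hypothetical toy records; values are invented for illustration only.
records = [
    {"chemical": "A", "species": "fish_1", "log_lc50": -3.1},
    {"chemical": "A", "species": "fish_2", "log_lc50": -2.9},
    {"chemical": "B", "species": "fish_1", "log_lc50": -4.0},
    {"chemical": "C", "species": "algae_1", "log_lc50": -1.5},
    {"chemical": "C", "species": "fish_1", "log_lc50": -1.7},
    {"chemical": "D", "species": "crustacean_1", "log_lc50": -2.2},
]

train, test = split_by_group(records, "chemical", test_fraction=0.25, seed=42)
```

With a purely random split, two tests of chemical `A` could fall on opposite sides, and a model could score well by memorizing the chemical rather than generalizing. Fixing the seed and publishing the resulting assignment is one simple way to make results comparable across studies.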
