Transparency in Modeling through Careful Application of OECD's QSAR/QSPR Principles via a Curated Water Solubility Data Set.

Affiliations

Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States.

ORAU Student Services Contractor to Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States.

Publication information

Chem Res Toxicol. 2023 Mar 20;36(3):465-478. doi: 10.1021/acs.chemrestox.2c00379. Epub 2023 Mar 6.

Abstract

The need for careful assembly, training, and validation of quantitative structure-activity/property relationship models (QSAR/QSPR) is more significant than ever as data sets become larger and sophisticated machine learning tools become increasingly ubiquitous and accessible to the scientific community. Regulatory agencies such as the United States Environmental Protection Agency must carefully scrutinize each aspect of a resulting QSAR/QSPR model to determine its potential use in environmental exposure and hazard assessment. Herein, we revisit the goals of the Organisation for Economic Cooperation and Development (OECD) in our application and discuss the validation principles for structure-activity models. We apply these principles to a model for predicting water solubility of organic compounds derived using random forest regression, a common machine learning approach in the QSAR/QSPR literature. Using public sources, we carefully assembled and curated a data set consisting of 10,200 unique chemical structures with associated water solubility measurements. This data set was then used as a focal narrative to methodically consider the OECD's QSAR/QSPR principles and how they can be applied to random forests. Despite some expert, mechanistically informed supervision of descriptor selection to enhance model interpretability, we achieved a model of water solubility with comparable performance to previously published models (5-fold cross-validated performance: 0.81 R² and 0.98 RMSE). We hope this work will catalyze a necessary conversation around the importance of cautiously modernizing and explicitly leveraging OECD principles while pursuing state-of-the-art machine learning approaches to derive QSAR/QSPR models suitable for regulatory consideration.
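To make the reported 5-fold cross-validated metrics concrete, the following is a minimal sketch of how out-of-fold R² and RMSE can be computed. It is not the authors' pipeline: the data are synthetic, the one-descriptor least-squares line is a simple stand-in for the random forest, the function names are hypothetical, and pooling predictions across folds before scoring is an assumption (the abstract does not say whether metrics were pooled or averaged per fold).

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (stand-in for the real model)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validate(xs, ys, k=5, seed=0):
    """Return R^2 and RMSE computed on pooled out-of-fold predictions."""
    preds = [0.0] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        # Fit only on the training folds, then predict the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a + b * xs[i]
    return r2(ys, preds), rmse(ys, preds)


# Synthetic data: one hypothetical descriptor x and a noisy response y.
rng = random.Random(42)
xs = [rng.uniform(-2.0, 6.0) for _ in range(200)]
ys = [0.5 - 1.0 * x + rng.gauss(0.0, 1.0) for x in xs]

cv_r2, cv_rmse = cross_validate(xs, ys, k=5)
print(f"5-fold CV R^2 = {cv_r2:.2f}, RMSE = {cv_rmse:.2f}")
```

In a real QSAR/QSPR workflow, `fit_line` would be replaced by a random forest regressor trained on many molecular descriptors; the fold splitting and scoring logic would stay the same.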
