OPERA models for predicting physicochemical properties and environmental fate endpoints.

Author information

Mansouri Kamel, Grulke Chris M, Judson Richard S, Williams Antony J

Affiliations

National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA.

Oak Ridge Institute for Science and Education, 1299 Bethel Valley Road, Oak Ridge, TN, 37830, USA.

Publication information

J Cheminform. 2018 Mar 8;10(1):10. doi: 10.1186/s13321-018-0263-1.

DOI:10.1186/s13321-018-0263-1

PMID:29520515

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5843579/

Abstract

The collection of chemical structure information and associated experimental data for quantitative structure-activity/property relationship (QSAR/QSPR) modeling is facilitated by an increasing number of public databases containing large amounts of useful data. However, the performance of QSAR models depends heavily on the quality of the data and the modeling methodology used. This study aims to develop robust QSAR/QSPR models for chemical properties of environmental interest that can be used for regulatory purposes. This study primarily uses data from the publicly available PHYSPROP database consisting of a set of 13 common physicochemical and environmental fate properties. These datasets have undergone extensive curation using an automated workflow to select only high-quality data, and the chemical structures were standardized prior to calculation of the molecular descriptors. The modeling procedure was developed based on the five Organization for Economic Cooperation and Development (OECD) principles for QSAR models. A weighted k-nearest neighbor approach was adopted using a minimum number of required descriptors calculated using PaDEL, an open-source software package. Genetic algorithms selected only the most pertinent and mechanistically interpretable descriptors (2-15, with an average of 11 descriptors). The sizes of the modeled datasets varied from 150 chemicals for biodegradability half-life to 14,050 chemicals for logP, with an average of 3222 chemicals across all endpoints. The optimal models were built on randomly selected training sets (75%) and validated using fivefold cross-validation (CV) and test sets (25%). The CV Q² of the models varied from 0.72 to 0.95, with an average of 0.86, and the test-set R² varied from 0.71 to 0.96, with an average of 0.82. Modeling and performance details are described in the QSAR model reporting format and were validated by the European Commission's Joint Research Center to be OECD compliant. All models are freely available as an open-source, command-line application called OPEn structure-activity/property Relationship App (OPERA). OPERA models were applied to more than 750,000 chemicals to produce freely available predicted data on the U.S. Environmental Protection Agency's CompTox Chemistry Dashboard.
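
The validation scheme summarized in the abstract (a random 75%/25% split, fivefold cross-validation on the training portion, and a weighted k-nearest neighbor predictor) can be illustrated with a minimal Python sketch. This is not the OPERA implementation. The data are synthetic, and the Euclidean distance, the inverse-distance weights, k = 5, and the use of the observed mean in the Q²/R² formula are all assumptions made for illustration. PaDEL descriptor calculation and genetic-algorithm descriptor selection are not reproduced here.

```python
import math
import random


def weighted_knn_predict(train_X, train_y, query, k=5):
    """Inverse-distance-weighted mean of the k nearest training values."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    # An exact descriptor match would give an infinite weight, so average
    # the matching training values directly instead.
    exact = [y for d, y in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def r_squared(observed, predicted):
    """1 - SS_res / SS_tot, with SS_tot taken about the observed mean
    (an assumed convention; the abstract does not give the formula)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def cross_validated_q2(X, y, k=5, folds=5, seed=0):
    """Q2 from out-of-fold predictions: every point is predicted by a
    model that never saw it during that fold."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    predictions = [0.0] * len(X)
    for f in range(folds):
        held_out = set(order[f::folds])
        kept = [i for i in order if i not in held_out]
        fold_X = [X[i] for i in kept]
        fold_y = [y[i] for i in kept]
        for i in held_out:
            predictions[i] = weighted_knn_predict(fold_X, fold_y, X[i], k)
    return r_squared(y, predictions)


def demo(n=200, seed=42):
    """Synthetic two-descriptor data set (hypothetical), split 75/25."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [2.0 * a - b + rng.gauss(0.0, 0.05) for a, b in X]
    cut = int(0.75 * n)  # rows are already in random order
    train_X, train_y = X[:cut], y[:cut]
    test_X, test_y = X[cut:], y[cut:]
    q2 = cross_validated_q2(train_X, train_y)
    test_pred = [weighted_knn_predict(train_X, train_y, x) for x in test_X]
    return q2, r_squared(test_y, test_pred)


if __name__ == "__main__":
    q2, r2_test = demo()
    print(f"5-fold CV Q2 = {q2:.2f}, test R2 = {r2_test:.2f}")
```

Reporting both numbers separates two questions: Q² measures how stable the model is when parts of the training set are withheld, while the test-set R² measures how well it predicts chemicals that played no part in model building.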

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9731/5843579/47a88638ce7c/13321_2018_263_Fig1_HTML.jpg
