Statistical external validation and consensus modeling: a QSPR case study for Koc prediction.

Authors

Gramatica Paola, Giani Elisa, Papa Ester

Affiliation

Department of Structural and Functional Biology, QSAR Research Unit in Environmental Chemistry and Ecotoxicology, University of Insubria, via Dunant 3, 21100 Varese, Italy.

Publication

J Mol Graph Model. 2007 Mar;25(6):755-66. doi: 10.1016/j.jmgm.2006.06.005. Epub 2006 Aug 4.

DOI:10.1016/j.jmgm.2006.06.005

PMID:16890002

Abstract

The soil sorption partition coefficient (log K(oc)) of a heterogeneous set of 643 organic non-ionic compounds, with a range of more than 6 log units, is predicted by a statistically validated QSAR modeling approach. The applied multiple linear regression (ordinary least squares, OLS) is based on a variety of theoretical molecular descriptors selected by the genetic algorithms-variable subset selection (GA-VSS) procedure. The models were validated for predictivity by different internal and external validation approaches. For external validation we applied self-organizing maps (SOM) to split the original data set: the best four-dimensional model, developed on a reduced training set of 93 chemicals, has a predictivity of 78% when applied on 550 validation chemicals (prediction set). The selected molecular descriptors, which could be interpreted through their mechanistic meaning, were compared with the more common physico-chemical descriptors log K(ow) and log S(w). The chemical applicability domain of each model was verified by the leverage approach in order to propose only reliable data. The best predicted data were obtained by consensus modeling from 10 different models in the genetic algorithm model population.
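
The abstract combines three computations: an OLS model on a few descriptors, an external predictivity score on the prediction set, and a leverage check of the applicability domain. The sketch below illustrates them in plain Python under stated assumptions. Descriptor values are taken as already computed; the theoretical descriptors and the GA-VSS selection are not reproduced. External predictivity is computed as Q2ext = 1 - PRESS/SS, with SS taken around the training-set mean. This is one common variant; the exact formula is not given on this page. The warning leverage is the commonly used h* = 3p'/n, where p' is the number of model parameters including the intercept. All function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _with_intercept(X: Sequence[Sequence[float]]) -> Matrix:
    """Prepend a column of ones so the first coefficient is the intercept."""
    return [[1.0, *map(float, row)] for row in X]


def _inverse(A: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: descriptors are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fit_ols(X, y) -> Tuple[List[float], Matrix]:
    """OLS fit of y = b0 + b1*x1 + ...; returns (coefficients, (X'X)^-1)."""
    Z = _with_intercept(X)
    k = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    XtX_inv = _inverse(XtX)
    coef = [sum(XtX_inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]
    return coef, XtX_inv


def predict(coef: Sequence[float], X) -> List[float]:
    """Apply the linear model (intercept first) to each row of X."""
    return [sum(c * z for c, z in zip(coef, row)) for row in _with_intercept(X)]


def q2_ext(y_obs, y_pred, y_train_mean: float) -> float:
    """External Q^2: 1 - PRESS / sum of squares of observed values around the training mean."""
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss = sum((o - y_train_mean) ** 2 for o in y_obs)
    return 1.0 - press / ss


def leverages(XtX_inv: Matrix, X) -> List[float]:
    """Hat values h_i = x_i^T (X^T X)^-1 x_i, with x_i including the intercept column."""
    out = []
    for x in _with_intercept(X):
        k = len(x)
        v = [sum(XtX_inv[i][j] * x[j] for j in range(k)) for i in range(k)]
        out.append(sum(a * b for a, b in zip(x, v)))
    return out


def warning_leverage(n_train: int, n_params: int) -> float:
    """h* = 3p'/n, with p' = number of model parameters including the intercept."""
    return 3.0 * n_params / n_train


if __name__ == "__main__":
    # Invented toy data: one descriptor, exactly y = 1 + 2x on the training set.
    X_tr, y_tr = [[0], [1], [2], [3]], [1.0, 3.0, 5.0, 7.0]
    coef, XtX_inv = fit_ols(X_tr, y_tr)
    X_pr, y_pr = [[4], [5]], [9.0, 11.5]
    print("coefficients:", coef)
    print("Q2_ext:", q2_ext(y_pr, predict(coef, X_pr), sum(y_tr) / len(y_tr)))
    h_star = warning_leverage(len(X_tr), len(coef))
    for x, h in zip([[3], [10]], leverages(XtX_inv, [[3], [10]])):
        print(x, round(h, 3), "inside AD" if h <= h_star else "outside AD")
```

Under that convention, the abstract's figures give h* = 3 x 5 / 93, or about 0.16. This assumes a four-descriptor model fitted on 93 training chemicals, so p' = 5. Prediction-set chemicals above that value would be flagged as extrapolations. The 78% predictivity reported above presumably corresponds to a Q2ext of about 0.78. In practice a library solver such as numpy.linalg.lstsq would replace the hand-written matrix inversion, which is kept here only to make the sketch dependency-free.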

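
Consensus modeling, as described in the abstract, pools predictions from several models in the GA population instead of relying on a single equation. The sketch below assumes the simplest scheme, an unweighted arithmetic mean of the individual predictions. The page does not state how the 10 models were combined. Each model is represented as a hypothetical (intercept, {descriptor: coefficient}) pair. The spread of the individual predictions is returned as a rough indicator of agreement. The descriptor names d1 and d2 and all numbers are invented.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation of one OLS model from the GA population:
# (intercept, {descriptor name: coefficient}).
Model = Tuple[float, Dict[str, float]]


def predict_one(model: Model, descriptors: Dict[str, float]) -> float:
    """Linear prediction of log Koc for one chemical from one model."""
    intercept, weights = model
    return intercept + sum(w * descriptors[name] for name, w in weights.items())


def consensus(models: Sequence[Model], descriptors: Dict[str, float]) -> Tuple[float, float]:
    """Unweighted mean of the individual model predictions, plus their population std."""
    preds: List[float] = [predict_one(m, descriptors) for m in models]
    return mean(preds), pstdev(preds)


if __name__ == "__main__":
    # Invented models and descriptor values, for illustration only.
    models = [
        (1.0, {"d1": 2.0}),
        (0.5, {"d1": 1.0, "d2": 1.0}),
        (0.0, {"d2": 3.0}),
    ]
    chemical = {"d1": 1.0, "d2": 1.0}
    avg, spread = consensus(models, chemical)
    print(f"consensus log Koc = {avg:.3f} (spread {spread:.3f})")
    # prints: consensus log Koc = 2.833 (spread 0.236)
```

A natural refinement, also an assumption here, is to average only over the models whose applicability domain contains the chemical, using the leverage test mentioned in the abstract.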

Similar articles

Validated QSAR prediction of OH tropospheric degradation of VOCs: splitting into training-test sets and consensus modeling.

J Chem Inf Comput Sci. 2004 Sep-Oct;44(5):1794-802. doi: 10.1021/ci049923u.

Statistically validated QSARs, based on theoretical descriptors, for modeling aquatic toxicity of organic chemicals in Pimephales promelas (fathead minnow).

J Chem Inf Model. 2005 Sep-Oct;45(5):1256-66. doi: 10.1021/ci050212l.

Linear QSAR regression models for the prediction of bioconcentration factors by physicochemical properties and structural theoretical molecular descriptors.

Chemosphere. 2007 Feb;67(2):351-8. doi: 10.1016/j.chemosphere.2006.09.079. Epub 2006 Nov 15.

Combinatorial QSAR modeling of chemical toxicants tested against Tetrahymena pyriformis.

J Chem Inf Model. 2008 Apr;48(4):766-84. doi: 10.1021/ci700443v. Epub 2008 Mar 1.

QSPR modeling of soil sorption coefficients (K(OC)) of pesticides using SPA-ANN and SPA-MLR.

J Agric Food Chem. 2009 Aug 12;57(15):7153-8. doi: 10.1021/jf9008839.

Predictive QSAR modeling of HIV reverse transcriptase inhibitor TIBO derivatives.

Eur J Med Chem. 2009 Apr;44(4):1509-24. doi: 10.1016/j.ejmech.2008.07.020. Epub 2008 Jul 24.

Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection.

J Chem Inf Model. 2008 Sep;48(9):1733-46. doi: 10.1021/ci800151m. Epub 2008 Aug 26.

A quantitative structure property relationship for prediction of solubilization of hazardous compounds using GA-based MLR in CTAB micellar media.

J Hazard Mater. 2009 Jan 15;161(1):74-80. doi: 10.1016/j.jhazmat.2008.03.089. Epub 2008 Mar 26.

Validation of a QSAR model for acute toxicity.

SAR QSAR Environ Res. 2006 Apr;17(2):147-71. doi: 10.1080/10659360600636253.

Cited by

Global Assessment of Emerging Contaminant Removal in Wastewater Treatment Plants: Hazard Screening and Risk Evaluation.

Toxics. 2024 Dec 25;13(1):6. doi: 10.3390/toxics13010006.

Chemical feature-based machine learning model for predicting photophysical properties of BODIPY compounds: density functional theory and quantitative structure-property relationship modeling.

J Mol Model. 2024 Dec 12;31(1):18. doi: 10.1007/s00894-024-06240-4.

An Ensemble Classifiers for Improved Prediction of Native-Non-Native Protein-Protein Interaction.

Int J Mol Sci. 2024 May 29;25(11):5957. doi: 10.3390/ijms25115957.

Identification of potential extracellular signal-regulated protein kinase 2 inhibitors based on multiple virtual screening strategies.

Front Pharmacol. 2022 Nov 18;13:1077550. doi: 10.3389/fphar.2022.1077550. eCollection 2022.

Perceiving the Concealed and Unreported Pharmacophoric Features of the 5-Hydroxytryptamine Receptor Using Balanced QSAR Analysis.

Pharmaceuticals (Basel). 2022 Jul 5;15(7):834. doi: 10.3390/ph15070834.

Identification of potential matrix metalloproteinase-2 inhibitors from natural products through advanced machine learning-based cheminformatics approaches.

Mol Divers. 2023 Jun;27(3):1053-1066. doi: 10.1007/s11030-022-10467-9. Epub 2022 Jun 30.

The Relevance of Goodness-of-fit, Robustness and Prediction Validation Categories of OECD-QSAR Principles with Respect to Sample Size and Model Type.

Mol Inform. 2022 Nov;41(11):e2200072. doi: 10.1002/minf.202200072. Epub 2022 Jul 25.

Mechanistic Analysis of Chemically Diverse Bromodomain-4 Inhibitors Using Balanced QSAR Analysis and Supported by X-ray Resolved Crystal Structures.

Pharmaceuticals (Basel). 2022 Jun 14;15(6):745. doi: 10.3390/ph15060745.

Applicability Domain of Polyparameter Linear Free Energy Relationship Models Evaluated by Leverage and Prediction Interval Calculation.

Environ Sci Technol. 2022 May 3;56(9):5572-5579. doi: 10.1021/acs.est.2c00865. Epub 2022 Apr 14.

Prediction of degradability of micropollutants by sonolysis in water with QSPR - a case study on phenol derivates.

Ultrason Sonochem. 2022 Jan;82:105867. doi: 10.1016/j.ultsonch.2021.105867. Epub 2021 Dec 8.
