Statistical external validation and consensus modeling: a QSPR case study for Koc prediction.

Author information

Gramatica Paola, Giani Elisa, Papa Ester

Affiliation

Department of Structural and Functional Biology, QSAR Research Unit in Environmental Chemistry and Ecotoxicology, University of Insubria, via Dunant 3, 21100 Varese, Italy.

Publication information

J Mol Graph Model. 2007 Mar;25(6):755-66. doi: 10.1016/j.jmgm.2006.06.005. Epub 2006 Aug 4.

Abstract

The soil sorption partition coefficient (log Koc) of a heterogeneous set of 643 non-ionic organic compounds, spanning a range of more than 6 log units, is predicted by a statistically validated QSAR modeling approach. The models are multiple linear regressions (ordinary least squares, OLS) built on a variety of theoretical molecular descriptors selected by the genetic algorithm-variable subset selection (GA-VSS) procedure. The models were validated for predictivity by different internal and external validation approaches. For external validation we applied self-organizing maps (SOM) to split the original data set: the best four-dimensional model, developed on a reduced training set of 93 chemicals, has a predictivity of 78% when applied to 550 validation chemicals (prediction set). The selected molecular descriptors, which can be interpreted through their mechanistic meaning, were compared with the more common physico-chemical descriptors log Kow and log Sw. The chemical applicability domain of each model was verified by the leverage approach, so that only reliable predictions are proposed. The best predicted data were obtained by consensus modeling from 10 different models in the genetic algorithm model population.
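
The workflow described in the abstract (OLS models on descriptor subsets, a leverage check of the applicability domain, and a consensus over several models) can be illustrated with a minimal sketch. This is not the authors' implementation. The descriptor subsets and all numeric values are synthetic placeholders; only the 93/550 split sizes are taken from the abstract. The warning leverage h* = 3(p+1)/n is a convention commonly used in QSAR applicability-domain checks and is assumed here, as is the choice of a plain arithmetic mean for the consensus. The function names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients to a descriptor matrix."""
    return coef[0] + X @ coef[1:]

def leverages(X_train, X_query):
    """Leverage h_i = x_i^T (A^T A)^-1 x_i relative to the training design matrix."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    inv = np.linalg.inv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, inv, Q)

def consensus_predict(X_train, y_train, X_query, subsets):
    """Average the predictions of several OLS models, one per descriptor subset.

    A model contributes to a compound's consensus value only if the compound
    lies inside that model's leverage domain (h <= h*, assumed h* = 3(p+1)/n).
    Returns the consensus values (NaN where no model applies) and the number
    of contributing models per compound.
    """
    n = len(X_train)
    preds, inside = [], []
    for cols in subsets:
        cols = list(cols)
        coef = fit_ols(X_train[:, cols], y_train)
        preds.append(predict(coef, X_query[:, cols]))
        h_star = 3 * (len(cols) + 1) / n  # assumed warning leverage
        inside.append(leverages(X_train[:, cols], X_query[:, cols]) <= h_star)
    preds, inside = np.array(preds), np.array(inside)
    counts = inside.sum(axis=0)
    sums = np.where(inside, preds, 0.0).sum(axis=0)
    mean = np.full(X_query.shape[0], np.nan)
    ok = counts > 0
    mean[ok] = sums[ok] / counts[ok]
    return mean, counts

# Synthetic stand-in data: 93 training and 550 prediction compounds, 6 descriptors.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(93, 6))
X_pred = rng.normal(size=(550, 6))
true_beta = np.array([0.8, -0.5, 0.3, 0.6, 0.0, 0.0])  # made-up coefficients
y_train = 2.5 + X_train @ true_beta + rng.normal(scale=0.3, size=93)

# Hypothetical four-descriptor subsets standing in for GA-selected models.
subsets = [(0, 1, 2, 3), (0, 1, 2, 4), (1, 2, 3, 5)]
consensus, n_models = consensus_predict(X_train, y_train, X_pred, subsets)
print("compounds covered by at least one model:", int((n_models > 0).sum()))
```

Masking each model by its own leverage domain before averaging is one possible reading of "propose only reliable data"; the abstract does not state how the consensus and the domain check were combined.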
