Netzeva Tatiana I, Gallegos Saliner Ana, Worth Andrew P
European Chemicals Bureau, Institute for Health and Consumer Protection, Joint Research Centre, European Commission, 21020 Ispra (VA), Italy.
Environ Toxicol Chem. 2006 May;25(5):1223-30. doi: 10.1897/05-367r.1.
The aim of the present study was to illustrate that it is possible and relatively straightforward to compare the domain of applicability of a quantitative structure-activity relationship (QSAR) model in terms of its physicochemical descriptors with a large inventory of chemicals. A training set of 105 chemicals with data for relative estrogenic gene activation, obtained in a recombinant yeast assay, was used to develop the QSAR. A binary classification model for predicting active versus inactive chemicals was developed using classification tree analysis and two descriptors with a clear physicochemical meaning (octanol-water partition coefficient, or log Kow, and the number of hydrogen bond donors, or n(Hdon)). The model demonstrated a high overall accuracy (90.5%), with a sensitivity of 95.9% and a specificity of 78.1%. The robustness of the model was evaluated using the leave-many-out cross-validation technique, whereas the predictivity was assessed using an artificial external test set composed of 12 compounds. The domain of the QSAR training set was compared with the chemical space covered by the European Inventory of Existing Commercial Chemical Substances (EINECS), as incorporated in the CDB-EC software, in the log Kow / n(Hdon) plane. The results showed that the training set and, therefore, the applicability domain of the QSAR model cover only a small part of the physicochemical domain of the inventory, even though a simple method for defining the applicability domain (ranges in the descriptor space) was used. However, a large number of EINECS compounds are located within this narrow descriptor window.
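The range-based definition of the applicability domain mentioned above can be illustrated with a minimal Python sketch. It assumes the domain is simply the bounding box spanned by the training set in the log Kow / n(Hdon) plane, and it reports the fraction of an inventory that falls inside that box. The class name `RangeDomain`, the function `coverage`, and all descriptor values are hypothetical and chosen only for illustration; they are not taken from the study, and the authors' actual procedure may differ in detail.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# A chemical is represented here only by its two descriptors:
# (log Kow, number of hydrogen bond donors).
Point = Tuple[float, int]


@dataclass(frozen=True)
class RangeDomain:
    """Bounding box in the log Kow / n(Hdon) plane (illustrative helper)."""
    logkow_min: float
    logkow_max: float
    hdon_min: int
    hdon_max: int

    @classmethod
    def from_training_set(cls, training: Iterable[Point]) -> "RangeDomain":
        # The domain is taken as the min/max of each descriptor
        # over the training set.
        points = list(training)
        if not points:
            raise ValueError("training set is empty")
        logkows = [p[0] for p in points]
        hdons = [p[1] for p in points]
        return cls(min(logkows), max(logkows), min(hdons), max(hdons))

    def contains(self, point: Point) -> bool:
        # A chemical is inside the domain only if BOTH descriptors
        # lie within the training-set ranges (bounds inclusive).
        logkow, hdon = point
        return (self.logkow_min <= logkow <= self.logkow_max
                and self.hdon_min <= hdon <= self.hdon_max)


def coverage(domain: RangeDomain, inventory: Iterable[Point]) -> float:
    """Fraction of inventory chemicals that fall inside the domain."""
    items = list(inventory)
    if not items:
        return 0.0
    inside = sum(1 for p in items if domain.contains(p))
    return inside / len(items)


# Made-up descriptor values, for illustration only.
training = [(2.1, 1), (3.4, 2), (5.0, 1), (4.2, 3)]
inventory = [(3.0, 2), (-1.5, 0), (4.9, 1), (8.2, 5), (2.5, 4)]

domain = RangeDomain.from_training_set(training)
inside_fraction = coverage(domain, inventory)
print(f"Domain: {domain}")
print(f"Fraction of inventory inside the domain: {inside_fraction:.2f}")
```

A bounding box of this kind is the simplest possible domain definition: it ignores how densely the training set populates the box, so a chemical can fall inside the ranges and still lie far from any training compound.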