Li Yi, Pan Dahua, Liu Jianzhong, Kern Petra S, Gerberick G Frank, Hopfinger Anton J, Tseng Yufeng J
Laboratory of Molecular Modeling and Design (MC 781), College of Pharmacy, University of Illinois at Chicago, Chicago, Illinois 60612-7231, USA.
Toxicol Sci. 2007 Oct;99(2):532-44. doi: 10.1093/toxsci/kfm185. Epub 2007 Aug 3.
Three- and four-state categorical quantitative structure-activity relationship (QSAR) models for skin sensitization have been constructed using data from murine Local Lymph Node Assay studies. These are the same data we previously used to build two-state (sensitizer, nonsensitizer) QSAR models (Li et al., 2007, Chem. Res. Toxicol. 20, 114-128). 4D-fingerprint descriptors derived from the 4D-molecular similarity paradigm were used to generate the models. A training set of 196 and a test set of 22 structurally diverse compounds were used in this study. Logistic regression and partial least squares coupled logistic regression were used to build the models. The three-state QSAR model gives a classification accuracy of 73.4% for the training set and 63.6% for the test set, whereas the expected accuracy of random classification for any three-state data set is 33.3%. The two-2-state [four categories in total] QSAR model gives a classification accuracy of 83.2% for the training set and 54.6% for the test set, whereas the expected accuracy of random classification for any two-2-state data set is 25%. An analysis of the skin-sensitization models developed in this study, together with the two-state QSAR models developed in our previous analysis, suggests that the "moderate" sensitizers may be the main source of limited model accuracy.
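The random baselines quoted above follow from the number of categories: guessing uniformly among k classes is expected to be correct 1/k of the time, whatever the class balance. The sketch below reproduces the 33.3% and 25% figures and the margin of each reported accuracy over chance. The function names and the `reported` dictionary are illustrative assumptions, not part of the published work; the accuracy values are those given in the abstract.

```python
def chance_accuracy(n_classes: int) -> float:
    """Expected accuracy of uniform random guessing over n_classes categories."""
    if n_classes < 1:
        raise ValueError("n_classes must be a positive integer")
    return 1.0 / n_classes


def margin_over_chance(accuracy_pct: float, n_classes: int) -> float:
    """Percentage points by which a reported accuracy exceeds the chance level."""
    return accuracy_pct - 100.0 * chance_accuracy(n_classes)


# Accuracies (percent) as reported in the abstract; the layout is hypothetical.
reported = {
    "three-state": {"classes": 3, "training": 73.4, "test": 63.6},
    "two-2-state": {"classes": 4, "training": 83.2, "test": 54.6},
}

for name, r in reported.items():
    k = r["classes"]
    print(f"{name}: chance = {100.0 * chance_accuracy(k):.1f}%")
    for split in ("training", "test"):
        margin = margin_over_chance(r[split], k)
        print(f"  {split}: {r[split]:.1f}% ({margin:+.1f} points over chance)")
```

On the test set both models sit roughly 30 points above their respective chance levels, so the lower raw accuracy of the two-2-state model partly reflects the harder four-category task.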