Maunz A, Helma C
Freiburg Center for Data Analysis and Modelling, Freiburg, Germany.
SAR QSAR Environ Res. 2008;19(5-6):413-31. doi: 10.1080/10629360802358430.
We propose a new kernel, based on 2-D structural chemical similarity, that integrates activity-specific information from the training data, and a new approach to applicability domain estimation that takes feature significances and activity distributions into consideration. The new kernel provides better results than the well-established Tanimoto kernel, and activity-sensitive feature selection enhances prediction quality. Local support vector regression models based on this kernel have been validated with three publicly available datasets from the DSSTox project. One of them (Fathead Minnow Acute Toxicity) has already been modelled by other groups and serves as a benchmark dataset; the other two (Maximum Recommended Therapeutic Dose, IRIS Lifetime Cancer Risk) have, to the best of the authors' knowledge, been modelled here for the first time. For all three models, predictive accuracy increases with the prediction confidence that indicates the applicability domain. Depending on the confidence cutoff for acceptable predictions, we were able to achieve more than 90% of predictions within 1 log unit of the experimental data for all datasets.
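For orientation, the two quantities the abstract refers to can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `tanimoto` is the plain Tanimoto baseline (not the proposed activity-weighted kernel), the feature labels are hypothetical, the numbers are invented, and treating "within 1 log unit" as an absolute error of at most 1.0 is an assumption.

```python
def tanimoto(features_a, features_b):
    """Plain Tanimoto similarity between two sets of structural features."""
    a, b = set(features_a), set(features_b)
    union = len(a | b)
    # Two empty feature sets are treated as dissimilar (an assumption).
    return len(a & b) / union if union else 0.0


def fraction_within_one_log_unit(predicted, observed, confidence, cutoff):
    """Share of predictions with confidence >= cutoff whose absolute
    error is at most 1 log unit; None if no prediction passes the cutoff."""
    kept = [(p, o) for p, o, c in zip(predicted, observed, confidence)
            if c >= cutoff]
    if not kept:
        return None
    hits = sum(1 for p, o in kept if abs(p - o) <= 1.0)
    return hits / len(kept)


# Hypothetical substructure labels for two compounds.
similarity = tanimoto({"f1", "f2", "f3"}, {"f2", "f3", "f4"})

# Invented log-scale predictions, measurements, and confidences.
predicted = [1.0, 2.5, 4.0, 0.2]
observed = [1.4, 2.0, 1.0, 2.0]
confidence = [0.9, 0.8, 0.3, 0.5]

all_predictions = fraction_within_one_log_unit(predicted, observed, confidence, 0.0)
confident_only = fraction_within_one_log_unit(predicted, observed, confidence, 0.6)
```

Raising the cutoff discards low-confidence predictions, which is how a stricter applicability domain trades coverage for accuracy in this toy data.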