Chemistry Innovation Center, Discovery Sciences, AstraZeneca R&D Mölndal, Sweden.
J Chem Inf Model. 2013 Jun 24;53(6):1324-36. doi: 10.1021/ci4001376. Epub 2013 Jun 12.
A novel methodology was developed to build Free-Wilson-like local QSAR models by combining R-group signatures with the SVM algorithm. Unlike Free-Wilson analysis, this method can make predictions for compounds with R-groups not present in the training set. Eleven public data sets were chosen as test cases for comparing the performance of the new method with several traditional modeling strategies, including Free-Wilson analysis. The results show that the R-group signature SVM models generally achieve better prediction accuracy than Free-Wilson analysis. Moreover, their predictions are comparable to those of models built on ECFP6 fingerprints and on signatures of the whole compound. Most importantly, R-group contributions to the SVM model can be obtained by calculating the gradient for the R-group signatures. For most of the studied data sets, these contributions correlate significantly with those from the corresponding Free-Wilson analysis. These results suggest that R-group contributions can be used to interpret bioactivity data and highlight that the R-group signature based SVM modeling method is as interpretable as Free-Wilson analysis. Hence the signature SVM model can be a useful modeling tool for any drug discovery project.
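The abstract states that R-group contributions can be read off an SVM model by taking a gradient with respect to the R-group signatures, but it does not give the kernel or the exact formula. The following is a minimal sketch of one plausible reading, assuming an RBF-kernel SVM whose decision function is f(x) = b + Σ_i α_i exp(-γ‖x - s_i‖²). The support vectors, dual coefficients, intercept, γ, and the three-element "signature count" vector are all hypothetical toy values standing in for a trained model; they are not taken from the paper.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def svm_decision(x, support_vectors, dual_coefs, intercept, gamma):
    """Predicted activity f(x) = b + sum_i alpha_i * k(x, s_i)."""
    return intercept + sum(
        a * rbf_kernel(x, sv, gamma)
        for a, sv in zip(dual_coefs, support_vectors)
    )

def svm_gradient(x, support_vectors, dual_coefs, gamma):
    """Analytic gradient of f with respect to each descriptor x_j.

    d/dx_j exp(-gamma * ||x - s||^2) = -2 * gamma * (x_j - s_j) * k(x, s)
    """
    grad = [0.0] * len(x)
    for a, sv in zip(dual_coefs, support_vectors):
        k = rbf_kernel(x, sv, gamma)
        for j in range(len(x)):
            grad[j] += a * k * (-2.0 * gamma) * (x[j] - sv[j])
    return grad

# Hypothetical stand-ins for a trained model (assumed, not from the paper).
# Each vector holds counts of three made-up R-group signatures.
SUPPORT_VECTORS = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0]]
DUAL_COEFS = [0.8, -0.5, 0.3]
INTERCEPT = 6.0
GAMMA = 0.1

# Descriptor vector of the compound whose R-group is being interpreted.
query = [1.0, 1.0, 1.0]
gradient = svm_gradient(query, SUPPORT_VECTORS, DUAL_COEFS, GAMMA)

for j, g in enumerate(gradient):
    # A positive entry means that increasing this signature count raises
    # the predicted activity locally; a negative entry lowers it.
    print(f"signature {j}: d(activity)/d(count) = {g:+.4f}")
```

Unlike Free-Wilson coefficients, which are fixed per substituent, a gradient of this kind is local: it depends on the compound at which it is evaluated. How per-signature gradients are aggregated into a single contribution for a whole R-group is left open here, since the abstract does not specify it.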