QSAR--how good is it in practice? Comparison of descriptor sets on an unbiased cross section of corporate data sets.

Author information

Gedeck Peter, Rohde Bernhard, Bartels Christian

Affiliation

Novartis Institutes for BioMedical Research, Novartis Horsham Research Centre, Wimblehurst Road, Horsham, West Sussex, RH12 5AB, UK.

Publication information

J Chem Inf Model. 2006 Sep-Oct;46(5):1924-36. doi: 10.1021/ci050413p.

PMID:16995723

Abstract

The quality of QSAR (Quantitative Structure-Activity Relationships) predictions depends on many factors, including the descriptor set, the statistical method, and the data sets used. Here we study the quality of QSAR predictions mainly as a function of the data set and descriptor type, using partial least squares as the statistical modeling method. The study makes use of the fact that we have access to a large number of data sets and to a variety of different QSAR descriptors. The main conclusion is that the quality of the predictions depends on both the data set and the descriptor used. The quality of the predictions correlates positively with the size of the data set and the range of biological activities. There is no clear dependence of the quality of the predictions on the complexity of the data set. All of the descriptors tested produced useful predictions for some of the data sets. None of the descriptors is best for all data sets; it is therefore necessary to test, in each individual case, which descriptor produces the best model. In our tests, 2D fragment-based descriptors usually performed better than simpler descriptors based on augmented atom types. Possible reasons for these observations are discussed.
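The abstract's practical advice is to fit a partial least squares (PLS) model with each candidate descriptor set and keep whichever predicts best for the data set at hand. The sketch below shows one way to do that. It implements single-response PLS (PLS1, NIPALS algorithm) and scores each descriptor set by cross-validated q² = 1 − PRESS/SS. It makes several assumptions that do not come from the paper. The fold scheme, the number of latent components, the function names `pls1_fit` and `cv_q2`, and the descriptor-set names are hypothetical. The data are synthetic random binary "fragment" bits, not any corporate data set. It is not the authors' validation protocol.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS (PLS1) model with the NIPALS algorithm.

    Returns (x_mean, y_mean, coef) such that
    y_hat = (X_new - x_mean) @ coef + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean        # centred copies that get deflated
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                      # direction of maximum covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w /= norm
        t = Xk @ w                         # latent scores
        tt = t @ t
        p = Xk.T @ t / tt                  # X loadings
        q = (yk @ t) / tt                  # y loading
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - q * t                    # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P^T W)^-1 q
    return x_mean, y_mean, coef


def cv_q2(X, y, n_components, n_folds=5):
    """Cross-validated q^2 = 1 - PRESS / SS with fixed interleaved folds."""
    folds = np.arange(len(y)) % n_folds
    press = 0.0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        x_mean, y_mean, coef = pls1_fit(X[train], y[train], n_components)
        pred = (X[test] - x_mean) @ coef + y_mean
        press += float(np.sum((y[test] - pred) ** 2))
    return 1.0 - press / float(np.sum((y - y.mean()) ** 2))


# Hypothetical, purely synthetic data set: 100 compounds whose activity is
# driven by 3 of 20 binary "fragment" bits, plus a little noise.
rng = np.random.default_rng(0)
n = 100
fragments = rng.integers(0, 2, size=(n, 20)).astype(float)
activity = fragments[:, :3] @ np.array([1.0, -0.8, 0.5]) + 0.1 * rng.standard_normal(n)
unrelated = rng.integers(0, 2, size=(n, 20)).astype(float)  # carries no signal

# Score every candidate descriptor set on this data set and keep the best one.
descriptor_sets = {"fragment_like": fragments, "unrelated": unrelated}
scores = {name: cv_q2(X, activity, n_components=5) for name, X in descriptor_sets.items()}
best = max(scores, key=scores.get)
for name, q2 in scores.items():
    print(f"{name}: q2 = {q2:.2f}")
print("best descriptor set for this data set:", best)
```

The selection step runs once per data set. This reflects the abstract's finding that no single descriptor set wins everywhere. With real data, the number of PLS components would normally be chosen by cross-validation as well, rather than fixed at 5 as assumed here.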
