Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum Muenchen-German Research Center for Environmental Health (GmbH), Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany.
J Chem Inf Model. 2010 Dec 27;50(12):2094-111. doi: 10.1021/ci100253r. Epub 2010 Oct 29.
Estimating the accuracy and applicability of QSAR and QSPR models for biological and physicochemical properties is a critical problem. The "distance to model" (DM) is defined as a metric of similarity between the training-set and test-set compounds used in QSAR/QSPR modeling. In our previous work, we demonstrated the utility and optimal performance of DM metrics based on the standard deviation within an ensemble of QSAR models. The current study applies this analysis to 30 QSAR models for the Ames mutagenicity data set that were previously reported in the 2009 QSAR challenge. We demonstrate that DMs based on an ensemble (consensus) model systematically perform better than other DMs. The presented approach identifies 30-60% of compounds whose prediction accuracy is similar to the interlaboratory accuracy of the Ames test, which is estimated to be 90%. For these compounds, in silico predictions provide accuracy comparable to the experiment and can therefore be used to halve the cost of experimental measurements. The developed model has been made publicly available at http://ochem.eu/models/1 .
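The ensemble-based DM idea can be illustrated with a minimal sketch. It assumes that each ensemble member outputs a probability that a compound is Ames-positive, that the consensus prediction is the mean of those probabilities, and that the DM is their population standard deviation. These are simplifying assumptions; the paper's exact DM definitions for classification may differ. The helper names `consensus_and_dm` and `accuracy_within_fraction` are hypothetical. The second function ranks compounds by DM and reports the accuracy on the most reliable fraction, which mirrors the abstract's idea of selecting the 30-60% of compounds predicted with near-experimental accuracy.

```python
from statistics import mean, pstdev


def consensus_and_dm(ensemble_predictions):
    """Return per-compound consensus predictions and ensemble-STD distances to model.

    ensemble_predictions[m][i] is model m's predicted probability that
    compound i is Ames-positive (an assumed output format).
    """
    # Regroup the predictions so each tuple holds all models' outputs for one compound.
    per_compound = list(zip(*ensemble_predictions))
    consensus = [mean(p) for p in per_compound]  # consensus = ensemble mean
    dm = [pstdev(p) for p in per_compound]       # DM = spread across the ensemble
    return consensus, dm


def accuracy_within_fraction(consensus, dm, labels, fraction, threshold=0.5):
    """Accuracy of the consensus classifier on the `fraction` of compounds with the smallest DM."""
    n = len(dm)
    k = max(1, round(fraction * n))
    # Smallest DM = most agreement among models = presumed most reliable prediction.
    most_reliable = sorted(range(n), key=lambda i: dm[i])[:k]
    correct = sum(
        int(consensus[i] >= threshold) == labels[i] for i in most_reliable
    )
    return correct / k
```

With three toy models and four compounds, the two compounds on which the models agree are classified correctly, while the two with a large spread are not. Accuracy is therefore higher when only the low-DM half is retained:

```python
ensemble = [[0.9, 0.1, 0.6, 0.4], [0.8, 0.2, 0.2, 0.9], [0.95, 0.05, 0.9, 0.1]]
consensus, dm = consensus_and_dm(ensemble)
accuracy_within_fraction(consensus, dm, [1, 0, 0, 1], 0.5)  # 1.0
accuracy_within_fraction(consensus, dm, [1, 0, 0, 1], 1.0)  # 0.5
```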