QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Via Dunant 3, 21100, Varese, Italy.
J Chem Inf Model. 2012 Aug 27;52(8):2044-58. doi: 10.1021/ci300084j. Epub 2012 Jul 13.
The evaluation of regression QSAR model performance, in fitting, robustness, and external prediction, is of pivotal importance. Over the past decade, different external validation parameters have been proposed: Q²F1, Q²F2, Q²F3, r²m, and the Golbraikh-Tropsha method. Recently, our group proposed the concordance correlation coefficient (CCC, Lin) as an external validation parameter for QSAR studies; it simply verifies how small the differences are between the experimental data and the external data set predictions, independently of their range. In our preliminary work, we demonstrated with thousands of simulated models that CCC is in good agreement with the compared validation criteria (except r²m) when the cutoff values normally applied for accepting QSAR models as externally predictive are used. In this new work, we have studied and compared the general trends of the various criteria with respect to different possible biases (scale and location shifts) in external data distributions, using a wide range of simulated scenarios. This study, further supported by visual inspection of experimental vs predicted data scatter plots, has highlighted problems related to some criteria. Indeed, with the cutoff suggested by its proponent, r²m could also accept nonpredictive models under two of the possible biases (location, and location plus scale), while under scale shift bias it appears to be the most restrictive. Moreover, Q²F1 and Q²F2 showed some problems under one of the possible biases (scale shift). This analysis also allowed us to propose new, recalibrated thresholds for each criterion, intercomparable for the same data scatter, for defining a QSAR model as truly externally predictive in a more precautionary approach.
An analysis of the results revealed that the scatter plot of experimental vs predicted external data must always be evaluated to support the statistical criteria values: in some cases high statistical parameter values could hide models with unacceptable predictions.
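As an illustration of what these criteria measure, the sketch below computes Q²F1, Q²F2, Q²F3, and CCC with their commonly published definitions, assuming that Q²F1 and Q²F3 use the training-set mean, Q²F2 uses the external-set mean, and CCC follows Lin's formula. It is not the authors' code: r²m and the Golbraikh-Tropsha conditions are omitted (several r²m variants exist), no acceptance thresholds from the paper are reproduced, and the toy data, shift sizes, and function names are hypothetical.

```python
from typing import Sequence


def _mean(v: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(v) / len(v)


def _press(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Sum of squared prediction errors over the external set."""
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))


def q2_f1(y_ext, y_pred, y_train):
    """Q2F1: squared errors scaled by deviations of external data from the TRAINING mean."""
    m_tr = _mean(y_train)
    return 1.0 - _press(y_ext, y_pred) / sum((o - m_tr) ** 2 for o in y_ext)


def q2_f2(y_ext, y_pred):
    """Q2F2: same as Q2F1 but using the EXTERNAL-set mean."""
    m_ext = _mean(y_ext)
    return 1.0 - _press(y_ext, y_pred) / sum((o - m_ext) ** 2 for o in y_ext)


def q2_f3(y_ext, y_pred, y_train):
    """Q2F3: mean squared external error over mean squared training deviation."""
    m_tr = _mean(y_train)
    mse_ext = _press(y_ext, y_pred) / len(y_ext)
    var_tr = sum((o - m_tr) ** 2 for o in y_train) / len(y_train)
    return 1.0 - mse_ext / var_tr


def ccc(y_ext, y_pred):
    """Lin's concordance correlation coefficient between observed and predicted values."""
    n = len(y_ext)
    mx, my = _mean(y_ext), _mean(y_pred)
    sxy = sum((x - mx) * (y - my) for x, y in zip(y_ext, y_pred))
    sxx = sum((x - mx) ** 2 for x in y_ext)
    syy = sum((y - my) ** 2 for y in y_pred)
    # The n * (mx - my)**2 term penalizes a location shift between the two sets.
    return 2.0 * sxy / (sxx + syy + n * (mx - my) ** 2)


def pearson_r2(x, y):
    """Squared Pearson correlation: insensitive to location and scale shifts."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical toy data, not taken from the paper.
y_train = [float(i) for i in range(1, 11)]
y_ext = [2.0, 4.0, 6.0, 8.0]
m_ext = _mean(y_ext)

# Predictions with no bias, a location shift, and a scale shift about the mean.
scenarios = {
    "no bias": list(y_ext),
    "location shift (+1)": [y + 1.0 for y in y_ext],
    "scale shift (x1.5 about the mean)": [m_ext + 1.5 * (y - m_ext) for y in y_ext],
}

for name, y_pred in scenarios.items():
    print(
        f"{name:36s} r2={pearson_r2(y_ext, y_pred):.3f} "
        f"CCC={ccc(y_ext, y_pred):.3f} "
        f"Q2F1={q2_f1(y_ext, y_pred, y_train):.3f} "
        f"Q2F2={q2_f2(y_ext, y_pred):.3f} "
        f"Q2F3={q2_f3(y_ext, y_pred, y_train):.3f}"
    )
```

In this toy case, r² stays at 1 under both shifts, whereas CCC and the Q² values drop below 1; a correlation-only check would miss a bias that a scatter plot of experimental vs predicted values against the identity line makes visible.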