Alexander D L J, Tropsha A, Winkler David A
†CSIRO Digital Productivity Flagship, Private Bag 10, Clayton South, VIC 3169, Australia.
‡UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States.
J Chem Inf Model. 2015 Jul 27;55(7):1316-22. doi: 10.1021/acs.jcim.5b00206. Epub 2015 Jul 9.
The statistical metrics used to characterize the external predictivity of a model, i.e., how well it predicts the properties of an independent test set, have proliferated over the past decade. This paper clarifies some apparent confusion over the use of the coefficient of determination, R², as a measure of model fit and predictive power in QSAR and QSPR modeling. R² (or r²) has been used in various contexts in the literature in conjunction with training and test data for both ordinary linear regression and regression through the origin as well as with linear and nonlinear regression models. We analyze the widely adopted model fit criteria suggested by Golbraikh and Tropsha (J. Mol. Graphics Modell. 2002, 20, 269-276) in a strict statistical manner. Shortcomings in these criteria are identified, and a clearer and simpler alternative method to characterize model predictivity is provided. The intent is not to repeat the well-documented arguments for model validation using test data but rather to guide the application of R² as a model fit statistic. Examples are used to illustrate both correct and incorrect uses of R². Reporting the root-mean-square error or equivalent measures of dispersion, which are typically of more practical importance than R², is also encouraged, and important challenges in addressing the needs of different categories of users such as computational chemists, experimental scientists, and regulatory decision support specialists are outlined.
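As a rough illustration of why the choice of R² definition matters on test data, the sketch below compares two quantities that are both commonly reported as "R²", alongside the root-mean-square error. The abstract does not state the formulas, so the definitions here are assumptions: the coefficient of determination is taken as 1 − SS_res/SS_tot, computed from observed and predicted test-set values, and the alternative is the squared Pearson correlation between them. The function names and the five-point dataset are hypothetical and are not taken from the paper.

```python
import math


def coefficient_of_determination(y_obs, y_pred):
    """R^2 as 1 - SS_res / SS_tot (assumed definition, not quoted from the paper)."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def squared_pearson_r(y_obs, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    mean_pred = sum(y_pred) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(y_obs, y_pred))
    var_obs = sum((o - mean_obs) ** 2 for o in y_obs)
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    return cov ** 2 / (var_obs * var_pred)


def rmse(y_obs, y_pred):
    """Root-mean-square error of the predictions, in the units of the property."""
    n = len(y_obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)


# Hypothetical test set: every prediction is too high by exactly 2 units.
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [3.0, 4.0, 5.0, 6.0, 7.0]

r2 = coefficient_of_determination(y_obs, y_pred)  # -1.0
r2_pearson = squared_pearson_r(y_obs, y_pred)     # 1.0
error = rmse(y_obs, y_pred)                       # 2.0

print(f"R^2 (1 - SS_res/SS_tot): {r2:.2f}")
print(f"Squared Pearson r:       {r2_pearson:.2f}")
print(f"RMSE:                    {error:.2f}")
```

On this toy data the squared correlation is 1.0 even though every prediction is off by 2 units, whereas 1 − SS_res/SS_tot is −1.0 and the RMSE is 2.0. Under these assumed definitions, a correlation-based statistic can hide systematic bias that a dispersion measure such as RMSE reports directly.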