Chicco Davide, Warrens Matthijs J, Jurman Giuseppe
Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Canada.
Groningen Institute for Educational Research, University of Groningen, Groningen, Netherlands.
PeerJ Comput Sci. 2021 Jul 5;7:e623. doi: 10.7717/peerj-cs.623. eCollection 2021.
Regression analysis makes up a large part of supervised machine learning, and consists of predicting a continuous target variable from a set of other predictor variables. The difference between binary classification and regression lies in the target range: in binary classification, the target can take only two values (usually encoded as 0 and 1), while in regression the target can take multiple values. Although regression analysis has been employed in a huge number of machine learning studies, no consensus has been reached on a single, unified, standard metric to assess the results of the regression itself. Many studies employ the mean square error (MSE) and its rooted variant (RMSE), or the mean absolute error (MAE) and its percentage variant (MAPE). Although useful, these metrics share a common drawback: since their values can range between zero and +infinity, a single value says little about the performance of the regression with respect to the distribution of the ground truth elements. In this study, we focus on two metrics that generate a high score only if the majority of the elements of a ground truth group have been correctly predicted: the coefficient of determination (also known as R-squared or R²) and the symmetric mean absolute percentage error (SMAPE). After showing their mathematical properties, we report a comparison between R² and SMAPE in several use cases and in two real medical scenarios. Our results demonstrate that the coefficient of determination (R-squared) is more informative and truthful than SMAPE, and does not have the interpretability limitations of MSE, RMSE, MAE and MAPE. We therefore suggest the use of R-squared as a standard metric to evaluate regression analyses in any scientific domain.
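To make the six metrics named in the abstract concrete, the following is a minimal Python sketch, not the authors' code. The abstract does not spell out the formulas, so the textbook definitions are assumed here: R² is taken as 1 minus the ratio of the residual sum of squares to the total sum of squares, and SMAPE uses one common variant whose denominator is the mean of the absolute true and predicted values, giving a range of 0 to 2 when expressed as a fraction. The function name `regression_metrics` and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE, SMAPE and R-squared as a dict.

    MAPE and SMAPE are returned as fractions, not percentages.
    The formulas are assumed textbook definitions, not taken from the paper.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n

    # MAPE is undefined when any true value is zero.
    if any(t == 0 for t in y_true):
        mape = float("nan")
    else:
        mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n

    # SMAPE (assumed variant): |error| divided by the mean of |true| and |pred|.
    # A pair where both values are zero contributes zero error.
    smape_terms = []
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        smape_terms.append(0.0 if denom == 0 else abs(t - p) / denom)
    smape = sum(smape_terms) / n

    # R-squared: 1 - SS_res / SS_tot; undefined for a constant target.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = float("nan") if ss_tot == 0 else 1 - ss_res / ss_tot

    return {"mse": mse, "rmse": rmse, "mae": mae,
            "mape": mape, "smape": smape, "r2": r2}


# Hypothetical data: the same regression problem at two different scales.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]
small = regression_metrics(y_true, y_pred)
large = regression_metrics([1000 * v for v in y_true],
                           [1000 * v for v in y_pred])
print(small)
print(large)
```

Multiplying targets and predictions by the same factor changes MSE, RMSE and MAE but leaves R², MAPE and SMAPE unchanged up to floating-point rounding. This illustrates the abstract's point that an MSE, RMSE or MAE value is hard to interpret without knowing the scale of the ground truth.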