Cautionary Remarks on the Use of Clusterwise Regression.

Author information

Michael J. Brusco, J. Dennis Cradit, Douglas Steinley, Gavin L. Fox

Affiliations

a Florida State University.

b Southern Illinois University.

Publication information

Multivariate Behav Res. 2008 Jan-Mar;43(1):29-49. doi: 10.1080/00273170701836653.

Abstract

Clusterwise linear regression is a multivariate statistical procedure that attempts to cluster objects with the objective of minimizing the sum of the error sums of squares for the within-cluster regression models. In this article, we show that the minimization of this criterion makes no effort to distinguish the error explained by the within-cluster regression models from the error explained by the clustering process. In some cases, most of the variation in the response variable is explained by clustering the objects, with little additional benefit provided by the within-cluster regression models. Accordingly, there is tremendous potential for overfitting with clusterwise regression, which is demonstrated with numerical examples and simulation experiments. To guard against the misuse of clusterwise regression, we recommend a benchmarking procedure that compares the results for the observed empirical data with those obtained across a set of random permutations of the response measures. We also demonstrate the potential for overfitting via an empirical application related to the prediction of reflective judgment using high school and college performance measures.
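The criterion and the permutation benchmark described in the abstract can be illustrated with a small sketch. It is written under stated assumptions and is not the authors' implementation. The objective is to minimize the total within-cluster error, SSE = sum over clusters c of sum over objects i in c of (y_i - x_i'b_c)^2, and fit is summarized as R^2 = 1 - SSE/TSS. The helpers `clusterwise_sse` and `permutation_benchmark` are hypothetical. The fitting step is a simple alternating heuristic run from several random starts: fit OLS per cluster, move each object to the cluster whose model gives it the smallest squared residual, and repeat. The synthetic data, number of starts, and number of permutations are illustrative choices only.

```python
import numpy as np


def clusterwise_sse(X, y, k, n_starts=20, max_iter=100, seed=0):
    """Smallest total within-cluster SSE found by a simple alternating heuristic.

    Hypothetical helper (assumption, not the paper's algorithm): each random
    start alternates between fitting an OLS model with intercept in every
    cluster and reassigning each object to the cluster whose model leaves it
    the smallest squared residual.
    """
    rng = np.random.default_rng(seed)
    n = y.shape[0]
    Xd = np.column_stack([np.ones(n), X])  # prepend an intercept column
    p = Xd.shape[1]
    best = np.inf
    for _ in range(n_starts):
        labels = rng.integers(k, size=n)  # random initial partition
        for _ in range(max_iter):
            # Require more than p objects per cluster so no cluster fits perfectly.
            if np.bincount(labels, minlength=k).min() <= p:
                break
            coefs = [np.linalg.lstsq(Xd[labels == c], y[labels == c], rcond=None)[0]
                     for c in range(k)]
            resid2 = np.column_stack([(y - Xd @ b) ** 2 for b in coefs])
            # Criterion value for the current partition and its per-cluster fits.
            best = min(best, resid2[np.arange(n), labels].sum())
            new_labels = resid2.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break  # assignments are stable: a local optimum
            labels = new_labels
    return best


def permutation_benchmark(X, y, k, n_perm=50, seed=0):
    """Observed clusterwise R^2 plus R^2 values for randomly permuted responses."""
    rng = np.random.default_rng(seed)
    tss = ((y - y.mean()) ** 2).sum()  # unchanged by permuting y
    observed = 1 - clusterwise_sse(X, y, k, seed=seed) / tss
    null = np.array([1 - clusterwise_sse(X, rng.permutation(y), k, seed=seed) / tss
                     for _ in range(n_perm)])
    return observed, null


# Made-up example: two parallel lines whose levels differ by 30 units.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=60)
group = np.arange(60) % 2
y = 2 * x + 30 * group + rng.normal(0, 1, size=60)

observed, null = permutation_benchmark(x, y, k=2, n_perm=30)
print(f"observed R^2 = {observed:.3f}")
print(f"permuted R^2: mean {null.mean():.3f}, max {null.max():.3f}")
```

Permuting y leaves the total sum of squares unchanged. The permuted R^2 values therefore show how much fit the clustering alone can deliver when y is unrelated to x. In this made-up example they are already high, because splitting the objects by response level absorbs most of the variation. The observed R^2 is meaningful only relative to that benchmark.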
