Zheng B, Agresti A
Wake Forest University School of Medicine, Department of Public Health Sciences, Medical Center Boulevard, Winston-Salem, NC 27157-1051, USA.
Stat Med. 2000 Jul 15;19(13):1771-81. doi: 10.1002/1097-0258(20000715)19:13<1771::aid-sim485>3.0.co;2-p.
This paper studies summary measures of the predictive power of a generalized linear model, paying special attention to a generalization of the multiple correlation coefficient from ordinary linear regression. The population value is the correlation between the response and its conditional expectation given the predictors, and the sample value is the correlation between the observed response and the model predicted value. We compare four estimators of the measure in terms of bias, mean squared error and behaviour in the presence of overparameterization. The sample estimator and a jack-knife estimator usually behave adequately, but a cross-validation estimator has a large negative bias with large mean squared error. One can use bootstrap methods to construct confidence intervals for the population value of the correlation measure and to estimate the degree to which a model selection procedure may provide an overly optimistic measure of the actual predictive power.
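As an illustration of the sample version of the measure, the sketch below fits a logistic regression (one instance of a generalized linear model) to simulated data, computes the correlation between the observed response and the fitted values, and forms a bootstrap interval for it. It is not the authors' code. The helper names (`fit_logistic`, `sample_correlation`, `bootstrap_percentile_ci`), the simulated data, and the choice of a percentile interval are assumptions made for this example; the abstract says only that bootstrap methods can be used, not which variant.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression with intercept by Newton-Raphson (IRLS)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X1 @ beta))    # current fitted means
        w = mu * (1.0 - mu)                      # binomial variance weights
        hess = X1.T @ (X1 * w[:, None])          # X' W X
        beta = beta + np.linalg.solve(hess, X1.T @ (y - mu))
    return beta


def sample_correlation(X, y):
    """Correlation between the observed response and the model-fitted value."""
    beta = fit_logistic(X, y)
    X1 = np.column_stack([np.ones(len(y)), X])
    mu_hat = 1.0 / (1.0 + np.exp(-X1 @ beta))
    return np.corrcoef(y, mu_hat)[0, 1]


def bootstrap_percentile_ci(X, y, n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap interval (an assumed variant), resampling cases."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)         # resample rows with replacement
        stats.append(sample_correlation(X[idx], y[idx]))
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(stats, [alpha, 1.0 - alpha])
    return float(lo), float(hi)


# Simulated data, purely hypothetical: two predictors and a binary response.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
p = 1.0 / (1.0 + np.exp(-(0.5 + X[:, 0] - 0.5 * X[:, 1])))
y = rng.binomial(1, p).astype(float)

r = sample_correlation(X, y)
ci = bootstrap_percentile_ci(X, y)
print(f"sample correlation: {r:.3f}, 95% bootstrap interval: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The sketch covers only the plain sample estimator. The jack-knife and cross-validation estimators compared in the paper would each need refitting on leave-one-out subsets and are not shown here.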