Hu Bo, Palta Mari, Shao Jun
Department of Statistics, University of Wisconsin-Madison, 1300 University Ave, Madison, WI 53706, USA.
Stat Med. 2006 Apr 30;25(8):1383-95. doi: 10.1002/sim.2300.
Various R² statistics have been proposed for logistic regression to quantify the extent to which the binary response can be predicted by a given logistic regression model and covariates. We study the asymptotic properties of three popular variance-based R² statistics. We find that two variance-based R² statistics, the sum of squares and the squared Pearson correlation, have identical asymptotic distributions, whereas the third one, Gini's concentration measure, has a different asymptotic behaviour and may overstate the predictivity of the model and covariates when the model is mis-specified. Our result not only provides a theoretical basis for the findings in previous empirical and numerical work, but also leads to asymptotic confidence intervals. Statistical variability can then be taken into account when assessing the predictive value of a logistic regression model.
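The abstract names the three statistics but does not give their formulas. As a rough illustration only, the sketch below computes them using definitions commonly given for binary responses, which are an assumption here and not taken from the abstract: the sum-of-squares R² as 1 - Σ(y_i - p_i)² / Σ(y_i - ȳ)², the squared Pearson correlation between y and the fitted probabilities p, and Gini's concentration measure as 1 - Σ p_i(1 - p_i) / (n ȳ(1 - ȳ)). The function name `variance_based_r2` and its arguments are hypothetical.

```python
import numpy as np


def variance_based_r2(y, p_hat):
    """Compute three variance-based R^2 statistics for a binary response.

    y     : observed 0/1 responses
    p_hat : fitted probabilities from some logistic regression model

    The formulas are commonly used definitions and are assumed here;
    the abstract itself does not state them.
    """
    y = np.asarray(y, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)

    y_bar = y.mean()
    # Total sum of squares; for a 0/1 response this equals n * y_bar * (1 - y_bar).
    total_ss = np.sum((y - y_bar) ** 2)

    # Sum-of-squares R^2: one minus residual over total sum of squares.
    r2_ss = 1.0 - np.sum((y - p_hat) ** 2) / total_ss

    # Squared Pearson correlation between responses and fitted probabilities.
    r2_cor = np.corrcoef(y, p_hat)[0, 1] ** 2

    # Gini's concentration measure: uses the model-implied variances
    # p_i * (1 - p_i) in place of the squared residuals.
    r2_gini = 1.0 - np.sum(p_hat * (1.0 - p_hat)) / total_ss

    return {"sum_of_squares": r2_ss, "pearson_squared": r2_cor, "gini": r2_gini}


if __name__ == "__main__":
    # Toy data, made up purely for illustration.
    y_obs = [0, 0, 1, 1]
    fitted = [0.2, 0.4, 0.6, 0.8]
    for name, value in variance_based_r2(y_obs, fitted).items():
        print(f"{name}: {value:.3f}")
```

Under these assumed definitions, the Gini measure depends on the observed responses only through ȳ, whereas the other two compare each fitted probability with its observed outcome. That difference may help explain why it can behave differently when the model is mis-specified.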