Division of Epidemiology, College of Public Health, The Ohio State University, Columbus, OH 43210, U.S.A.
Stat Med. 2013 Jan 15;32(1):67-80. doi: 10.1002/sim.5525. Epub 2012 Jul 26.
The Hosmer-Lemeshow test is a commonly used procedure for assessing goodness of fit in logistic regression. It has, for example, been widely used to evaluate risk-scoring models. As with any statistical test, its power increases with sample size; this can be undesirable for goodness-of-fit tests because, in very large data sets, even small departures from the proposed model will be judged significant. By considering how power depends on the number of groups used in the Hosmer-Lemeshow test, we show how the power may be standardized across different sample sizes in a wide range of models. We provide mathematical derivations and confirm them through simulation and through analysis of data on 31,713 children from the Collaborative Perinatal Project. We make recommendations on how to choose the number of groups in the Hosmer-Lemeshow test based on sample size and provide example applications of these recommendations.
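To make the role of the number of groups concrete, the following is a minimal sketch of the standard Hosmer-Lemeshow statistic in plain Python. It is an illustration under stated assumptions, not the authors' code: the function name `hosmer_lemeshow`, its parameters, and the equal-size grouping by sorted fitted probability are choices made here, and the default of 10 groups is only the conventional value, not the sample-size-based recommendation the abstract refers to.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees_of_freedom) for a Hosmer-Lemeshow test.

    y: observed 0/1 outcomes.
    p: fitted probabilities from a logistic regression model.
    groups: number of groups (hypothetical parameter name; default 10
            is the conventional choice, assumed here for illustration).
    """
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    if groups < 3:
        raise ValueError("need at least 3 groups for positive degrees of freedom")

    n = len(y)
    # Sort observation indices by fitted probability, lowest risk first.
    order = sorted(range(n), key=lambda i: p[i])

    statistic = 0.0
    for k in range(groups):
        # Split the sorted observations into roughly equal-sized groups.
        idx = order[k * n // groups:(k + 1) * n // groups]
        n_k = len(idx)
        if n_k == 0:
            continue
        observed = sum(y[i] for i in idx)   # observed events in the group
        expected = sum(p[i] for i in idx)   # expected events in the group
        mean_p = expected / n_k
        denom = n_k * mean_p * (1.0 - mean_p)
        if denom > 0:
            statistic += (observed - expected) ** 2 / denom

    # The statistic is conventionally compared with a chi-square
    # distribution on (groups - 2) degrees of freedom.
    return statistic, groups - 2


# Tiny made-up example: six observations split into three groups.
stat, df = hosmer_lemeshow(
    [0, 0, 0, 1, 1, 1],
    [0.2, 0.2, 0.5, 0.5, 0.8, 0.8],
    groups=3,
)
print(stat, df)
```

The `groups` argument is the quantity the abstract proposes to tune by sample size; the sketch leaves the p-value to a chi-square routine from whichever statistics library is in use.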