The Ohio Colleges of Medicine Government Resource Center, Ohio State University, Columbus, Ohio.
Division of Biostatistics, College of Public Health, Ohio State University, Columbus, Ohio.
Biometrics. 2020 Jun;76(2):549-560. doi: 10.1111/biom.13249. Epub 2020 Apr 6.
Evaluating the goodness of fit of logistic regression models is crucial to ensure the accuracy of the estimated probabilities. Unfortunately, such evaluation is problematic in large samples. Because the power of traditional goodness-of-fit tests increases with the sample size, practically irrelevant discrepancies between estimated and true probabilities become increasingly likely to cause rejection of the hypothesis of perfect fit as samples grow larger. This phenomenon has been widely documented for popular goodness-of-fit tests, such as the Hosmer-Lemeshow test. To address this limitation, we propose a modification of the Hosmer-Lemeshow approach. By standardizing the noncentrality parameter that characterizes the alternative distribution of the Hosmer-Lemeshow statistic, we introduce a parameter that measures the goodness of fit of a model but does not depend on the sample size. We provide the methodology to estimate this parameter and to construct confidence intervals for it. Finally, we propose a formal statistical test to rigorously assess whether the fit of a model, albeit not perfect, is acceptable for practical purposes. In a simulation study, the proposed method is compared with a competing modification of the Hosmer-Lemeshow test based on repeated subsampling. We provide a step-by-step illustration of our method using a model for postneonatal mortality developed in a large cohort of more than 300,000 observations.
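To make the sample-size problem concrete, here is a minimal sketch of the two ingredients the abstract names. The first is the Hosmer-Lemeshow statistic, computed over G groups of observations sorted by estimated probability. The second is a rescaling of an estimated noncentrality parameter by n. The helper names `hosmer_lemeshow` and `standardized_fit_parameter` are hypothetical. The moment estimator λ̂ = max(Ĉ − (G − 2), 0) relies on the fact that a noncentral chi-square with k degrees of freedom and noncentrality λ has mean k + λ. That estimator and the √(λ̂/n) scaling are assumptions chosen for illustration. The paper's own estimator, confidence intervals, and test may be defined differently.

```python
import math
from typing import Sequence


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float], groups: int = 10) -> float:
    """Hosmer-Lemeshow statistic over `groups` groups of sorted estimated probabilities."""
    n = len(y)
    if n != len(p):
        raise ValueError("y and p must have the same length")
    if not 3 <= groups <= n:
        raise ValueError("need 3 <= groups <= n")
    # Sort observations by estimated probability and cut them into near-equal groups.
    order = sorted(range(n), key=lambda i: p[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)   # observed events in group g
        expected = sum(p[i] for i in idx)   # expected events in group g
        p_bar = expected / n_g              # mean estimated probability in group g
        if p_bar <= 0.0 or p_bar >= 1.0:
            raise ValueError("group mean probability must lie strictly in (0, 1)")
        stat += (observed - expected) ** 2 / (n_g * p_bar * (1.0 - p_bar))
    return stat


def standardized_fit_parameter(stat: float, n: int, groups: int = 10) -> float:
    """Hypothetical sample-size-free fit measure sqrt(lambda_hat / n).

    Treats the statistic as approximately noncentral chi-square with groups - 2
    degrees of freedom and noncentrality lambda (mean = groups - 2 + lambda), and
    uses the truncated moment estimate lambda_hat = max(stat - df, 0).
    """
    df = groups - 2
    lam_hat = max(stat - df, 0.0)
    return math.sqrt(lam_hat / n)


if __name__ == "__main__":
    # Toy data: every estimated probability is 0.5, but every outcome is an event.
    for n in (20, 40):
        c_hat = hosmer_lemeshow([1] * n, [0.5] * n)
        print(n, c_hat, round(standardized_fit_parameter(c_hat, n), 3))
```

Doubling the toy data doubles the statistic (20.0 to 40.0). Over the same doubling, the rescaled quantity moves only from about 0.775 to about 0.894 and approaches 1 as n grows. This reflects the behavior the abstract describes: the test statistic, and hence the chance of rejection, grows with n, while a per-observation measure of misfit stays bounded.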