Kramer Andrew A, Zimmerman Jack E
Cerner Corporation, Vienna, VA, USA.
Crit Care Med. 2007 Sep;35(9):2052-6. doi: 10.1097/01.CCM.0000275267.64078.B0.
Objective: To examine the sensitivity of the Hosmer-Lemeshow test when evaluating the calibration of models that predict hospital mortality in large critical care populations.
Design: Simulation study.
Setting: Intensive care unit databases used for predictive modeling.
Patients: Simulated data sets represented the approximate number of patients used in earlier versions of critical care predictive models (n = 5,000 and 10,000) and in more recent predictive models (n = 50,000). Each patient's hospital mortality probability was generated as a function of 23 risk variables.
Interventions: None.
Measurements and Main Results: Data sets of 5,000, 10,000, and 50,000 patients were each replicated 1,000 times, and logistic regression models were evaluated for each simulated data set. This process was first carried out under conditions of perfect fit (observed mortality = predicted mortality; standardized mortality ratio = 1.000) and then repeated with an observed mortality that differed slightly (0.4%) from predicted mortality. Under perfect fit, the Hosmer-Lemeshow test was not influenced by the number of patients in the data set. With a slight deviation from perfect fit, the test was sensitive to sample size: at p < .05, it was significant in 10% of replicates with 5,000 patients and in 34% of replicates with 10,000 patients. When the number of patients matched contemporary studies (i.e., 50,000 patients), the Hosmer-Lemeshow test was statistically significant in 100% of the models.
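The dependence on sample size described above can be explored with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' procedure: the Beta(1, 6) risk distribution, the uniform additive `shift`, and the function names are all hypothetical, and the predicted probabilities are taken as given rather than refitted by logistic regression. Because nothing is estimated from the simulated data, the sketch compares the statistic with a chi-square distribution on `groups` degrees of freedom; the `groups - 2` default applies when the model was fit to the same data.

```python
import math
import random


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, outcomes, groups=10, df=None):
    """Return (statistic, p_value) over equal-size groups ordered by predicted risk."""
    if df is None:
        df = groups - 2  # conventional choice when the model was fit to these data
    pairs = sorted(zip(probs, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # predicted deaths in the group
        observed = sum(y for _, y in chunk)   # observed deaths in the group
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / size))
    return stat, chi2_sf_even_df(stat, df)


def rejection_rate(n, shift, replicates, seed=0, alpha=0.05):
    """Fraction of replicates in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(replicates):
        # Hypothetical predicted risks; the study used 23 risk variables instead.
        probs = [rng.betavariate(1, 6) for _ in range(n)]
        # Assumption: true risk exceeds predicted risk by a uniform additive shift.
        outcomes = [1 if rng.random() < min(1.0, p + shift) else 0 for p in probs]
        _, p_value = hosmer_lemeshow(probs, outcomes, df=10)
        rejected += p_value < alpha
    return rejected / replicates
```

Calling `rejection_rate(n, 0.004, replicates)` for increasing `n` gives a rough view of how the rejection rate changes with sample size for a fixed small miscalibration. The numbers will not reproduce the percentages reported here, because the risk distribution and the form of the deviation are assumed.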
Conclusions: Caution should be used in interpreting the calibration of predictive models developed using a smaller data set when they are applied to larger numbers of patients. A significant Hosmer-Lemeshow test does not necessarily mean that a predictive model is not useful or is suspect. While decisions concerning a mortality model's suitability should include the Hosmer-Lemeshow test, additional information needs to be taken into consideration. This information includes the overall number of patients, the observed and predicted probabilities within each decile, and adjunct measures of model calibration.
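The supplementary information listed above can be tabulated directly. The sketch below is illustrative only: the function names and the output layout are hypothetical. It reports observed and predicted mortality within each risk decile and uses the standardized mortality ratio (observed deaths divided by predicted deaths) as one adjunct calibration measure. It assumes at least as many patients as groups.

```python
def decile_calibration(probs, outcomes, groups=10):
    """Observed and predicted mortality rates within equal-size risk groups."""
    pairs = sorted(zip(probs, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    rows = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        rows.append({
            "group": g + 1,
            "patients": size,
            "observed_rate": sum(y for _, y in chunk) / size,
            "predicted_rate": sum(p for p, _ in chunk) / size,
        })
    return rows


def standardized_mortality_ratio(probs, outcomes):
    """Observed deaths divided by the sum of predicted death probabilities."""
    return sum(outcomes) / sum(probs)
```

Reading the per-decile differences alongside the patient counts helps show whether a significant test reflects a clinically meaningful gap or a small deviation detected only because the sample is large.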