Evaluating the goodness of fit in models of sparse medical data: a simulation approach.

Author information

Boyle P, Flowerdew R, Williams A

Affiliation

School of Geography, University of Leeds, UK.

Publication information

Int J Epidemiol. 1997 Jun;26(3):651-6. doi: 10.1093/ije/26.3.651.

DOI:10.1093/ije/26.3.651

PMID:9222792

Abstract

BACKGROUND

Epidemiological studies of rare events, which are common in the medical literature, often involve modelling sparse data sets. Assessing the fit of such models can be complicated by the large number of zero counts in the data.

METHODS

Poisson models, fitted as generalized linear models, were used to investigate the referral patterns of patients with end-stage renal failure in south west Wales. The usual way to assess goodness of fit is to compare the deviance with a χ² distribution on the appropriate degrees of freedom. However, this test may be invalid when the data set is sparse, because the deviance may then be unusually low relative to the degrees of freedom. Such a result would suggest underdispersion when, in fact, the large number of zeros in the data set is what makes the comparison with the χ² distribution unreliable. A simulation approach is advocated as an alternative way of assessing model fit in these situations.
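For reference, the Poisson deviance and the two ways of judging it can be written as follows. The notation is ours, not the authors':

- y_i are the observed counts and mu_i the fitted means.
- n is the number of observations and p the number of estimated parameters.
- B is the number of simulated data sets.

The Monte Carlo p-value is a standard construction. It may differ in detail from the procedure used in the paper.

```latex
% Poisson deviance for counts y_i with fitted means \hat\mu_i
D = 2 \sum_{i=1}^{n} \left[ y_i \log\frac{y_i}{\hat\mu_i} - (y_i - \hat\mu_i) \right],
\qquad y_i \log\frac{y_i}{\hat\mu_i} := 0 \ \text{when } y_i = 0.

% Asymptotic check (reliable only when the fitted means are not too small)
D \ \overset{\text{approx}}{\sim}\ \chi^2_{n-p}, \qquad \mathbb{E}[D] \approx n - p.

% Simulation check: draw B data sets y^{*(b)} from the fitted model, compute D^{*}_b for each,
% and locate D_obs in that empirical distribution, e.g. via the lower-tail proportion
\hat{p}_{\text{low}} = \frac{1 + \#\{\, b : D^{*}_b \le D_{\text{obs}} \,\}}{B + 1}.
```

A zero count contributes only 2·mu_i to D. So when most fitted means are small, E[D] can fall well below n − p even if the Poisson model is correct. A low ratio D/(n − p) therefore need not indicate underdispersion.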

RESULTS

Three models are considered in detail. The first modelled the total referrals in each of the 245 wards in the study area and included two explanatory variables. These observations were not unusually sparse, and both the χ² goodness-of-fit test and the simulation method outlined here indicated that the model did not fit. The second model included the population 'at risk' as an offset, and the fit improved considerably: both the χ² test and the simulation approach indicated that this model fitted. Finally, the data were disaggregated into five age groups, giving 1225 observations (245 wards × 5 age groups) and a very sparse data set. According to the χ² goodness-of-fit test, the deviance was very low, suggesting that the model was underdispersed. The simulated data showed, however, that the deviance was not unusually low and that the model fitted the data reasonably well.
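The role of the offset in the second model can be made explicit with a sketch of the model equations. Several details are assumptions, since the abstract does not state them:

- The log link is assumed; it is the canonical link for Poisson GLMs.
- x_1i and x_2i are placeholder names for the two explanatory variables, which the abstract does not name.
- N_i denotes the population at risk in ward i.

The abstract also does not say how age enters the third model, so only its cell count is shown.

```latex
% Model 1: total referrals y_i in ward i (i = 1, ..., 245), two explanatory variables
y_i \sim \operatorname{Poisson}(\mu_i), \qquad \log \mu_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}

% Model 2: population at risk N_i enters as an offset, i.e. with its coefficient fixed at 1,
% so the linear predictor models the referral rate per head rather than the raw count
\log \mu_i = \log N_i + \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}
\quad\Longleftrightarrow\quad
\frac{\mu_i}{N_i} = \exp\left(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}\right)

% Model 3: disaggregating each ward into five age groups gives 245 \times 5 = 1225 cells
```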

CONCLUSION

In cases where the data set being modelled is sparse, it is useful to test the goodness of fit of a Poisson model using a simulation approach rather than relying on the χ² test.
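A minimal sketch of one way to carry out such a simulation check in Python, using only the standard library. The sketch makes these assumptions:

- The fitted means are already available. The example uses a hypothetical vector of small means standing in for a sparse table; the paper's fitted values are not reproduced.
- It draws Poisson counts from those means and locates the observed deviance among the simulated deviances.
- It holds the fitted means fixed rather than refitting the model to each simulated data set, as a full parametric bootstrap would. Refitting can only lower each simulated deviance, so the values here are somewhat inflated.

The function names are illustrative, not from the paper.

```python
import math
import random
from statistics import fmean


def poisson_sample(mu: float, rng: random.Random) -> int:
    """Draw one Poisson(mu) count with Knuth's multiplication method.

    Adequate for the small means typical of sparse tables; slow for very large mu.
    """
    threshold = math.exp(-mu)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def poisson_deviance(y: list[int], mu: list[float]) -> float:
    """Poisson deviance 2 * sum[y*log(y/mu) - (y - mu)], taking y*log(y/mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = -(yi - mi)
        if yi > 0:
            term += yi * math.log(yi / mi)
        total += 2.0 * term
    return total


def simulated_fit_check(d_obs: float, mu: list[float], n_sim: int = 999, seed: int = 1):
    """Compare d_obs with deviances of data simulated from the fitted means mu.

    Simplification (hypothetical helper): mu is held fixed instead of refitting
    the model to each simulated data set.
    Returns (lower-tail p, upper-tail p, mean simulated deviance).
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        y_star = [poisson_sample(m, rng) for m in mu]
        sims.append(poisson_deviance(y_star, mu))
    p_low = (1 + sum(d <= d_obs for d in sims)) / (n_sim + 1)
    p_high = (1 + sum(d >= d_obs for d in sims)) / (n_sim + 1)
    return p_low, p_high, fmean(sims)


if __name__ == "__main__":
    # Hypothetical sparse table: 1225 cells, every fitted mean equal to 0.1.
    mu_hat = [0.1] * 1225
    data_rng = random.Random(0)
    y_obs = [poisson_sample(m, data_rng) for m in mu_hat]
    d_obs = poisson_deviance(y_obs, mu_hat)
    p_low, p_high, mean_d = simulated_fit_check(d_obs, mu_hat, n_sim=199)
    print(f"D = {d_obs:.1f} on at most {len(mu_hat)} df; mean simulated D = {mean_d:.1f}")
    print(f"lower-tail p = {p_low:.3f}, upper-tail p = {p_high:.3f}")
```

In this example the "observed" data are generated from the same means, so the model is correct by construction. The observed D should therefore sit inside the simulated distribution. Yet both D and the simulated deviances come out at roughly half the number of cells: direct calculation gives an expected contribution of about 0.47 per cell when the mean is 0.1. A χ² comparison on about 1225 degrees of freedom would flag this as underdispersion, while the simulation does not. This is the pattern the abstract reports for the disaggregated model.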
