Marazzi A, Paccaud F, Ruffieux C, Beguin C
Institute of Social and Preventive Medicine, School of Medicine, University of Lausanne, Switzerland.
Med Care. 1998 Jun;36(6):915-27. doi: 10.1097/00005650-199806000-00014.
The purpose of this study was to assess the adequacy of three widely used models (Lognormal, Weibull, and Gamma) for describing the distribution of length of stay. This is a fundamental step in the development of outlier-resistant (robust) methods for the statistical analysis of this kind of data, where the main objective is to determine measures of average and total resource consumption of groups of patients. Current practice uses several types of trimming rules, many of which are based on the Lognormal model, although the theoretical and experimental bases for these rules are still insufficient.
The three models were fitted, using robust procedures based on M-estimators, to approximately 5 million stays grouped by Diagnosis-Related Groups (DRGs). The resulting 3,279 samples were collected in five European countries over 3 years.
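To make the idea of a robust, M-estimator-based fit concrete, the following is a minimal sketch for the Lognormal case only. It is not the procedure used in the study, whose details are not given here. It assumes a Huber M-estimate of location on the log scale, with the scale held fixed at the normalized median absolute deviation (MAD). The function names, the tuning constant `k = 1.345`, and the sample data are illustrative assumptions.

```python
import math
import statistics


def huber_location(values, k=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location; scale is fixed at the normalized MAD.

    Illustrative only: the tuning constant k and the fixed-scale choice
    are assumptions, not taken from the study.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # makes the MAD consistent with a normal sd
    if scale == 0:
        return med, 0.0
    mu = med
    for _ in range(max_iter):
        # Huber weights: 1 inside the band, k/|r| for large residuals
        weights = []
        for v in values:
            r = abs(v - mu) / scale
            weights.append(1.0 if r <= k else k / r)
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            mu = new_mu
            break
        mu = new_mu
    return mu, scale


def fit_lognormal_robust(los_days):
    """Robust Lognormal fit: M-estimate on log(length of stay)."""
    logs = [math.log(d) for d in los_days]
    mu, sigma = huber_location(logs)
    # Mean of a Lognormal(mu, sigma) distribution
    mean = math.exp(mu + sigma ** 2 / 2)
    return mu, sigma, mean


# Hypothetical lengths of stay (days) for one DRG, with one long-stay outlier
los = [2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 60]
mu, sigma, robust_mean = fit_lognormal_robust(los)
print(f"mu={mu:.3f} sigma={sigma:.3f} robust mean={robust_mean:.2f} days")
print(f"arithmetic mean={statistics.mean(los):.2f} days")
```

In this sketch the single 60-day stay is downweighted rather than trimmed, so the model-based mean is less affected by it than the arithmetic mean is. Analogous robust fits for the Weibull and Gamma models would need their own estimating equations and are not shown.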
Most of the observed distributions could be fitted with one of these models. The descriptions provided by the Gamma and the Weibull models were similar, so the Gamma model could be omitted. The casemix description provided by the Lognormal-Weibull family was, for certain countries, significantly better than the one provided by the Lognormal model alone. Often, for a given DRG and a given country, length of stay distributions could be described with the same model over several years. A given DRG, however, usually had to be described by different models in different countries.
Practical and conceptual consequences of the results are discussed. They can be extended to the analyses of other consumption variables used in health services. Statistical procedures for casemix description, including current rules of trimming, should be improved by means of more flexible families of models.