Montez-Rath Maria, Christiansen Cindy L, Ettner Susan L, Loveland Susan, Rosen Amy K
Boston University School of Public Health, Department of Biostatistics, Boston, Massachusetts, USA.
BMC Med Res Methodol. 2006 Oct 26;6:53. doi: 10.1186/1471-2288-6-53.
Providers use risk-adjustment systems to help manage healthcare costs. Typically, ordinary least squares (OLS) models on either untransformed or log-transformed cost are used. We examine the predictive ability of several statistical models, demonstrate how model choice depends on the goal for the predictive model, and examine whether building models on samples of the data affects model choice.
Our sample consisted of 525,620 Veterans Health Administration patients with mental health (MH) or substance abuse (SA) diagnoses who incurred costs during fiscal year 1999. We tested two models on transformed cost, a Log Normal model and a Square-root Normal model, and three generalized linear models on untransformed cost, each defined by its distributional assumption and link function: Normal with identity link (OLS); Gamma with log link; and Gamma with square-root link. Risk-adjusters included age, sex, and 12 MH/SA categories. To determine the best model on the entire dataset, we evaluated predictive ability using root mean square error (RMSE), mean absolute prediction error (MAPE), and predictive ratios of predicted to observed cost (PR) within deciles of predicted cost, comparing point estimates and 95% bias-corrected bootstrap confidence intervals. To study how analyzing a random sample of the population affects model choice, we re-computed these statistics on random samples of increasing size, beginning with 5,000 patients and ending with the entire sample.
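The three evaluation statistics can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the use of NumPy, and the choice to form each predictive ratio as total predicted cost divided by total observed cost within a group are assumptions made for illustration, and the four-patient data at the end are made up.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean square error between observed and predicted cost."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def mape(observed, predicted):
    """Mean absolute prediction error (in cost units, not percent)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(observed - predicted)))


def predictive_ratios(observed, predicted, n_groups=10):
    """Predicted-to-observed cost ratio within groups of predicted cost.

    Patients are sorted by predicted cost and split into n_groups
    near-equal groups (deciles when n_groups=10). Assumption: the ratio
    is total predicted over total observed cost in each group.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    order = np.argsort(predicted, kind="stable")
    groups = np.array_split(order, n_groups)
    return np.array([predicted[g].sum() / observed[g].sum() for g in groups])


# Made-up data for four hypothetical patients, split into two groups.
observed_cost = np.array([10.0, 20.0, 30.0, 40.0])
predicted_cost = np.array([12.0, 18.0, 33.0, 36.0])

print(rmse(observed_cost, predicted_cost))
print(mape(observed_cost, predicted_cost))
print(predictive_ratios(observed_cost, predicted_cost, n_groups=2))
```

A ratio near 1 in every group indicates good calibration across the range of predicted cost, whereas RMSE and MAPE summarize individual-level error. This difference in what is being measured is why the statistics can favor different models.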
The Square-root Normal model had the lowest estimates of the RMSE and MAPE, with bootstrap confidence intervals that were always lower than those for the other models. The Gamma with square-root link was best as measured by the PRs. The choice of best model could vary if smaller samples were used, and the Gamma with square-root link model had convergence problems with small samples.
Models with square-root transformation or link fit the data best. This function (whether used as transformation or as a link) seems to help deal with the high comorbidity of this population by introducing a form of interaction. The Gamma distribution helps with the long tail of the distribution. However, the Normal distribution is suitable if the correct transformation of the outcome is used.
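One way to see how a square-root link can introduce a form of interaction is to expand the mean for a model with two risk-adjusters. The two-covariate setup and the symbols below are illustrative assumptions, not taken from the paper. If the link is the square root, then squaring the linear predictor gives:

```latex
\sqrt{E[Y]} = \beta_0 + \beta_1 x_1 + \beta_2 x_2
\quad\Longrightarrow\quad
E[Y] = \beta_0^2 + 2\beta_0\beta_1 x_1 + 2\beta_0\beta_2 x_2
       + \beta_1^2 x_1^2 + \beta_2^2 x_2^2 + 2\beta_1\beta_2\, x_1 x_2 .
```

The cross term $2\beta_1\beta_2\, x_1 x_2$ is nonzero only when both conditions are present, so a patient with two diagnoses is predicted to cost more (for positive coefficients) than the sum of the two separate effects, without an explicit interaction term in the linear predictor. For the Square-root Normal model the linear predictor describes $E[\sqrt{Y}]$ rather than $\sqrt{E[Y]}$, so predictions on the cost scale also depend on the residual variance, since $E[Y] = (E[\sqrt{Y}])^2 + \operatorname{Var}(\sqrt{Y})$.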