Petrou Stavros, Hockley Christine
National Perinatal Epidemiology Unit, University of Oxford (Old Road Campus), Old Road, Headington, Oxford, England.
Health Econ. 2005 Nov;14(11):1169-89. doi: 10.1002/hec.1006.
An important consideration for studies that derive utility scores using multi-attribute utility measures is the psychometric integrity of the measurement instrument. Of particular importance is the requirement to establish the empirical validity of multi-attribute utility measures; that is, whether they generate utility scores that, in practice, reflect people's preferences. We compared the empirical validity of EQ-5D versus SF-6D utility scores based on hypothetical preferences in a large, representative sample of the English population.
Adult participants in the 1996 Health Survey for England (n=16 443) formed the basis of the investigation. Subjects were asked to complete the EQ-5D and SF-36 measures, and their responses were converted into utility scores using the York A1 tariff set and the SF-6D utility algorithm, respectively. One-way analysis of variance was used to test the hypothetically constructed preference rule that each set of utility scores differs significantly by self-reported health status (categorised as very good, good, fair, bad or very bad). The degree to which EQ-5D and SF-6D utility scores reflect alternative configurations of self-reported health status, of illness, disability or infirmity, and of medication use was tested using the relative efficiency statistic and receiver operating characteristic (ROC) curves.
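The two discrimination statistics named above can be sketched in a few lines. This is an illustration only, not the authors' code: it assumes the common definition of relative efficiency as the ratio of the one-way ANOVA F statistic of one measure to that of a reference measure, and it computes the area under the ROC curve through the equivalent pairwise (Mann-Whitney) comparison. The function names and the small data set are hypothetical.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic; `groups` is a list of lists of utility scores."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    group_means = [mean(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def relative_efficiency(groups_measure, groups_reference):
    """Ratio of F statistics (assumed definition); > 1 favours `groups_measure`."""
    return f_statistic(groups_measure) / f_statistic(groups_reference)


def auc(scores_positive, scores_negative):
    """Area under the ROC curve as the share of (positive, negative) pairs
    in which the positive subject has the higher score; ties count as half."""
    wins = 0.0
    for p in scores_positive:
        for q in scores_negative:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical utility scores, grouped by self-reported health
# (first group: "very good"; second group: all other categories).
sf6d_groups = [[0.90, 0.85, 0.88], [0.70, 0.75, 0.72]]
eq5d_groups = [[1.00, 0.80, 0.90], [0.60, 0.85, 0.70]]

re = relative_efficiency(sf6d_groups, eq5d_groups)
sf6d_auc = auc(sf6d_groups[0], sf6d_groups[1])
eq5d_auc = auc(eq5d_groups[0], eq5d_groups[1])
```

Under this definition a relative efficiency of 1.309 would correspond to one measure being 30.9% more efficient than the reference. The pairwise AUC is quadratic in the sample size, so a rank-based implementation would be preferable for a survey of this size.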
The mean utility score for the EQ-5D was 0.845 (95% CI: 0.842, 0.849), whilst the mean utility score for the SF-6D was 0.799 (95% CI: 0.797, 0.802), representing a mean difference in utility score of 0.046 (95% CI: 0.044, 0.049; p<0.001). Bland-Altman plots displayed a considerable lack of agreement between the two measures, particularly at the lower end of the utility scale. Both measures demonstrated statistically significant differences between subjects who described their health status as very good, good, fair, bad or very bad (p<0.001), as well as monotonically decreasing utility scores (test for linear trend: p<0.001). The SF-6D was between 30.9 and 100.4% more efficient than the EQ-5D at detecting differences in self-reported health status, and between 10.4 and 45.6% more efficient at detecting differences in illness, disability or infirmity and medication use. The area under the curve (AUC) scores generated by the ROC curves were significantly higher for the SF-6D at the 0.1% significance level when self-reported health status was dichotomised as very good versus good, fair, bad or very bad. However, the AUC scores did not reveal any significant differences in the discriminatory powers of the measures when alternative configurations of illness, disability or infirmity and medication use were examined.
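The agreement analysis can be sketched as follows. This is a minimal illustration assuming the standard Bland-Altman construction (the mean of the paired differences as the bias, with 95% limits of agreement at 1.96 standard deviations either side); the function name and the sample values are hypothetical, and the abstract does not report limits of agreement.

```python
from statistics import mean, stdev


def bland_altman(scores_a, scores_b):
    """Return (bias, lower limit, upper limit) for paired scores.

    bias is the mean of a - b; the limits of agreement are
    bias -/+ 1.96 times the sample standard deviation of the differences.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired, one per subject")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical paired utility scores for three subjects.
eq5d = [0.8, 0.9, 1.0]
sf6d = [0.7, 0.7, 0.7]
bias, lower, upper = bland_altman(eq5d, sf6d)
```

In a full Bland-Altman plot each difference is drawn against the pair's mean score, which is how disagreement concentrated at the lower end of the utility scale becomes visible.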
This study provides evidence that the SF-6D is an empirically valid and efficient alternative multi-attribute utility measure to the EQ-5D, and is capable of discriminating between external indicators of health status. However, health economists should also consider other psychometric properties, such as practicality and reliability, when selecting either measure for evaluative purposes.