Ameri Hosein, Poder Thomas G
School of Public Health, University of Montreal, Montreal, QC, Canada.
Centre de Recherche de l'IUSMM, CIUSSS de l'Est de l'Île de Montréal, Montreal, QC, Canada.
Eur J Health Econ. 2025 Jun;26(4):589-604. doi: 10.1007/s10198-024-01723-w. Epub 2024 Sep 28.
To empirically compare four preference elicitation approaches for valuing health states with the SF-6Dv2: the discrete choice experiment with time (DCE_TTO), best-worst scaling with time (BWS_TTO), DCE_TTO combined with BWS_TTO (DCE_BWS), and the standard gamble (SG).
A representative sample of the general population in Quebec, Canada, completed either 6 SG tasks or 13 DCE_BWS tasks (10 DCE_TTO tasks followed by 3 BWS_TTO tasks). Choice tasks were designed with the SF-6Dv2. Several models were used to estimate the SG data, and the conditional logit model was used for the DCE and BWS data. The performance of the SG models was assessed using prediction accuracy (mean absolute error [MAE]), goodness of fit (Bayesian information criterion [BIC]), the t-test, the Jarque-Bera (JB) test, the Ljung-Box (LB) test, the logical consistency of the parameters, and significance levels. The approaches were compared on acceptability (self-reported difficulty and quality of answers, and completion time), consistency (monotonicity of model coefficients), accuracy (standard errors), the magnitude of the dimension coefficients, the correlation between the estimated value sets, and the range of estimated values. The variance scale factor was computed to assess the consistency of individuals' choices in the DCE_TTO and BWS_TTO approaches.
Of the 828 people who completed the SG tasks and the 1208 who completed the DCE_BWS tasks, 724 SG participants and 1153 DCE_BWS participants were included in the analysis. Although self-reported difficulty and quality of answers did not differ significantly among approaches, the SG had the longest completion time, and participants excluded from the SG sample were more likely to report difficulty in answering. The range of standard errors was narrowest for the SG (0.012 to 0.015), followed by BWS_TTO (0.023 to 0.035), DCE_BWS (0.028 to 0.050), and DCE_TTO (0.028 to 0.052). BWS_TTO had the highest number of insignificant and illogical parameters. Pain was the most important dimension in all approaches. The correlation between SG and DCE_BWS utility values was the strongest (0.928), followed by SG and BWS_TTO (0.889), and SG and DCE_TTO (0.849). The range of utility values generated by the SG (-0.143 to 1) was narrower than those generated by the other three methods, and the BWS_TTO range (-0.505 to 1) was narrower than the DCE_BWS (-1.063 to 1) and DCE_TTO (-0.637 to 1) ranges. The variance scale factor suggests that respondents had a similar level of certainty or confidence in their DCE_TTO and BWS_TTO responses.
The SG had the narrowest value set, the lowest completion rate, the longest completion time, and the best prediction accuracy, and it produced an unexpected sign for one level. Compared with DCE_BWS and DCE_TTO, BWS_TTO had a narrower value set, a shorter completion time, more parameter inconsistencies, and more insignificant levels. The DCE_BWS results were closer to those of the SG in the number of insignificant and illogical parameters and in correlation.
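As an illustration of two of the criteria named above, the following is a minimal sketch of how prediction accuracy (MAE) and the monotonicity check on model coefficients could be computed. It is not the authors' code. The function names are hypothetical, and the sketch assumes that the coefficients of one dimension are expressed as non-negative utility decrements ordered from the mildest to the most severe level.

```python
from typing import Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute gap between observed and model-predicted utilities."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def count_inconsistent_levels(decrements: Sequence[float]) -> int:
    """Count adjacent level pairs that violate monotonicity.

    Assumption: `decrements` holds the utility losses of one dimension,
    ordered from the mildest to the most severe level. A pair is
    inconsistent when the more severe level has a smaller loss than the
    milder level just before it.
    """
    return sum(
        1
        for milder, worse in zip(decrements, decrements[1:])
        if worse < milder
    )
```

A lower MAE indicates better prediction accuracy, and a count of zero means the levels of that dimension are logically ordered.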
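The comparison of value sets by correlation and by range can be sketched as follows. This is an illustrative sketch only: it assumes a Pearson correlation computed over the same health states valued by two approaches, which the abstract does not specify, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two value sets for the same health states."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant value set")
    return covariance / (spread_x * spread_y)


def value_set_range(utilities: Sequence[float]) -> Tuple[float, float]:
    """Lowest and highest utility in a value set (worst state to full health)."""
    return (min(utilities), max(utilities))
```

Applied to the utilities that two approaches assign to the same set of SF-6Dv2 health states, the first function yields the kind of correlation reported above, and the second the endpoints of each value set.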