An experimental test of a theoretical foundation for rating-scale valuations.

Author information

Bleichrodt H, Johannesson M

Affiliation

Department of Health Policy and Management, Erasmus University, Rotterdam, The Netherlands.

Publication information

Med Decis Making. 1997 Apr-Jun;17(2):208-16. doi: 10.1177/0272989X9701700212.

PMID:9107617

Abstract

A major advantage of using a rating scale in health-utility measurement is its practical applicability: the method is relatively easy to understand, and various health states can be assessed simultaneously. However, a theoretical foundation for rating-scale valuations has not been established. The primary aim of this paper is to present a theoretical foundation for rating-scale valuations based on the theory of measurable value functions and to provide a consistency test of whether rating-scale valuations do indeed elicit a measurable value function. If rating-scale valuations elicit a measurable value function, then Dyer and Sarin have shown how they are related to von Neumann-Morgenstern (vNM) utilities. The appropriate technique to measure vNM utilities is the standard gamble. Torrance has suggested that rating-scale valuations and standard-gamble valuations are related by a power function. A secondary aim of this paper is to examine the relationship between rating-scale valuations and standard-gamble valuations hypothesized by Torrance. An experiment was designed to test the consistency of rating-scale valuations and the relationship between rating-scale valuations and standard-gamble valuations. The experiment tested whether rating-scale valuations are independent of the context in which they are elicited, as they should be if they elicit points on a measurable value function. Eighty Swedish and 92 Dutch respondents participated in the experiment. The results showed that rating-scale valuations depend on the number of preferred alternatives in the task and thus violate a basic property of measurable value functions. The estimation of the power function did not yield stable results: parameter estimates varied, in some cases there was indication of misspecification, and in most cases there was indication of heteroskedastic errors.
The implications of these findings for the common use of rating-scale valuations in cost-utility analysis are serious: the dependency of the rating-scale valuations on the other health states included in the task casts serious doubt on the validity of the rating-scale method.
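
To make the power-function relationship concrete, the following is a minimal sketch of how such a curve could be fitted. It is an illustration only, not the estimation procedure used in the paper. The abstract does not state the functional form, so the sketch assumes one commonly attributed to Torrance, 1 - u = (1 - v)^b, where v is a rating-scale valuation and u a standard-gamble valuation, both scaled strictly between 0 and 1. The function names, the exponent 1.6, and the data are hypothetical and synthetic.

```python
import math


def predict_gamble(rating_value, b):
    """Map a rating-scale value v to a standard-gamble value u.

    Assumed form (not taken from the abstract): 1 - u = (1 - v) ** b.
    """
    return 1.0 - (1.0 - rating_value) ** b


def fit_power_exponent(rating, gamble):
    """Estimate b by least squares on the log-transformed relation.

    Taking logs gives log(1 - u) = b * log(1 - v), a regression
    through the origin, so b = sum(x * y) / sum(x * x).
    All values must lie strictly between 0 and 1.
    """
    xs = [math.log(1.0 - v) for v in rating]
    ys = [math.log(1.0 - u) for u in gamble]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Synthetic, noise-free values for illustration (not data from the study).
rating = [0.2, 0.4, 0.6, 0.8]
b_true = 1.6  # hypothetical exponent chosen for the example
gamble = [predict_gamble(v, b_true) for v in rating]

b_hat = fit_power_exponent(rating, gamble)
print(f"estimated exponent: {b_hat:.3f}")
```

With noise-free synthetic data the fit recovers the chosen exponent. With real responses, the residuals of the log-transformed regression would need to be checked for misspecification and non-constant variance, which are the problems the abstract reports.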
