Matthew Hankins
King's College London, Department of Psychology (at Guy's), Institute of Psychiatry, London, UK.
BMC Med Res Methodol. 2007 May 18;7:19. doi: 10.1186/1471-2288-7-19.
Questionnaires are used routinely in clinical research to measure health status and quality of life. Questionnaire measurements have traditionally been assessed formally by indices of reliability (the degree of measurement error) and validity (the extent to which the questionnaire measures what it is supposed to measure). Neither index assesses the degree to which a questionnaire can discriminate between individuals, which is an important aspect of measurement. This paper reintroduces and extends an existing index of a questionnaire's ability to distinguish between individuals, that is, the questionnaire's discrimination.
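Discrimination in Ferguson's sense can be made concrete by counting pairs of respondents whose total scores differ: a questionnaire separates two people only if it gives them different scores. The minimal Python sketch below, using hypothetical scores, counts such pairs directly and checks the count against the frequency identity (N² − Σ f_i²)/2, where N is the number of respondents and f_i the number of respondents with total score i. The function names are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def discriminated_pairs_bruteforce(scores):
    """Count unordered respondent pairs whose total scores differ."""
    return sum(1 for a, b in combinations(scores, 2) if a != b)


def discriminated_pairs_from_frequencies(scores):
    """Same count from score frequencies: (N^2 - sum of f_i^2) / 2."""
    n = len(scores)
    sum_sq = sum(f * f for f in Counter(scores).values())
    return (n * n - sum_sq) // 2  # N^2 - sum f_i^2 is always even


scores = [3, 5, 5, 7, 7, 7, 10]  # hypothetical total scores, N = 7
print(discriminated_pairs_bruteforce(scores))        # 17
print(discriminated_pairs_from_frequencies(scores))  # 17
```

Of the 21 possible pairs, only the four tied pairs (one at score 5, three at score 7) cannot be told apart by their scores.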
Ferguson (1949) [1] derived an index of test discrimination, coefficient delta, for psychometric tests with dichotomous (correct/incorrect) items. In this paper a general form of the formula, deltaG, is derived for the broader class of questionnaires whose items offer several response choices. The calculation and characteristics of deltaG are then demonstrated using General Health Questionnaire (GHQ-12) data from the 2003-2004 British Household Panel Survey (N = 14761). Coefficients of reliability (alpha) and discrimination (deltaG) are computed for two commonly used GHQ-12 coding methods: dichotomous coding and four-point Likert-type coding.
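The abstract does not reproduce the formulas. A standard statement of Ferguson's delta, for N respondents answering K dichotomous items with f_i respondents obtaining total score i, divides the observed number of discriminations, proportional to N² − Σ f_i², by its maximum. That maximum occurs when respondents are spread evenly over the attainable scores, since Σ f_i² subject to Σ f_i = N is smallest when every f_i equals N/s, giving Σ f_i² = N²/s for s attainable scores. The second line is a sketch of the generalisation, assuming items are scored 0 to m − 1 so that s = K(m − 1) + 1; the paper's own notation for deltaG may differ.

```latex
\begin{align}
\delta   &= \frac{N^2 - \sum_i f_i^2}{N^2 - N^2/(K+1)}
          = \frac{(K+1)\left(N^2 - \sum_i f_i^2\right)}{K N^2} \\
\delta_G &= \frac{N^2 - \sum_i f_i^2}{N^2 - N^2/s}
          = \frac{s\left(N^2 - \sum_i f_i^2\right)}{(s-1)\,N^2},
          \qquad s = K(m-1) + 1
\end{align}
```

Both coefficients equal 0 when every respondent has the same total and 1 when totals are uniformly distributed over all s values. For the GHQ-12 (K = 12 items, m = 4 response options), dichotomous 0-0-1-1 coding gives s = 13 attainable totals, whereas Likert-type 0-1-2-3 coding gives s = 37.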
Both scoring methods were reliable (alpha > 0.88). However, deltaG was substantially lower for the dichotomous coding of the GHQ-12 (deltaG = 0.73) than for the Likert-type coding (deltaG = 0.96), indicating that the dichotomous coding, although reliable, failed to discriminate between individuals.
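The pattern in these results, two codings with comparable alpha but different deltaG, can be reproduced on a small scale. The Python sketch below uses six hypothetical respondents answering four hypothetical items with GHQ-style 0-3 responses (not the BHPS data), recodes them with four-point Likert-type coding (0-1-2-3) and dichotomous coding (0-0-1-1), and computes Cronbach's alpha and deltaG for each. Here deltaG is computed as (N² − Σ f_i²)/(N² − N²/s), with s the number of attainable total scores; this form of the generalisation is an assumption, and all names are illustrative.

```python
from collections import Counter
from statistics import pvariance

# Hypothetical responses: 6 respondents x 4 items, each answered 0-3
# (GHQ-style four-option items). Not real survey data.
RESPONSES = [
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [2, 1, 2, 1],
    [2, 3, 2, 2],
    [3, 3, 3, 2],
]

LIKERT = {0: 0, 1: 1, 2: 2, 3: 3}       # four-point Likert-type coding
DICHOTOMOUS = {0: 0, 1: 0, 2: 1, 3: 1}  # dichotomous (0-0-1-1) coding


def recode(responses, scheme):
    """Map each raw response through a coding scheme."""
    return [[scheme[r] for r in row] for row in responses]


def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(items[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in items]) for j in range(k)]
    total_var = pvariance([sum(row) for row in items])
    # Population variances are used throughout; the n/(n-1) factor cancels.
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def delta_g(items, max_item_score):
    """Assumed generalised delta: (N^2 - sum f_i^2) / (N^2 - N^2/s)."""
    n = len(items)
    k = len(items[0])
    totals = [sum(row) for row in items]
    sum_sq = sum(f * f for f in Counter(totals).values())
    s = k * max_item_score + 1  # number of attainable total scores
    return (n * n - sum_sq) / (n * n - n * n / s)


for name, scheme, top in [("Likert", LIKERT, 3), ("dichotomous", DICHOTOMOUS, 1)]:
    coded = recode(RESPONSES, scheme)
    print(f"{name}: alpha={cronbach_alpha(coded):.3f}, deltaG={delta_g(coded, top):.3f}")
# Likert: alpha=0.924, deltaG=0.843
# dichotomous: alpha=0.943, deltaG=0.764
```

In this toy data the dichotomous coding even has slightly higher alpha, yet its deltaG is lower because three respondents collapse onto the same total of 0 and two onto a total of 4: high reliability does not guarantee that a scoring method separates individuals.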
Coefficient deltaG proved decisive in distinguishing the cross-sectional discrimination of two equally reliable scoring methods. Ferguson's delta has been neglected in discussions of questionnaire design and performance, perhaps because it has not been implemented in software and was restricted to questionnaires with dichotomous items, which are rare in health care research. It is suggested that the more general formula introduced here be reported as deltaG, to avoid implying that the items are dichotomously coded.