Edelen Maria Orlando, Reeve Bryce B
Department of Psychiatry & Human Behavior, Brown Medical School, Box G-BH, Providence, RI 02912, USA.
Qual Life Res. 2007;16 Suppl 1:5-18. doi: 10.1007/s11136-007-9198-0. Epub 2007 Mar 21.
Health outcomes researchers are increasingly applying Item Response Theory (IRT) methods to questionnaire development, evaluation, and refinement efforts.
This article provides a brief overview of IRT, reviews some of the critical issues associated with IRT applications, and demonstrates the basic features of IRT with an example.
Example data come from 6,504 adolescent respondents in the National Longitudinal Study of Adolescent Health public use data set who completed the 19-item Feelings Scale for depression. The sample was split into a development sample and a validation sample. Scale items were calibrated in the development sample with the Graded Response Model, and the results were used to construct a 10-item short form. The short form was evaluated in the validation sample by examining the correspondence between IRT scores from the short form and the original form, and by comparing the proportion of respondents identified as depressed according to the observed cut scores of the original and short forms.
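As a rough illustration of the Graded Response Model named above, the sketch below computes response-category probabilities for a single item in the standard logistic form of the model. The function name `grm_category_probs` and the slope and threshold values are assumptions chosen for illustration; they are not the parameter estimates from this study.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one item under the graded response model.

    theta : latent trait level (e.g. depression severity)
    a     : item slope (discrimination) parameter
    b     : ordered threshold (location) parameters, length K - 1
    Returns an array of K probabilities, one per response category.
    """
    b = np.asarray(b, dtype=float)
    # Cumulative curves: P(X >= k | theta) for k = 1 .. K-1
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Pad with P(X >= 0) = 1 and P(X >= K) = 0, then difference adjacent curves
    cum = np.concatenate(([1.0], p_star, [0.0]))
    return cum[:-1] - cum[1:]

# Hypothetical 4-category item (illustrative values only)
probs = grm_category_probs(theta=0.0, a=1.5, b=[-0.5, 1.0, 2.5])
print(probs, probs.sum())
```

Each category probability is the difference between two adjacent cumulative curves, so the probabilities always sum to one for ordered thresholds.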
The 19 items varied in their discrimination (slope parameter range: 0.86 to 2.66), and item location parameters reflected a considerable range of depression (-0.72 to 3.39). However, the item set is most discriminating at higher levels of depression. In the validation sample, IRT scores generated from the short and long forms were correlated at 0.96, and the average difference in these scores was -0.01. In addition, nearly 90% of the sample was classified identically as at risk or not at risk for depression using observed score cut points from the short and long forms.
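The three agreement summaries reported above (score correlation, average score difference, and proportion classified identically) could be computed along the following lines. This is a minimal sketch, not the authors' procedure: the function name `compare_forms`, the short-minus-long sign convention, the "at or above the cut" rule, and all data and cut points in the example are assumptions.

```python
import numpy as np

def compare_forms(long_theta, short_theta, long_obs, short_obs,
                  long_cut, short_cut):
    """Summarize agreement between long-form and short-form scoring.

    long_theta, short_theta : IRT scores from each form
    long_obs, short_obs     : observed (summed) scores from each form
    long_cut, short_cut     : observed-score cut points (hypothetical here)
    """
    long_theta = np.asarray(long_theta, dtype=float)
    short_theta = np.asarray(short_theta, dtype=float)
    # Pearson correlation between the two sets of IRT scores
    r = np.corrcoef(long_theta, short_theta)[0, 1]
    # Average difference, assumed here to be short form minus long form
    mean_diff = float(np.mean(short_theta - long_theta))
    # Flag respondents scoring at or above each form's cut point
    long_flag = np.asarray(long_obs) >= long_cut
    short_flag = np.asarray(short_obs) >= short_cut
    # Proportion of respondents classified the same way by both forms
    agreement = float(np.mean(long_flag == short_flag))
    return r, mean_diff, agreement

# Toy data for four respondents (illustrative values only)
r, d, agree = compare_forms(
    long_theta=[0.0, 1.0, 2.0, 3.0], short_theta=[0.1, 0.9, 2.1, 2.9],
    long_obs=[5, 10, 20, 30], short_obs=[2, 9, 9, 20],
    long_cut=16, short_cut=8,
)
print(r, d, agree)
```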
When used appropriately, IRT can be a powerful tool for questionnaire development, evaluation, and refinement, resulting in precise, valid, and relatively brief instruments that minimize response burden.