Petrillo Jennifer, Cano Stefan J, McLeod Lori D, Coon Cheryl D
Novartis AG, Basel, Switzerland.
Plymouth University Peninsula Schools of Medicine and Dentistry, Plymouth, UK.
Value Health. 2015 Jan;18(1):25-34. doi: 10.1016/j.jval.2014.10.005.
To provide comparisons and a worked example of item- and scale-level evaluations based on three psychometric methods used in patient-reported outcome development (classical test theory [CTT], item response theory [IRT], and Rasch measurement theory [RMT]), applied to the National Eye Institute Visual Functioning Questionnaire (VFQ-25).
Baseline VFQ-25 data from 240 participants with diabetic macular edema, drawn from a randomized, double-masked, multicenter clinical trial, were used to evaluate the VFQ-25 at the total score level. CTT, RMT, and IRT evaluations were conducted, and their results were compared head-to-head.
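To make the CTT side of the comparison concrete, here is a minimal sketch of common CTT item-level checks: Cronbach's alpha, corrected item-total correlations (low or negative values flag problematic items), inter-item correlations (very high values flag redundant pairs), and response-category proportions (to spot skew). It assumes a tiny hypothetical dataset scored 0 to 4, not the trial data. The function names and the 0.9 redundancy cutoff are illustrative assumptions, not the authors' criteria.

```python
import math
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

def corrected_item_total(items):
    """Correlate each item with the total of the *other* items."""
    totals = [sum(row) for row in zip(*items)]
    return [pearson(col, [t - x for t, x in zip(totals, col)]) for col in items]

def redundant_pairs(items, cutoff=0.9):
    """Item pairs whose correlation exceeds a (hypothetical) redundancy cutoff."""
    pairs = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            r = pearson(items[i], items[j])
            if r > cutoff:
                pairs.append((i, j, round(r, 3)))
    return pairs

def category_proportions(col, categories=range(5)):
    """Share of respondents choosing each response category."""
    n = len(col)
    return {c: col.count(c) / n for c in categories}

# Hypothetical data: one list per item, one score per respondent.
# Item 1 duplicates item 0 (redundant); item 3 runs against the others.
items = [
    [0, 1, 2, 3, 4, 4],
    [0, 1, 2, 3, 4, 4],
    [1, 0, 2, 2, 3, 4],
    [4, 3, 2, 2, 1, 0],
]

print("alpha:", round(cronbach_alpha(items), 3))
print("corrected item-total:", [round(r, 3) for r in corrected_item_total(items)])
print("redundant pairs:", redundant_pairs(items))
print("item 0 categories:", category_proportions(items[0]))
```

In this toy data the reverse-running item drags alpha down and shows a negative corrected item-total correlation, while the duplicated pair is the only one flagged as redundant.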
Results were similar across the three methods, with IRT and RMT providing more detailed diagnostic information on how to improve the scale. CTT identified sets of redundant items, skewed response categories, and two problematic items that threaten the validity of the overall scale score. IRT and RMT additionally identified poor fit for one item, many locally dependent items, poor targeting, and disordering of over half the response categories.
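"Disordered response categories" has a specific meaning in Rasch analysis: the estimated thresholds between adjacent categories do not increase in order, so some category is never the most probable response at any level of the measured trait. The sketch below illustrates this, assuming the Rasch partial credit parameterization with item location `delta` and thresholds `taus`. The threshold values are made up, not estimates from the VFQ-25 data, and the function names are hypothetical.

```python
import math

def category_probs(theta, delta, taus):
    """Partial credit model: P(X = k) for categories 0..len(taus)."""
    # Numerator exponent for category k is sum_{j<=k} (theta - delta - tau_j);
    # category 0 has an empty sum (0).
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]

def thresholds_ordered(taus):
    """True if each threshold is strictly larger than the previous one."""
    return all(a < b for a, b in zip(taus, taus[1:]))

def never_modal_categories(delta, taus, lo=-6.0, hi=6.0, steps=1201):
    """Categories that are never the most probable response on a theta grid."""
    modal = set()
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        p = category_probs(theta, delta, taus)
        modal.add(max(range(len(p)), key=p.__getitem__))
    return [k for k in range(len(taus) + 1) if k not in modal]

ordered = [-1.0, 1.0]     # hypothetical, well-functioning thresholds
disordered = [0.5, -0.5]  # hypothetical, reversed thresholds

print(thresholds_ordered(ordered), never_modal_categories(0.0, ordered))
print(thresholds_ordered(disordered), never_modal_categories(0.0, disordered))
```

With the reversed thresholds, the middle category is never modal, which is the usual signal that respondents are not using that category as intended and that collapsing or rewording categories may be worth considering.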
Selection of a psychometric approach depends on many factors. Researchers should justify their evaluation method and consider the intended audience. If the instrument is being developed for descriptive purposes and on a restricted budget, a cursory examination of the CTT-based psychometric properties may be all that is possible. In a high-stakes situation, such as the development of a patient-reported outcome instrument for consideration in pharmaceutical labeling, however, a thorough psychometric evaluation including IRT or RMT should be considered, with final item-level decisions made on the basis of both quantitative and qualitative results.