Zhao Yue
The University of Hong Kong, Pokfulam, Hong Kong SAR, China.
Qual Life Res. 2017 Mar;26(3):555-564. doi: 10.1007/s11136-016-1467-3. Epub 2016 Dec 1.
In patient-reported outcome research that uses item response theory (IRT), model-data fit evaluations usually focus on statistical significance tests for detecting misfit. However, such evaluations rarely address the consequences of retaining misfitting items for the intended clinical applications. This study was designed to evaluate the impact of IRT item misfit on score estimates and severity classifications and to demonstrate a recommended process for model-fit evaluation.
Using secondary data from the Patient-Reported Outcome Measurement Information System (PROMIS) wave 1 testing phase, analyses were conducted on the PROMIS depression (28 items; 782 cases) and pain interference (41 items; 845 cases) item banks. Misfitting items were identified using Orlando and Thissen's summed-score item-fit statistics and graphical displays. The impact of misfit was evaluated by the agreement of IRT-derived T-scores and severity classifications obtained with and without the misfitting items.
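For illustration only (not the authors' code): the sketch below shows, in Python, the kind of agreement check described above, assuming T-scores from the full item bank and from the bank with flagged misfitting items removed have already been estimated. The severity cutoffs (55, 60, 70), the ±2-point agreement band, and the simulated scores are illustrative assumptions, not values reported in the paper.

```python
# Minimal sketch of the misfit-impact evaluation step: compare IRT T-scores
# estimated with the full item bank against T-scores estimated after dropping
# flagged misfitting items. Inputs and cutoffs are illustrative assumptions.
import numpy as np

def classify_severity(t_scores, cuts=(55.0, 60.0, 70.0)):
    """Map T-scores to ordinal severity levels (0=none, 1=mild, 2=moderate, 3=severe).
    Cutoffs follow commonly cited PROMIS conventions and are assumptions here."""
    return np.digitize(t_scores, bins=cuts)

def misfit_impact(t_full, t_reduced):
    """Summarize agreement between T-scores with and without misfitting items."""
    t_full = np.asarray(t_full, dtype=float)
    t_reduced = np.asarray(t_reduced, dtype=float)
    diff = t_full - t_reduced
    sev_full = classify_severity(t_full)
    sev_reduced = classify_severity(t_reduced)
    return {
        "pearson_r": float(np.corrcoef(t_full, t_reduced)[0, 1]),
        "mean_abs_diff": float(np.mean(np.abs(diff))),
        "within_2_points": float(np.mean(np.abs(diff) <= 2.0)),
        "classification_agreement": float(np.mean(sev_full == sev_reduced)),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t_full = rng.normal(50, 10, size=800)              # simulated full-bank T-scores
    t_reduced = t_full + rng.normal(0, 1.0, size=800)  # small shift after dropping items
    print(misfit_impact(t_full, t_reduced))
```

High correlation, small mean absolute differences, and high classification agreement between the two score sets would indicate that the flagged misfit has little practical consequence, which is the sense in which "impact" is used in this study.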
Examination of the presence and impact of misfit indicated that item misfit had a negligible effect on T-score estimates and severity classifications in the general-population sample for both the PROMIS depression and pain interference item banks.
Findings support the T-score estimates in the two item banks as robust to item misfit at both the group and individual levels and add confidence to the use of T-scores for severity diagnosis in the studied sample. Recommendations are given for identifying item misfit (statistical significance) and for assessing its impact (practical significance).