State of the psychometric methods: comments on the ISOQOL SIG psychometric papers.

Author information

Bjorner Jakob B

Affiliations

Optum Patient Insights, Johnston, USA.

Department of Public Health, University of Copenhagen, Copenhagen, Denmark.

Publication information

J Patient Rep Outcomes. 2019 Jul 30;3(1):49. doi: 10.1186/s41687-019-0134-1.

DOI:10.1186/s41687-019-0134-1

PMID:31359221

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6663952/

Abstract

BACKGROUND

Psychometric analyses of patient-reported outcomes (PROs) typically use classical test theory (CTT), item response theory (IRT), or Rasch measurement theory (RMT). Three papers from the ISOQOL Psychometrics SIG examined the same data set using these three approaches. By comparing the results from these papers, the current paper examines the extent to which conclusions about the validity and reliability of a PRO tool depend on the selected psychometric approach.

MAIN TEXT

Regarding the basic statistical model, IRT and RMT are relatively similar but differ notably from CTT. However, modern applications of CTT diminish these differences. In analyses of item discrimination, CTT and IRT gave very similar results, whereas RMT requires equal discrimination across items and therefore suggested excluding items that deviated too much from this requirement. Thus, fewer items fitted the Rasch model. In analyses of item thresholds (difficulty), IRT and RMT provided fairly similar results. Item thresholds are typically not evaluated in CTT. Analyses of local dependence showed only moderate agreement between methods, partly because the methods used different cutoffs for flagging local dependence as important. Analyses of differential item functioning (DIF) showed good agreement between IRT and RMT. Agreement might be further improved by adjusting the cutoffs for important DIF. Analyses of measurement precision across the score range showed high agreement between IRT and RMT. CTT assumes constant measurement precision throughout the score range and thus gave different results. Category ordering was examined in the RMT analyses by checking for reversed thresholds. However, this approach is controversial within the RMT community. The same issue can be examined with the nominal categories IRT model.
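The contrast in measurement precision can be illustrated with a minimal sketch. It assumes dichotomous items and a two-parameter logistic (2PL) IRT model, with a Rasch-type model obtained by fixing every discrimination at 1. The item parameters and function names are hypothetical and are not taken from the three papers, whose PRO items may well be polytomous. The sketch shows only that the standard error implied by IRT or RMT varies with the latent trait level, whereas CTT summarizes precision with a single value.

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing a dichotomous item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of one 2PL item at trait level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def standard_error(theta, items):
    """Standard error of the trait estimate: 1 / sqrt(test information)."""
    info = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(info)

# Hypothetical (discrimination a, difficulty b) pairs, for illustration only.
irt_items = [(0.8, -1.0), (1.2, 0.0), (2.0, 0.5), (1.5, 1.0)]
# Rasch-type version: same difficulties, common discrimination fixed at 1.
rasch_items = [(1.0, b) for _, b in irt_items]

if __name__ == "__main__":
    # Precision is best near the item difficulties and worsens toward the extremes.
    for theta in (-3.0, 0.0, 3.0):
        se_irt = standard_error(theta, irt_items)
        se_rasch = standard_error(theta, rasch_items)
        print(f"theta={theta:+.1f}  SE(2PL)={se_irt:.2f}  SE(Rasch)={se_rasch:.2f}")
```

In this sketch both models give a smaller standard error near the middle of the trait range than at the extremes, which is the pattern a single CTT reliability coefficient cannot express.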

CONCLUSIONS

While there are well-known differences between CTT, IRT, and RMT, the comparison of three actual analyses revealed a great deal of agreement among the results from the methods. If the undogmatic attitude of the three current papers is maintained, the field will be well served.
