
Scale Separation Reliability: What Does It Mean in the Context of Comparative Judgment?

Authors

Verhavert San, De Maeyer Sven, Donche Vincent, Coertjens Liesje

Affiliations

University of Antwerp, Belgium.

Université Catholique de Louvain, Louvain-la-Neuve, Belgium.

Publication information

Appl Psychol Meas. 2018 Sep;42(6):428-445. doi: 10.1177/0146621617748321. Epub 2017 Dec 31.

DOI:10.1177/0146621617748321
PMID:30787486
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6373854/
Abstract

Comparative judgment (CJ) is an alternative method for assessing competences based on Thurstone's law of comparative judgment. Assessors are asked to compare pairs of students' work (representations) and judge which one is better on a certain competence. These judgments are analyzed using the Bradley-Terry-Luce model resulting in logit estimates for the representations. In this context, the Scale Separation Reliability (SSR), coming from Rasch modeling, is typically used as a reliability measure. But, to the knowledge of the authors, it has never been systematically investigated if the meaning of the SSR can be transferred from Rasch to CJ. As the meaning of the reliability is an important question for both assessment theory and practice, the current study looks into this. A meta-analysis is performed on 26 CJ assessments. For every assessment, split-halves are performed based on assessor. The rank orders of the whole assessment and the halves are correlated and compared with SSR values using Bland-Altman plots. The correlation between the halves of an assessment was compared with the SSR of the whole assessment showing that the SSR is a good measure for split-half reliability. Comparing the SSR of one of the halves with the correlation between the two respective halves showed that the SSR can also be interpreted as an interrater correlation. Regarding SSR as expressing a correlation with the truth, the results are mixed.
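To make the quantities in the abstract concrete, the following is a minimal sketch on simulated data, not the authors' analysis code. It assumes the common Rasch-style definition of the SSR, (observed variance of the logits minus the mean squared standard error) divided by the observed variance, and it approximates each standard error from the diagonal of the Fisher information. All function names, the small ridge term that keeps estimates finite, the even/odd assessor split, and the simulated "true" logits are illustrative assumptions.

```python
import math
import random
from statistics import mean, variance


def win_prob(theta_i, theta_j):
    """BTL probability that representation i is judged better than j."""
    return 1.0 / (1.0 + math.exp(theta_j - theta_i))


def fit_btl(comparisons, n_items, n_iter=200, ridge=1e-3):
    """Return (logits, standard errors) from (assessor, winner, loser) tuples."""
    wins = [0.0] * n_items
    met = [[0] * n_items for _ in range(n_items)]  # met[i][j]: times i faced j
    for _, w, l in comparisons:
        wins[w] += 1
        met[w][l] += 1
        met[l][w] += 1
    theta = [0.0] * n_items

    def score_and_info(i):
        expected = info = 0.0
        for j in range(n_items):
            if met[i][j]:
                p = win_prob(theta[i], theta[j])
                expected += met[i][j] * p
                info += met[i][j] * p * (1.0 - p)
        return wins[i] - expected, info

    for _ in range(n_iter):
        for i in range(n_items):  # coordinate-wise Newton step per item
            score, info = score_and_info(i)
            step = (score - ridge * theta[i]) / (info + ridge)
            theta[i] += max(-1.0, min(1.0, step))  # clamp to avoid overshoot
        centre = mean(theta)
        for k in range(n_items):  # fix the origin of the logit scale at 0
            theta[k] -= centre
    se = [1.0 / math.sqrt(score_and_info(i)[1] + ridge) for i in range(n_items)]
    return theta, se


def ssr(theta, se):
    """Assumed SSR definition: (observed variance - mean squared SE) / observed variance."""
    observed = variance(theta)
    return (observed - mean(s * s for s in se)) / observed


def spearman(x, y):
    """Rank correlation; ranks ignore ties, which is adequate for continuous logits."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda k: values[k])
        r = [0] * len(values)
        for position, k in enumerate(order):
            r[k] = position
        return r
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Simulated assessment (hypothetical): 20 representations, 10 assessors, 100 pairs each.
random.seed(1)
n_items, n_assessors = 20, 10
true_theta = [-2 + 4 * k / (n_items - 1) for k in range(n_items)]
comparisons = []
for assessor in range(n_assessors):
    for _ in range(100):
        i, j = random.sample(range(n_items), 2)
        i_wins = random.random() < win_prob(true_theta[i], true_theta[j])
        comparisons.append((assessor, i, j) if i_wins else (assessor, j, i))

# Whole assessment versus two halves formed by splitting on assessor.
theta, se = fit_btl(comparisons, n_items)
theta_a, se_a = fit_btl([c for c in comparisons if c[0] % 2 == 0], n_items)
theta_b, se_b = fit_btl([c for c in comparisons if c[0] % 2 == 1], n_items)

ssr_whole = ssr(theta, se)
ssr_half = ssr(theta_a, se_a)
split_half_r = spearman(theta_a, theta_b)
print(f"SSR whole: {ssr_whole:.2f}  SSR half: {ssr_half:.2f}  split-half r: {split_half_r:.2f}")
```

In this sketch, `ssr_whole` set against `split_half_r` mirrors the split-half reading of the SSR, and `ssr_half` set against `split_half_r` mirrors the interrater reading. The study itself makes these comparisons across 26 real assessments using Bland-Altman plots rather than on a single simulated data set.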


Similar articles

1
Scale Separation Reliability: What Does It Mean in the Context of Comparative Judgment?
Appl Psychol Meas. 2018 Sep;42(6):428-445. doi: 10.1177/0146621617748321. Epub 2017 Dec 31.
2
Exploring assessor cognition as a source of score variability in a performance assessment of practice-based competencies.
BMC Med Educ. 2020 May 25;20(1):168. doi: 10.1186/s12909-020-02077-6.
3
Improving Self-Reflection Assessment Practices: Comparative Judgment as an Alternative to Rubrics.
Teach Learn Med. 2021 Oct-Dec;33(5):525-535. doi: 10.1080/10401334.2021.1877709. Epub 2021 Feb 11.
4
Assessing the surgical skills of trainees in the operating theatre: a prospective observational study of the methodology.
Health Technol Assess. 2011 Jan;15(1):i-xxi, 1-162. doi: 10.3310/hta15010.
5
A Law of Comparative Preference: Distinctions Between Models of Personal Preference and Impersonal Judgment in Pair Comparison Designs.
Appl Psychol Meas. 2019 May;43(3):181-194. doi: 10.1177/0146621617738014. Epub 2017 Nov 2.
6
How IRT can solve problems of ipsative data in forced-choice questionnaires.
Psychol Methods. 2013 Mar;18(1):36-52. doi: 10.1037/a0030641. Epub 2012 Nov 12.
7
Fitting a Thurstonian IRT model to forced-choice data using Mplus.
Behav Res Methods. 2012 Dec;44(4):1135-47. doi: 10.3758/s13428-012-0217-x.
8
Nursing students' clinical judgment skills in simulation and clinical placement: a comparison of student self-assessment and evaluator assessment.
BMC Nurs. 2023 Mar 9;22(1):64. doi: 10.1186/s12912-023-01220-0.
9
Adaptive Comparative Judgment: A Tool to Support Students' Assessment Literacy.
J Vet Med Educ. 2017 Winter;44(4):686-691. doi: 10.3138/jvme.0616-113R. Epub 2017 Jun 5.
10
The interrater reliability and agreement of a 0 to 10 uterine tone score in cesarean delivery.
Am J Obstet Gynecol MFM. 2021 May;3(3):100342. doi: 10.1016/j.ajogmf.2021.100342. Epub 2021 Feb 27.

Cited by

1
Comparative judgement as a research tool: A meta-analysis of application and reliability.
Behav Res Methods. 2025 Jul 10;57(8):222. doi: 10.3758/s13428-025-02744-w.
2
Psychometric Properties of the Persian Translation of the 12-Item Utah Photophobia Symptom Impact Scale Questionnaire.
Neuroophthalmology. 2024 Feb 7;48(4):249-256. doi: 10.1080/01658107.2024.2305812. eCollection 2024.
3
Validating a forced-choice method for eliciting quality-of-reasoning judgments.
Behav Res Methods. 2024 Aug;56(5):4958-4973. doi: 10.3758/s13428-023-02234-x. Epub 2023 Oct 13.

References

1
An overview on assessing agreement with continuous measurements.
J Biopharm Stat. 2007;17(4):529-69. doi: 10.1080/10543400701376480.
2
Controversy and the Rasch model: a characteristic of incompatible paradigms?
Med Care. 2004 Jan;42(1 Suppl):I7-16. doi: 10.1097/01.mlr.0000103528.48582.7c.
3
Psychophysical analysis. By L. L. Thurstone, 1927.
Am J Psychol. 1987 Fall-Winter;100(3-4):587-609.
4
Statistical methods for assessing agreement between two methods of clinical measurement.
Lancet. 1986 Feb 8;1(8476):307-10.