Methods for Evaluating Composite Reliability, Classification Consistency, and Classification Accuracy for Mixed-Format Licensure Tests.

Authors

Moses Tim, Kim Sooyeon

Affiliations

College Board, Newtown, PA, USA.

Educational Testing Service, Princeton, NJ, USA.

Publication information

Appl Psychol Meas. 2015 Jun;39(4):314-329. doi: 10.1177/0146621614563067. Epub 2014 Dec 22.

DOI:10.1177/0146621614563067

PMID:29881011

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5978540/

Abstract

The purpose of this study was to propose extensions of reliability estimation methods that could be used to determine the conditions under which single scoring for constructed-response (CR) items is as effective as double scoring in mixed-format licensure tests. Multivariate generalizability theory methods traditionally used to estimate overall composite score reliability were extended with simulations so that classification consistency and classification accuracy estimates could also be obtained. Composite score reliabilities, classification consistencies, and accuracies were estimated based on the double and single scoring of the CR items of three licensure tests. Composite score reliabilities, classification consistencies, and accuracies were also estimated in decision studies considering varied testing situations such as different numbers of CR items and different CR section weights.
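
As a rough illustration of the kind of computation the abstract describes, the sketch below combines a weighted composite reliability with a simulation of pass/fail classification consistency and accuracy under single and double scoring. It is not the authors' implementation. The variance components, section weights, cut score, normal score distributions, and the function names `cr_error_variance`, `composite_reliability`, and `simulate_classification` are all assumptions made for this example.

```python
import numpy as np


def cr_error_variance(var_pi, var_pr, var_pir, n_items, n_raters):
    """Relative error variance of a CR section mean in a crossed
    person x item x rater design (hypothetical variance components)."""
    return var_pi / n_items + var_pr / n_raters + var_pir / (n_items * n_raters)


def composite_reliability(weights, true_cov, error_vars):
    """Composite universe-score variance over composite observed variance,
    assuming errors are uncorrelated across sections."""
    w = np.asarray(weights, dtype=float)
    true_var = w @ np.asarray(true_cov, dtype=float) @ w
    error_var = np.sum(w**2 * np.asarray(error_vars, dtype=float))
    return true_var / (true_var + error_var), true_var, error_var


def simulate_classification(true_var, error_var, cut, n=200_000, seed=0):
    """Simulate true composites and two parallel observed composites,
    then compare pass/fail decisions at the cut score."""
    rng = np.random.default_rng(seed)
    true = rng.normal(0.0, np.sqrt(true_var), n)
    obs1 = true + rng.normal(0.0, np.sqrt(error_var), n)
    obs2 = true + rng.normal(0.0, np.sqrt(error_var), n)
    consistency = np.mean((obs1 >= cut) == (obs2 >= cut))  # form 1 vs. form 2
    accuracy = np.mean((obs1 >= cut) == (true >= cut))     # observed vs. true
    return consistency, accuracy


# Hypothetical inputs: section weights for (multiple-choice, CR),
# universe-score covariance matrix, and a fixed multiple-choice error variance.
weights = [0.6, 0.4]
true_cov = [[1.0, 0.6], [0.6, 1.0]]
mc_error = 0.15

results = {}
for n_raters in (1, 2):  # single vs. double scoring of the CR items
    cr_error = cr_error_variance(0.8, 0.05, 0.6, n_items=4, n_raters=n_raters)
    rel, true_var, error_var = composite_reliability(
        weights, true_cov, [mc_error, cr_error]
    )
    cons, acc = simulate_classification(true_var, error_var, cut=-0.5)
    results[n_raters] = (rel, cons, acc)
    print(f"raters={n_raters}: reliability={rel:.3f}, "
          f"consistency={cons:.3f}, accuracy={acc:.3f}")
```

Changing `n_items` or `weights` in the loop mimics, in a simplified way, the decision-study variations mentioned in the abstract (different numbers of CR items and different CR section weights).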

Similar articles

Inter-rater reliability and generalizability of patient note scores using a scoring rubric based on the USMLE Step-2 CS format.

Adv Health Sci Educ Theory Pract. 2016 Oct;21(4):761-73. doi: 10.1007/s10459-015-9664-3. Epub 2016 Jan 12.

Classification Accuracy of Mixed Format Tests: A Bi-Factor Item Response Theory Approach.

Front Psychol. 2016 Feb 29;7:270. doi: 10.3389/fpsyg.2016.00270. eCollection 2016.

Extended Multivariate Generalizability Theory With Complex Design Structures.

Educ Psychol Meas. 2022 Aug;82(4):617-642. doi: 10.1177/00131644211049746. Epub 2021 Nov 14.

Composite undergraduate clinical examinations: how should the components be combined to maximize reliability?

Med Educ. 2001 Apr;35(4):326-30. doi: 10.1046/j.1365-2923.2001.00929.x.

Constructing licensure exams: a reliability study of case-based questions on the National Board Dental Hygiene Examination.

J Dent Educ. 2013 Dec;77(12):1588-92.

Differential Weighting for Subcomponent Measures of Integrated Clinical Encounter Scores Based on the USMLE Step 2 CS Examination: Effects on Composite Score Reliability and Pass-Fail Decisions.

Acad Med. 2016 Nov;91(11 Association of American Medical Colleges Learn Serve Lead: Proceedings of the 55th Annual Research in Medical Education Sessions):S24-S30. doi: 10.1097/ACM.0000000000001359.

Using generalizability theory for the estimation of reliability of a patient classification system.

J Nurs Meas. 1994 Summer;2(1):49-62.

Weighting checklist items and station components on a large-scale OSCE: is it worth the effort?

Med Teach. 2014 Jul;36(7):585-90. doi: 10.3109/0142159X.2014.899687. Epub 2014 May 2.

The Effects of Rating Designs on Rater Classification Accuracy and Rater Measurement Precision in Large-Scale Mixed-Format Assessments.

Appl Psychol Meas. 2023 Mar;47(2):91-105. doi: 10.1177/01466216231151705. Epub 2023 Jan 12.