Department of Psychology, Hallym Applied Psychology Institute, College of Social Science, Hallym University, Chuncheon, Korea.
The CAT Korea Company, Chuncheon, Korea.
J Educ Eval Health Prof. 2024;21:18. doi: 10.3352/jeehp.2024.21.18. Epub 2024 Jul 9.
This study aimed to compare and evaluate the efficiency and accuracy of computerized adaptive testing (CAT) under 2 stopping rules (standard error of measurement [SEM]=0.30 and 0.25), using both real and simulated data from medical examinations in Korea.
This study employed post-hoc simulation and real data analysis to explore the optimal stopping rule for CAT in medical examinations. The real data were obtained from the responses of 3rd-year medical students to examinations administered in 2020 at Hallym University College of Medicine. Simulated data were generated in R using parameters estimated from a real item bank. Outcome variables included the number of examinees passing or failing under SEM values of 0.25 and 0.30, the number of items administered, and the correlation between ability estimates. The consistency of the real CAT results was evaluated by examining the agreement of pass/fail decisions based on a cut score of 0.0. The efficiency of all CAT designs was assessed by comparing the average number of items administered under the 2 stopping rules.
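A post-hoc CAT comparison of the 2 stopping rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's R implementation. The abstract does not specify the ability estimator or the item-selection rule, so this sketch assumes EAP scoring with a N(0, 1) prior on a theta grid, maximum-information selection (under the Rasch model, the unused item whose difficulty is closest to the current estimate), and a hypothetical 300-item bank with randomly drawn difficulties. The cap `max_items=120` and all function names are also hypothetical. As in a post-hoc design, a complete response matrix is generated once and both rules read from it.

```python
import math
import random


def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def posterior_summary(logpost, grid):
    """EAP estimate and posterior SD from log posterior values on a theta grid."""
    top = max(logpost)  # subtract the max before exponentiating to avoid underflow
    w = [math.exp(v - top) for v in logpost]
    total = sum(w)
    mean = sum(q * wi for q, wi in zip(grid, w)) / total
    var = sum((q - mean) ** 2 * wi for q, wi in zip(grid, w)) / total
    return mean, math.sqrt(var)


def run_cat(row, bank, sem_target, max_items, grid):
    """Post-hoc CAT for one examinee; row holds a 0/1 response to every bank item."""
    logpost = [-0.5 * q * q for q in grid]  # log N(0, 1) prior (assumed), up to a constant
    used = set()
    theta, sem = 0.0, float("inf")
    limit = min(max_items, len(bank))
    while len(used) < limit:
        # Rasch item information P(1 - P) is largest when b is closest to theta.
        j = min((k for k in range(len(bank)) if k not in used),
                key=lambda k: abs(bank[k] - theta))
        used.add(j)
        # Add the log-likelihood of the observed response to this item at every grid point.
        for g, q in enumerate(grid):
            p = rasch_p(q, bank[j])
            logpost[g] += math.log(p if row[j] == 1 else 1.0 - p)
        theta, sem = posterior_summary(logpost, grid)
        if sem <= sem_target:  # the SEM stopping rule
            break
    return theta, sem, len(used)


def simulate(n_examinees=50, bank_size=300, max_items=120, seed=2020):
    """Run both stopping rules on the same simulated response matrix."""
    rng = random.Random(seed)
    # Hypothetical bank: random difficulties, not the study's estimated parameters.
    bank = [rng.uniform(-3.0, 3.0) for _ in range(bank_size)]
    true_thetas = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]
    # One complete response matrix, reused by both rules (post-hoc design).
    matrix = [[1 if rng.random() < rasch_p(t, b) else 0 for b in bank]
              for t in true_thetas]
    grid = [-4.0 + 0.05 * i for i in range(161)]
    results = {target: [run_cat(row, bank, target, max_items, grid) for row in matrix]
               for target in (0.30, 0.25)}
    return true_thetas, results


if __name__ == "__main__":
    _, results = simulate()
    for target, rows in results.items():
        mean_items = sum(n for _, _, n in rows) / len(rows)
        print(f"SEM <= {target:.2f}: mean items administered = {mean_items:.1f}")
```

Because both rules read the same responses and the selection rule is deterministic, they follow an identical item path. The SEM 0.25 rule simply continues past the point where the SEM 0.30 rule stops, so the difference in test length is attributable to the stopping rule alone.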
Both SEM 0.25 and SEM 0.30 provided a good balance between accuracy and efficiency in CAT. The real data showed minimal differences in pass/fail outcomes between the 2 SEM conditions, with a high correlation (r=0.99) between ability estimates. The simulation results confirmed these findings, showing similar average numbers of items administered in the real and simulated data.
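The outcome comparisons described here (pass/fail counts at a cut score of 0.0, pass/fail agreement between the 2 SEM conditions, and the correlation between ability estimates) can be computed with a short helper. This is a hedged sketch: the function names are hypothetical, treating an estimate exactly equal to the cut score as a pass is an assumption, and the five estimates in the example are invented for illustration rather than taken from the study.

```python
import math

CUT_SCORE = 0.0  # cut score on the theta scale, as stated in the abstract


def classify(thetas, cut=CUT_SCORE):
    """True = pass. Treating theta == cut as a pass is an assumption."""
    return [t >= cut for t in thetas]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of estimates."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_rules(theta_a, theta_b, cut=CUT_SCORE):
    """Pass counts, pass/fail agreement rate, and r between two sets of estimates."""
    pass_a, pass_b = classify(theta_a, cut), classify(theta_b, cut)
    agree = sum(a == b for a, b in zip(pass_a, pass_b))
    return {
        "n_pass_a": sum(pass_a),
        "n_pass_b": sum(pass_b),
        "agreement": agree / len(pass_a),
        "r": pearson_r(theta_a, theta_b),
    }


# Hypothetical estimates for five examinees under each rule (not study data).
theta_sem30 = [-1.20, -0.10, 0.05, 0.80, 1.50]
theta_sem25 = [-1.15, 0.02, 0.01, 0.85, 1.45]
print(compare_rules(theta_sem30, theta_sem25))
```

In this toy example the correlation is about 0.998, yet one examinee (-0.10 vs 0.02) switches from fail to pass. This illustrates why pass/fail agreement is reported alongside r: disagreements concentrate near the cut score even when the estimates correlate almost perfectly.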
The findings suggest that both SEM 0.25 and 0.30 are effective termination criteria in the context of the Rasch model, balancing accuracy and efficiency in CAT.
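The standard Rasch information identity shows why a smaller SEM target costs more items. The derivation below assumes that SEM is taken as the inverse square root of test information, as with maximum likelihood scoring. A Bayesian posterior SD that includes a prior can reach the target with fewer items, so the item counts below are theoretical lower bounds under perfectly targeted items, not figures reported by the study.

```latex
\begin{align*}
P_j(\theta) &= \frac{1}{1 + e^{-(\theta - b_j)}}, \qquad
I(\theta) = \sum_{j=1}^{n} P_j(\theta)\bigl[1 - P_j(\theta)\bigr] \le \frac{n}{4}, \\
\mathrm{SEM}(\theta) &\approx \frac{1}{\sqrt{I(\theta)}} \le s
  \;\Longrightarrow\; I(\theta) \ge \frac{1}{s^{2}}
  \;\Longrightarrow\; n \ge \frac{4}{s^{2}}, \\
s = 0.30 &: \; n \ge \frac{4}{0.09} \approx 44.4 \;\Rightarrow\; n \ge 45, \\
s = 0.25 &: \; n \ge \frac{4}{0.0625} = 64.
\end{align*}
```

Each Rasch item contributes at most 1/4 to the test information, which occurs when $b_j = \theta$. Tightening the target from 0.30 to 0.25 therefore requires $(0.30/0.25)^2 = 1.44$ times as much information, which raises the minimum test length from 45 to 64 items.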