Yoo Dong-Mi, Han Jae Jin
Department of Medical Education, College of Medicine, The Catholic University of Korea, Seoul, Korea.
Department of Medical Education & Thoracic Surgery, Ewha Womans University College of Medicine, Seoul, Korea.
J Educ Eval Health Prof. 2024;21:39. doi: 10.3352/jeehp.2024.21.39. Epub 2024 Dec 10.
This study aimed to examine the reliability and validity of a measurement tool for portfolio assessments in medical education. Specifically, it investigated the consistency of scoring among raters and the appropriateness of the assessment criteria as judged by an expert panel.
A cross-sectional observational study was conducted from September to December 2018 in the Introduction to Clinical Medicine course at the Ewha Womans University College of Medicine. Data were collected for 5 randomly selected portfolios scored by a gold-standard rater and 6 trained raters. An expert panel assessed the validity of 12 assessment items using the content validity index (CVI). Statistical analysis included Pearson correlation coefficients for alignment between each rater and the gold-standard rater, the intraclass correlation coefficient (ICC) for inter-rater reliability, and the CVI for item-level validity.
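The three statistics named above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' analysis code: the abstract does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is assumed here, and the item-level CVI is assumed to be the proportion of experts rating an item 3 or 4 on a 4-point relevance scale. The function names and the example scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two raters' scores on the same portfolios."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc2_1(scores):
    """ICC(2,1), assumed form: two-way random effects, absolute agreement,
    single rater. `scores` is a list of rows (portfolios), one column per rater.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Sums of squares from the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def item_cvi(expert_ratings, relevant_min=3):
    """Item-level CVI: share of experts rating the item as relevant.
    The 4-point scale and the cut-off of 3 are assumptions."""
    relevant = sum(1 for r in expert_ratings if r >= relevant_min)
    return relevant / len(expert_ratings)


# Hypothetical scores: 5 portfolios (rows) by 6 trained raters (columns).
example_scores = [
    [70, 65, 72, 68, 60, 71],
    [82, 80, 85, 79, 70, 83],
    [55, 60, 58, 62, 65, 57],
    [90, 88, 91, 85, 75, 89],
    [66, 70, 64, 69, 72, 67],
]
# Hypothetical gold-standard scores for the same 5 portfolios.
gold_standard = [71, 84, 56, 92, 65]

rater_1 = [row[0] for row in example_scores]
r_rater_1 = pearson_r(rater_1, gold_standard)
icc_all = icc2_1(example_scores)
cvi_example = item_cvi([4, 3, 4, 2])  # 3 of 4 hypothetical experts rate it relevant
```

With only 5 portfolios, each of these estimates carries wide uncertainty, so a sketch like this is best read as a description of the calculations rather than a way to reproduce the reported values.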
Rater 1 had the highest Pearson correlation with the gold-standard rater (0.8916), while Rater 5 had the lowest (0.4203). The ICC for all raters was 0.3821, improving to 0.4415 after excluding Raters 1 and 5, a 15.6% relative increase in reliability. All assessment items met the CVI threshold of ≥0.75, with some achieving a perfect score (CVI=1.0). However, items such as "sources" and "level and degree of performance" showed lower validity (CVI=0.72).
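The reported reliability increase is a relative change between the two ICC values. A minimal check, using only the rounded figures given above (the helper name is hypothetical), gives about 15.5%, which agrees with the reported 15.6% to within the rounding of the published ICCs:

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, as a fraction."""
    return (after - before) / before


icc_all_raters = 0.3821   # ICC for all raters, as reported
icc_subset = 0.4415       # ICC after excluding Raters 1 and 5, as reported

increase = relative_change(icc_all_raters, icc_subset)
print(f"{increase:.1%}")
```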
The measurement tool for portfolio assessments demonstrated moderate reliability and strong validity, supporting its use as a credible tool. Further faculty training is needed to make portfolio assessment more reliable.