Kreptul Dennis, Thomas Roger E
Department of Family Medicine, Health Sciences Centre, University of Calgary, Calgary, Canada.
Educ Prim Care. 2016 Nov;27(6):471-477. doi: 10.1080/14739879.2016.1205835. Epub 2016 Jul 13.
Family Medicine trainees are often assessed in Objective Structured Clinical Examinations (OSCEs). The purpose of this survey is to document the quality in terms of psychometrics and standard setting of OSCEs as used in Family Practice (FP)/General Practice (GP) training programs.
Nine electronic databases were searched from inception to December 2015, and included articles were searched in the PubMed single citation matcher. Two authors independently assessed all titles, abstracts and full texts, and abstracted data. Articles were searched for OSCEs used for performance assessment of FP/GP trainees.
Twenty-one studies published between 1987 and 2014 met our criteria. Content validity was reported in 18, construct validity in nine, and criterion (concurrent and/or predictive) validity in five. Five articles considered the consequences of testing. Internal reliability was reported by 12 studies, inter-rater reliability by seven, and generalisability by four. Nine set pass-fail standards, of which four used criterion-referenced standards. In addition, we tabulated sources of validity and reliability, with particular reference to medical education.
We found few articles which rigorously provided evidence of validity and reliability. Standard-setting, when done, was normative in all high-stakes exams. OSCEs used for formative purposes had lower psychometric standards.