A systematic review of the reliability of objective structured clinical examination scores.

Affiliation

Department of Psychology, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620-7200, USA.

Publication information

Med Educ. 2011 Dec;45(12):1181-9. doi: 10.1111/j.1365-2923.2011.04075.x. Epub 2011 Oct 11.

DOI:10.1111/j.1365-2923.2011.04075.x

PMID:21988659

Abstract

CONTEXT

The objective structured clinical examination (OSCE) comprises a series of simulations used to assess the skill of medical practitioners in the diagnosis and treatment of patients. It is often used in high-stakes examinations, and it is therefore important to assess its reliability and validity.

METHODS

The published literature was searched (PsycINFO, PubMed) for OSCE reliability estimates (coefficient alpha and generalisability coefficients) computed either across stations or across items within stations. Coders independently recorded information about each study. A meta-analysis of the available literature was computed and sources of systematic variance in estimates were examined.
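As an illustration of one of the reliability estimates named above, the following is a minimal sketch of coefficient alpha computed across stations, using the standard formula alpha = k/(k-1) * (1 - sum of per-station variances / variance of total scores). The function name `coefficient_alpha` and the score matrix are hypothetical and are not taken from the review; the abstract does not describe how the individual studies computed their estimates.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient (Cronbach's) alpha for a score matrix.

    scores: list of rows, one row per examinee, one column per station
    (or per item within a station). Hypothetical helper for illustration.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two stations and two examinees")
    # Sample variance of each station's scores across examinees.
    station_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each examinee's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(station_vars) / total_var)


# Made-up scores for five examinees on four stations (not data from the review).
example_scores = [
    [7, 6, 8, 5],
    [5, 5, 6, 4],
    [9, 7, 8, 8],
    [4, 6, 5, 3],
    [8, 8, 9, 6],
]
print(round(coefficient_alpha(example_scores), 2))
```

Passing a matrix of item scores from a single station instead gives the within-station estimate; only the meaning of the columns changes.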

RESULTS

A total of 188 alpha values from 39 studies were coded. The overall (summary) alpha across stations was 0.66 (95% confidence interval [CI] 0.62-0.70); the overall alpha within stations across items was 0.78 (95% CI 0.73-0.82). Better than average reliability was associated with a greater number of stations and a higher number of examiners per station. Interpersonal skills were evaluated less reliably across stations and more reliably within stations compared with clinical skills.

CONCLUSIONS

Overall scores on the OSCE are often not very reliable. It is more difficult to reliably assess communication skills than clinical skills when considering both as general traits that should apply across situations. It is generally helpful to use two examiners and large numbers of stations, but some OSCEs appear more reliable than others for reasons that are not yet fully understood.
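One classical way to see why more stations tend to help is the Spearman-Brown prophecy formula from classical test theory. The abstract does not say the authors used it, so the sketch below is an assumption for illustration only; it also assumes added stations behave like the existing ones, which real OSCEs may not satisfy. The function names are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when the number of stations is multiplied by
    length_factor, assuming the added stations are parallel to the old ones."""
    if not 0 < reliability < 1:
        raise ValueError("reliability must lie strictly between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def length_factor_for(reliability, target):
    """Factor by which the number of stations would need to grow to reach
    the target reliability under the same parallel-stations assumption."""
    if not (0 < reliability < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target * (1 - reliability) / (reliability * (1 - target))


# Illustrative only: start from the summary across-station alpha of 0.66.
print(spearman_brown(0.66, 2))        # predicted reliability with twice as many stations
print(length_factor_for(0.66, 0.80))  # growth factor needed to reach 0.80
```

Because 0.66 is a pooled summary across heterogeneous studies, these numbers describe the shape of the relationship, not a prediction for any particular examination.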
