Department of Psychology, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620-7200, USA.
Med Educ. 2011 Dec;45(12):1181-9. doi: 10.1111/j.1365-2923.2011.04075.x. Epub 2011 Oct 11.
The objective structured clinical examination (OSCE) comprises a series of simulations used to assess the skill of medical practitioners in the diagnosis and treatment of patients. Because it is often used in high-stakes examinations, it is important to assess its reliability and validity.
The published literature was searched (PsycINFO, PubMed) for OSCE reliability estimates (coefficient alpha and generalisability coefficients) computed either across stations or across items within stations. Coders independently recorded information about each study. A meta-analysis of the available literature was conducted and sources of systematic variance in the estimates were examined.
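For readers unfamiliar with coefficient alpha, the following is a minimal sketch of how an across-stations alpha could be computed from a matrix of examinee-by-station scores. It uses the standard textbook formula; the function name `cronbach_alpha` and the sample data are illustrative assumptions, not taken from the studies reviewed.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a score matrix.

    scores: list of rows, one row per examinee, one column per station
    (or per item, for a within-station alpha). Hypothetical helper for
    illustration only.
    """
    n_examinees = len(scores)
    k = len(scores[0])  # number of stations (or items)
    if k < 2 or n_examinees < 2:
        raise ValueError("need at least 2 stations and 2 examinees")

    # Sample variance of each station's scores across examinees.
    station_vars = [variance(col) for col in zip(*scores)]
    # Sample variance of each examinee's total score.
    total_var = variance([sum(row) for row in scores])

    return (k / (k - 1)) * (1 - sum(station_vars) / total_var)


# Illustrative data: 3 examinees scored at 2 stations.
example = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(example)
```

Applying the same function to items within a single station, rather than to station totals, would give the within-station alpha.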
A total of 188 alpha values from 39 studies were coded. The overall (summary) alpha across stations was 0.66 (95% confidence interval [CI] 0.62-0.70); the overall alpha within stations across items was 0.78 (95% CI 0.73-0.82). Better-than-average reliability was associated with a greater number of stations and a higher number of examiners per station. Compared with clinical skills, interpersonal skills were evaluated less reliably across stations and more reliably within stations.
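The association between reliability and number of stations is consistent with the Spearman-Brown prophecy formula from classical test theory. The sketch below is an illustration only: it assumes any added stations are parallel to the existing ones, which real OSCE stations may not be, and it is not the analysis performed in the meta-analysis. The function name `spearman_brown` is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor.

    Assumes the added stations are parallel to the existing ones
    (an idealisation used here for illustration).
    """
    n = length_factor
    return (n * reliability) / (1 + (n - 1) * reliability)


# Starting from the summary across-station alpha of 0.66,
# doubling the number of stations under this assumption gives:
predicted = spearman_brown(0.66, 2)  # 1.32 / 1.66, roughly 0.80
```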
Overall scores on the OSCE are often not very reliable. It is more difficult to reliably assess communication skills than clinical skills when considering both as general traits that should apply across situations. It is generally helpful to use two examiners and large numbers of stations, but some OSCEs appear more reliable than others for reasons that are not yet fully understood.