Emergency Medical Care Program, Western Carolina University, Cullowhee, North Carolina 28723, USA.
Prehosp Emerg Care. 2012 Apr-Jun;16(2):277-83. doi: 10.3109/10903127.2011.640413. Epub 2012 Jan 9.
Background. Prior to graduation, paramedic students must be assessed for terminal competency and preparedness for national credentialing examinations. Although the procedures for determining competency vary, many academic programs use a practical and/or oral examination, often scored using skill sheets, for evaluating psychomotor skills. However, even with validated testing instruments, the interevaluator reliability of this process is unknown. Objective. We sought to estimate the interevaluator reliability of a subset of paramedic skills as commonly applied in terminal competency testing.
Methods. A mock examinee was videotaped performing staged examinations mimicking adult ventilatory management, oral board, and static and dynamic cardiac stations during which the examinee committed a series of prespecified errors. The videotaped performances were then evaluated by a group of qualified evaluators using standardized skill sheets. Interevaluator variability was measured by standard deviation and range, and reliability was evaluated using Krippendorff's alpha. Correlation between scores and evaluator demographics was assessed by Pearson correlation.
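The variability and reliability measures named above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The abstract does not state how ratings were laid out or which level of measurement was used for Krippendorff's alpha, so the sketch assumes the nominal variant, with each skill-sheet item treated as one unit rated by several evaluators. The function names `summarize` and `krippendorff_alpha_nominal` and all data shown are hypothetical.

```python
from collections import Counter
from itertools import permutations
from statistics import mean, stdev


def summarize(scores):
    """Return mean, sample standard deviation, minimum and maximum of total scores."""
    return mean(scores), stdev(scores), min(scores), max(scores)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data (assumed variant, see lead-in).

    units: list of lists; each inner list holds the ratings that all
    evaluators gave to one item. Use None for a missing rating.
    """
    # Build the coincidence matrix: every ordered pair of ratings from
    # different evaluators on the same unit contributes 1 / (m - 1).
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # a unit rated by fewer than two evaluators is not pairable
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and overall number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (both scaled by n, which cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return float("nan")  # only one category was ever used
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Hypothetical total scores from four evaluators for one station.
    print(summarize([24, 27, 18, 25]))

    # Hypothetical item-level ratings (1 = credit given, 0 = not given):
    # ten items, each rated by two evaluators.
    first = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
    second = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
    print(krippendorff_alpha_nominal([list(pair) for pair in zip(first, second)]))
```

An alpha of 1 indicates perfect agreement and values near 0 indicate agreement no better than chance, which is the scale on which the coefficients reported for each station should be read.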
Results. Total scores and critical errors varied considerably across all evaluators and stations. The mean (± standard deviation) scores were 24.77 (±2.37) out of a possible 27 points for the adult ventilatory management station, 11.69 (±2.71) out of a possible 15 points for the oral board station, 7.79 (±3.05) out of a possible 12 points for the static cardiology station, and 22.08 (±1.46) out of a possible 24 points for the dynamic cardiology station. Scores ranged from 18 to 27 for adult ventilatory management, 7 to 15 for the oral board, 2 to 12 for static cardiology, and 19 to 24 for dynamic cardiology. Krippendorff's alpha coefficients were 0.30 for adult ventilatory management, 0.01 for the oral board, 0.10 for static cardiology, and 0.48 for dynamic cardiology. Critical criteria errors were assigned by 10 (38.5%) evaluators for adult ventilatory management, five (19.2%) for the oral board, and nine (34.6%) for dynamic cardiology. Total scores were not correlated with evaluator demographics.
Conclusions. There was high variability and low reliability among qualified evaluators using skill sheets as a scoring tool in the evaluation of a mock terminal competency assessment. Further research is needed to determine the true overall interevaluator reliability of this commonly used approach, as well as the ideal number, training, and characteristics of prospective evaluators.