de Graaff E
Educational Development and Research, Rijksuniversiteit Limburg, Maastricht, The Netherlands.
Med Educ. 1989 Jul;23(4):381-6. doi: 10.1111/j.1365-2923.1989.tb01564.x.
There is evidence that nurses meet the requirements of objective judgement better than doctors do. Simulation of Initial Medical Problem-Solving (SIMP), a paper-and-pencil test for the assessment of medical problem-solving, consists of case histories, each followed by an open-ended question. Scoring open-ended questions is time-consuming and adds subjective bias to measurement error. To reduce scoring error, answers on SIMP are scored with scoring models: check-lists that describe the elements of a correct answer. The reliability of the scoring was analysed in a study in which six nurses rated 500 answers. The overall interrater reliability was high, with an intra-class correlation of 0.83. Selection of raters and improvement of the scoring models could increase the interrater reliability even further. In addition to the scoring by the nurses, part of the material was scored again by two experienced doctors. On the whole, the reliability of the scoring method was confirmed. Nevertheless, some evidence was found that the nurses misinterpreted the scoring models. Analysis at the item level revealed several instances in which both doctors agreed on a score for an element in an answer and all the nurses agreed on the opposite score. On the other hand, the two doctors were less consistent with each other than the nurses were. The disagreement between the doctors seems to be a consequence of differences in their own medical judgement of the case in question. The impact of the mistakes made by the nurses is much smaller than the loss of reliability caused by the inconsistency between the doctors.
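For readers unfamiliar with the intra-class correlation reported above, the following is a minimal sketch of how such a coefficient can be computed from a table of ratings (one row per answer, one column per rater). The abstract does not say which form of the intra-class correlation was used, so the choice of the two-way random-effects, single-rater, absolute-agreement form, often written ICC(2,1), is an assumption. The function name `icc_2_1` and the small ratings table are hypothetical and are not taken from the study.

```python
def icc_2_1(ratings):
    """Two-way random-effects, single-rater, absolute-agreement ICC.

    `ratings` is a list of rows, one row per rated answer and one
    column per rater. This form of the ICC is an assumption; the
    abstract does not state which form the study used.
    """
    n = len(ratings)            # number of rated answers
    k = len(ratings[0])         # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least two answers and two raters")
    if any(len(row) != k for row in ratings):
        raise ValueError("every answer must be scored by every rater")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )

    ms_rows = ss_rows / (n - 1)              # between-answer mean square
    ms_cols = ss_cols / (k - 1)              # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))    # residual mean square

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ratings have no variance; ICC is undefined")
    return (ms_rows - ms_err) / denom


# Hypothetical check-list totals: 5 answers scored by 3 raters.
example = [
    [4, 4, 5],
    [2, 3, 2],
    [5, 5, 5],
    [1, 1, 2],
    [3, 4, 3],
]
print(round(icc_2_1(example), 2))
```

A value close to 1 means that most of the variation in scores comes from differences between answers rather than from differences between raters, which is the sense in which a coefficient of 0.83 indicates high interrater reliability.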