Neumann M, Friedl S, Meining A, Egger K, Heldwein W, Rey J F, Hochberger J, Classen M, Hohenberger W, Rösch T
Department of Surgery and Internal Medicine I, University of Erlangen, Germany.
Z Gastroenterol. 2002 Oct;40(10):857-62. doi: 10.1055/s-2002-35258.
In most European countries, training in GI endoscopy has largely been based on hands-on acquisition of experience in patients rather than on a structured training programme. The development of training models has made systematic hands-on training possible in a variety of diagnostic and therapeutic endoscopy techniques. Little, however, is known about methods of objectively assessing trainees' performance. We therefore developed an assessment 'score card' for upper GI endoscopy and tested it on endoscopists with various levels of experience. The aim of the study was to assess interobserver variation in the evaluation of trainees.
On the basis of textbooks and expert opinion, a consensus group of eight experienced endoscopists developed a score card for diagnostic upper GI endoscopy with biopsy. The score card assesses the individual steps of the procedure as well as the time needed to complete each step. The score card was then evaluated at a further conference at which ten experts blindly assessed videotapes of 15 endoscopists performing upper GI endoscopy in a training bio-simulation model (the 'Erlangen Endo-Trainer'). On the basis of their previous experience (i.e. the number of endoscopies performed), these 15 endoscopists were classified into four groups: very experienced, experienced, having some experience, and inexperienced. Interobserver variability (IOV) was tested for the various score card parameters (Kendall's rank-correlation coefficient: 0.0-0.5 poor agreement, 0.5-1.0 good agreement). In addition, the correlation between the score card assessment and the endoscopists' experience levels was analysed.
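The agreement measure described above can be illustrated with a minimal sketch. The abstract does not state which Kendall statistic was used; the 0.0-1.0 scale suggests Kendall's coefficient of concordance (W), and that is an assumption here. The function names, the tie-free ranking, and the treatment of exactly 0.5 as "good" are likewise illustrative assumptions, and the scores in the usage note are made up for demonstration, not study data.

```python
def kendalls_w(scores):
    """Kendall's coefficient of concordance W (no tie correction).

    scores[r][s] is the score given by rater r to subject s.
    Assumes every rater scored every subject and that no rater
    gave two subjects the same score (hypothetical simplification).
    """
    m = len(scores)      # number of raters
    n = len(scores[0])   # number of subjects being rated
    rank_sums = [0] * n
    for row in scores:
        # Convert one rater's scores to ranks 1..n (lowest score gets rank 1).
        order = sorted(range(n), key=lambda s: row[s])
        for rank, subject in enumerate(order, start=1):
            rank_sums[subject] += rank
    mean_rank_sum = m * (n + 1) / 2
    # Sum of squared deviations of the rank sums from their mean.
    s_dev = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s_dev / (m ** 2 * (n ** 3 - n))


def agreement_label(w):
    """Apply the abstract's cut-offs: below 0.5 poor, otherwise good.

    Assigning exactly 0.5 to 'good' is an assumption; the abstract
    lists 0.5 in both ranges.
    """
    return "good" if w >= 0.5 else "poor"
```

For example, three hypothetical raters who rank four subjects identically give `kendalls_w([[1, 2, 3, 4]] * 3) == 1.0`, while two raters with exactly reversed rankings of three subjects give a value of 0.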
Although IOV was poor for all the parameters tested (Kendall coefficient < 0.3), the assessment parameters correlated well with the endoscopists' experience levels (correlation coefficient 0.59-0.89, p < 0.05). The score card parameters were suitable for differentiating between the four groups of endoscopists with different levels of endoscopic experience.
As expected with scores involving subjective assessment of performance, the variability between reviewers was substantial. Nevertheless, because individual observers were consistent in their own ratings, the assessment score was capable of distinguishing reliably between different experience levels. The score card can therefore be used to document both training status and progress during endoscopy training courses using bio-simulation models, and this may improve quality assurance in GI endoscopy training.