Bindels Rianne, Hasman Arie, van Wersch Jan W J, Pop Peter, Winkens Ron A G
Department of Medical Informatics, University of Maastricht, The Netherlands.
Med Decis Making. 2003 Jan-Feb;23(1):31-7. doi: 10.1177/0272989X02239647.
Despite its poor reliability, peer assessment is the traditional method for assessing the appropriateness of health care activities. This article describes the reliability of human assessment of the appropriateness of diagnostic test requests. The authors used a random selection of 1217 tests from 253 request forms submitted by general practitioners in the Maastricht region of The Netherlands. Three reviewers independently assessed the appropriateness of each requested test. Interrater kappa values ranged from 0.33 to 0.42, and kappa values of intrarater agreement ranged from 0.48 to 0.68. The joint reliability coefficient of the 3 reviewers was 0.66. This reliability is sufficient to review test ordering over a series of cases but not to make case-by-case assessments. Sixteen reviewers would be needed to obtain a joint reliability of 0.95. The authors conclude that there is substantial variation in assessments of what constitutes an appropriately requested diagnostic test and that this feedback method is not reliable enough for case-by-case assessment. Computer support may be beneficial for making the process of peer review more uniform.
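The abstract does not state how the joint reliability coefficient or the reviewer projection was calculated. A standard tool for projecting how reliability changes with the number of raters is the Spearman-Brown prophecy formula; the minimal Python sketch below is purely illustrative, assumes a single-rater reliability of 0.39 (roughly the reported interrater kappa range), and is not guaranteed to reproduce the paper's exact figures, since the authors' reliability model and inputs are not given in the abstract.

```python
def joint_reliability(r: float, n: int) -> float:
    """Spearman-Brown prophecy formula: reliability of the pooled
    judgment of n raters, given an assumed single-rater reliability r."""
    return n * r / (1 + (n - 1) * r)


def raters_for_target(r: float, target: float) -> float:
    """Invert the formula: number of raters needed to reach a target
    joint reliability, given single-rater reliability r."""
    return target * (1 - r) / (r * (1 - target))


if __name__ == "__main__":
    r = 0.39  # assumed single-rater reliability (interrater kappa was 0.33-0.42)
    print(f"Projected joint reliability of 3 reviewers: {joint_reliability(r, 3):.2f}")
    print(f"Reviewers projected for 0.95 reliability:   {raters_for_target(r, 0.95):.1f}")
```

With this assumed input, three reviewers project to a joint reliability of about 0.66, in line with the reported coefficient; the number of reviewers needed for 0.95 depends strongly on the single-rater estimate and modeling choices, which the abstract does not specify.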