Schmidt L G, Dirschedl P, Grohmann R, Scherer J, Wunderlich O, Müller-Oerlinghausen B
Eur J Clin Pharmacol. 1986;30(2):199-204. doi: 10.1007/BF00614303.
Within an ongoing drug surveillance project (AMUP) in psychiatric hospitals, a comparative study was carried out to evaluate two methods commonly used for adverse drug reaction (ADR) assessment. Two raters, who had cooperated with the project since its inception, evaluated 80 randomly selected ADRs twice: first by an empirical (implicit) approach, and second, 4 weeks later, by the algorithm proposed by Kramer et al. (1979). Agreement on the suspected medication and the related probability ratings was obtained in 81% of all 80 cases for the empirical method (weighted kappa = 0.41) and in 69% for the algorithmic method (weighted kappa = 0.62), indicating that agreement exceeded chance for both methods. Comparison with assessments made in previous case conferences of the project showed that empirical ratings were reliable over time, owing to the homogeneous use of criteria by project raters. In contrast to previous reports on the subject, inter-rater agreement appeared to be superior for the empirical method compared with the algorithmic assessment. Analysis of disagreements suggested that probability ratings based on the empirical method were nonspecific, owing to the conventional criteria applied in the project. Inter-rater agreement was reduced by polypharmacy, especially in the case of algorithmic assessments. The consistency of assessment was also lowered by the fact that the two methods assigned different weights to particular assessment criteria.
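The weighted kappa statistic reported above measures chance-corrected inter-rater agreement while penalizing larger disagreements on an ordinal scale more heavily. A minimal sketch of how such a coefficient is computed from a rater-by-rater confusion matrix is given below; the three-category probability scale and the confusion matrix are hypothetical illustrations, not the study's data.

```python
def weighted_kappa(confusion, weights="linear"):
    """Cohen's weighted kappa from a k x k confusion matrix of counts.

    Rows index rater 1's categories, columns rater 2's. With "linear"
    weights the disagreement penalty is |i - j|; with "quadratic" it
    is (i - j)**2.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_marg = [sum(confusion[i]) / n for i in range(k)]
    col_marg = [sum(confusion[i][j] for i in range(k)) / n for j in range(k)]

    def w(i, j):
        d = abs(i - j)
        return d if weights == "linear" else d * d

    # Observed vs. chance-expected weighted disagreement.
    observed = sum(w(i, j) * confusion[i][j] / n
                   for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row_marg[i] * col_marg[j]
                   for i in range(k) for j in range(k))
    return 1.0 - observed / expected


# Hypothetical example: 80 ADR cases rated by two raters on an ordinal
# probability scale (0 = unlikely, 1 = possible, 2 = probable).
confusion = [
    [20, 5, 0],   # rater 1 said "unlikely"
    [4, 25, 6],   # rater 1 said "possible"
    [1, 5, 14],   # rater 1 said "probable"
]
kappa = weighted_kappa(confusion)
raw_agreement = sum(confusion[i][i] for i in range(3)) / 80
print(round(kappa, 3), round(raw_agreement, 3))
```

As in the abstract, raw percentage agreement (the diagonal of the matrix) and weighted kappa can diverge, because kappa discounts the agreement expected by chance from the marginal rating frequencies.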