Kadar Nicholas
Member of New Jersey Bar, Cranbury, New Jersey, USA.
J Laparoendosc Adv Surg Tech A. 2010 Mar;20(2):123-8. doi: 10.1089/lap.2009.0345.
The aim of this study was to determine if peer review conducted under real-world conditions is systematically biased.
A repeated-measures design was effectively created when two board-certified obstetrician-gynecologists reviewed the same 26 medical records of patients treated by the same physician, and provided written evaluations of each case and a summary of their criticisms. The reviews were conducted independently for two different, unaffiliated hospitals. Neither reviewer was aware of the other's review, and neither was affiliated with either hospital or knew the physician under review. This study reports the degree of agreement between the two reviewers over the care rendered to these 26 patients.
Three of the 26 cases reviewed had complications. Both reviewers criticized these cases, but criticized 2 of them for different reasons. At least one of the reviewers criticized 14 (61%) of the 23 uncomplicated cases, about which no quality concerns had been raised prior to the review. With one exception, they criticized completely different cases, and they criticized that 1 case for different reasons. Thus, only 4 of the 17 cases criticized by at least one of the reviewers were criticized by both of them, and only 1 of those 4 cases was criticized for the same reason. The kappa statistic was -0.024, indicating no agreement between the reviewers (P = 0.98).
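For readers who want to see how a kappa statistic of this size arises, the following is a minimal sketch of Cohen's kappa for two reviewers and a binary criticized/not criticized rating. The counts of 4 cases criticized by both reviewers, 13 by exactly one, and 9 by neither follow from the figures above. How the 13 divide between the two reviewers is not reported here, so the split of 5 and 8 is an assumption, chosen because it gives a value that rounds to the reported -0.024. The function name `cohen_kappa` and its parameters are illustrative and not taken from the study.

```python
def cohen_kappa(both, only_a, only_b, neither):
    """Cohen's kappa for two raters and a binary rating, from 2x2 cell counts."""
    n = both + only_a + only_b + neither
    # Observed agreement: cases where the two reviewers gave the same rating.
    p_observed = (both + neither) / n
    # Marginal proportion of cases each reviewer criticized.
    p_a = (both + only_a) / n
    p_b = (both + only_b) / n
    # Agreement expected by chance if the two reviewers rated independently.
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (p_observed - p_expected) / (1 - p_expected)


# 4 criticized by both, 9 by neither, 13 by exactly one reviewer.
# ASSUMPTION: the 13 are split 5 / 8 between the reviewers (not reported above).
kappa = cohen_kappa(both=4, only_a=5, only_b=8, neither=9)
print(round(kappa, 3))  # -0.024
```

Under these assumed counts the observed agreement is 13/26 = 0.50 and the chance-expected agreement is about 0.51, which is why kappa falls marginally below zero.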
As presently conducted, peer review can be systematically biased even when conducted independently by external reviewers. The dual-process theory of reasoning can account for the bias and predicts how it might be eliminated or reduced.