Cicchetti D V
VA Medical Center, West Haven, CT 06516.
J Clin Exp Neuropsychol. 1988 Oct;10(5):605-22. doi: 10.1080/01688638808402799.
Two paradoxes can occur when neuropsychologists attempt to assess the reliability of a dichotomous diagnostic instrument (e.g., one measuring the presence or absence of Dyslexia or Autism). The first paradox occurs when two pairs of examiners produce the same high level of overall agreement (e.g., 85%), yet the level of chance-corrected agreement is relatively high for one pair (e.g., .70) and quite low for the other (e.g., .32). To illustrate the second paradox, consider two examiners who are in 80% agreement in their overall diagnosis of Dyslexia. Assume, further, that they agree perfectly on the proportions of cases they diagnose: each examiner classifies 20% of the cases as Dyslexic and 80% as Non-Dyslexic. Somewhat paradoxically, the chance-corrected interexaminer agreement for this pair calculates to only .37. In distinct contrast, a second pair of examiners, also in 80% overall agreement, disagrees appreciably in its diagnostic assignments. Specifically, the first neuropsychologist (a) classifies 65% of the cases as Non-Dyslexic, as opposed to 45% so diagnosed by the second neuropsychologist, and (b) classifies the remaining 35% as Dyslexic, as compared with 55% so classified by the second examiner. Despite these discrepancies, this second pair produces a much higher level of chance-corrected agreement than the first pair, namely .61. The underlying reasons for both paradoxes, as well as their resolution, are presented.
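The arithmetic behind the second paradox can be checked with a short sketch. The abstract does not name its chance-corrected statistic; this sketch assumes Cohen's kappa, which reproduces the reported values (.375, reported as .37, and about .61). The cell counts assume a hypothetical sample of 100 cases; given 80% overall agreement and the stated proportions, these are the only 2 x 2 tables possible for each pair. The names `agreement_stats`, `PAIR_1`, and `PAIR_2` are illustrative, not from the paper.

```python
from typing import List, Tuple

Table = List[List[int]]


def agreement_stats(table: Table) -> Tuple[float, float, float]:
    """Return (observed agreement, chance-expected agreement, Cohen's kappa).

    Assumes Cohen's kappa as the chance-corrected statistic.
    table[i][j] counts cases that examiner A placed in category i and
    examiner B placed in category j (0 = Non-Dyslexic, 1 = Dyslexic).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    p_obs = sum(table[i][i] for i in range(k)) / n
    # Marginal proportions for each examiner (row = A, column = B).
    a_marg = [sum(table[i]) / n for i in range(k)]
    b_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance if the two examiners rated independently.
    p_exp = sum(a_marg[i] * b_marg[i] for i in range(k))
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return p_obs, p_exp, kappa


# Hypothetical 100-case tables reconstructed from the abstract's percentages.
PAIR_1 = [[70, 10], [10, 10]]  # both examiners: 80% Non-Dyslexic, 20% Dyslexic
PAIR_2 = [[45, 20], [0, 35]]   # A: 65% / 35%, B: 45% / 55%

for name, table in (("pair 1", PAIR_1), ("pair 2", PAIR_2)):
    p_obs, p_exp, kappa = agreement_stats(table)
    print(f"{name}: observed={p_obs:.2f} chance={p_exp:.3f} kappa={kappa:.3f}")
# pair 1: observed=0.80 chance=0.680 kappa=0.375
# pair 2: observed=0.80 chance=0.485 kappa=0.612
```

Both pairs agree on 80% of cases; the difference lies entirely in the chance term. Identical, skewed proportions (80/20) push expected chance agreement up to .68, leaving little room above chance, whereas the second pair's divergent proportions lower it to .485.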