University of Louisville.
J Appl Behav Anal. 1979 Winter;12(4):523-33. doi: 10.1901/jaba.1979.12-523.
Interval by interval reliability has been criticized for "inflating" observer agreement when target behavior rates are very low or very high. Scored interval reliability and its converse, unscored interval reliability, however, vary as target behavior rates vary when observer disagreement rates are constant. These problems, along with the existence of "chance" values of each reliability which also vary as a function of response rate, may cause researchers and consumers difficulty in interpreting observer agreement measures. Because each of these reliabilities essentially compares observer disagreements to a different base, it is suggested that the disagreement rate itself be the first measure of agreement examined, and its magnitude relative to occurrence and to nonoccurrence agreements then be considered. This is easily done via a graphic presentation of the disagreement range as a bandwidth around reported rates of target behavior. Such a graphic presentation summarizes all the information collected during reliability assessments and permits visual determination of each of the three reliabilities. In addition, graphing the "chance" disagreement range around the bandwidth permits easy determination of whether or not true observer agreement has likely been demonstrated. Finally, the limits of the disagreement bandwidth help assess the believability of claimed experimental effects: those leaving no overlap between disagreement ranges are probably believable, others are not.
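The dependence of these measures on response rate can be illustrated with a short sketch. The formulas below are the conventional definitions of the three reliabilities, not text taken from the abstract, and the function names (`agreement_summary`, `make_records`, `chance_disagreement_rate`) are hypothetical. Two further assumptions are made: the disagreement bandwidth is taken to run from the proportion of intervals both observers scored up to the proportion either observer scored, and "chance" disagreement is computed as if the two observers scored independently at their own observed rates.

```python
def agreement_summary(obs_a, obs_b):
    """Summarize agreement between two observers' interval records.

    obs_a, obs_b: equal-length sequences of 0/1 (1 = behavior scored).
    """
    if len(obs_a) != len(obs_b) or len(obs_a) == 0:
        raise ValueError("records must be non-empty and of equal length")
    n = len(obs_a)
    both_scored = sum(1 for a, b in zip(obs_a, obs_b) if a and b)
    both_unscored = sum(1 for a, b in zip(obs_a, obs_b) if not a and not b)
    disagreements = n - both_scored - both_unscored

    def ratio(num, den):
        # Undefined when the base is empty (e.g. no scored intervals at all).
        return num / den if den else float("nan")

    return {
        # Agreements of either kind over all intervals.
        "interval_by_interval": (both_scored + both_unscored) / n,
        # Occurrence agreements over intervals scored by at least one observer.
        "scored_interval": ratio(both_scored, both_scored + disagreements),
        # Nonoccurrence agreements over intervals left unscored by at least one.
        "unscored_interval": ratio(both_unscored, both_unscored + disagreements),
        "disagreement_rate": disagreements / n,
        # Assumed bandwidth: agreed occurrences up to agreed plus disputed.
        "band_low": both_scored / n,
        "band_high": (both_scored + disagreements) / n,
    }


def chance_disagreement_rate(p_a, p_b):
    """Expected disagreement rate if observers scored independently
    at rates p_a and p_b (an assumption made for this sketch)."""
    return p_a * (1 - p_b) + p_b * (1 - p_a)


def make_records(n, both_scored, disagreements):
    """Build hypothetical records with a fixed number of disagreements.

    For simplicity, every disputed interval is scored by observer A only.
    """
    if both_scored + disagreements > n:
        raise ValueError("counts exceed the number of intervals")
    rest = n - both_scored - disagreements
    obs_a = [1] * both_scored + [1] * disagreements + [0] * rest
    obs_b = [1] * both_scored + [0] * disagreements + [0] * rest
    return obs_a, obs_b


# Hold disagreements at 10 of 100 intervals and vary the behavior rate.
for both_scored in (5, 45, 85):
    s = agreement_summary(*make_records(100, both_scored, 10))
    print(
        f"agreed occurrences={both_scored:2d}  "
        f"IxI={s['interval_by_interval']:.2f}  "
        f"scored={s['scored_interval']:.2f}  "
        f"unscored={s['unscored_interval']:.2f}  "
        f"band=[{s['band_low']:.2f}, {s['band_high']:.2f}]"
    )
```

With the disagreement rate fixed at 0.10, interval-by-interval reliability stays at 0.90 across all three cases, while scored-interval reliability rises from about 0.33 to about 0.89 and unscored-interval reliability falls by the same amount. This is the pattern the abstract describes: each measure compares the same disagreements to a different base.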