Intelligence, Influence and Collaboration Section, Toronto Research Centre, Defence Research and Development Canada, Toronto, Ontario, Canada.
Department of National Defence, Toronto, Ontario, Canada.
PLoS One. 2021 Mar 18;16(3):e0248424. doi: 10.1371/journal.pone.0248424. eCollection 2021.
Across a wide range of domains, experts make probabilistic judgments under conditions of uncertainty to support decision-making. These judgments are often conveyed using linguistic expressions (e.g., x is likely). Seeking to foster shared understanding of these expressions between senders and receivers, the US intelligence community implemented a communication standard that prescribes a set of probability terms and assigns each term an equivalent numerical probability range. In an earlier PLOS ONE article, Wintle et al. [1] tested whether access to the standard improves shared understanding and also explored the efficacy of various enhanced presentation formats. Notably, they found that embedding numeric equivalents in text (e.g., x is likely [55-80%]) substantially outperformed the status-quo approach in terms of the percentage overlap between participants' interpretations of linguistic probabilities (defined in terms of the numeric range equivalents they provided for each term) and the numeric ranges in the standard. These results have important prescriptive implications, yet Wintle et al.'s percentage overlap measure of agreement may be viewed as unfairly punitive because it penalizes individuals for being more precise than the stipulated guidelines even when their interpretations fall entirely within the stipulated ranges. Arguably, subjects' within-range precision is a positive attribute and should not be penalized in scoring interpretive agreement. Accordingly, in the present article, we reanalyzed Wintle et al.'s data using an alternative measure of percentage overlap that does not penalize in-range precision. Using the alternative measure, we find that percentage overlap is substantially elevated across conditions. More importantly, however, the effects of presentation format and probability level are highly consistent with the original study.
By removing the ambiguity caused by Wintle et al.'s unduly punitive measure of agreement, these findings buttress Wintle et al.'s original claim that the methods currently used by intelligence organizations are ineffective at coordinating the meaning of uncertainty expressions between intelligence producers and intelligence consumers. Future studies examining agreement between senders and receivers are also encouraged to reflect carefully on the most appropriate measures of agreement to employ in their experiments and to explicate the bases for their methodological choices.
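The contrast between a measure that penalizes in-range precision and one that does not can be illustrated with a short sketch. The abstract does not give the formula for either measure, so both functions below are assumptions made for illustration only: the "punitive" version divides the shared width by the combined span of both ranges (intersection over union), and the alternative divides it by the width of the participant's own range. The function names and the treatment of point estimates are likewise hypothetical.

```python
def intersection_width(resp, std):
    """Width of the region shared by two (low, high) percentage ranges."""
    return max(0.0, min(resp[1], std[1]) - max(resp[0], std[0]))


def overlap_union(resp, std):
    """ASSUMED punitive-style measure: intersection over union, in percent.

    A response narrower than the standard scores below 100 even when it
    lies entirely inside the standard's range.
    """
    inter = intersection_width(resp, std)
    union = (resp[1] - resp[0]) + (std[1] - std[0]) - inter
    return 100.0 * inter / union if union > 0 else 0.0


def overlap_within_response(resp, std):
    """ASSUMED alternative: share of the response range inside the standard.

    Any response lying wholly within the standard scores 100, however narrow.
    """
    width = resp[1] - resp[0]
    if width == 0:
        # Point estimate (hypothetical handling): full credit if in range.
        return 100.0 if std[0] <= resp[0] <= std[1] else 0.0
    return 100.0 * intersection_width(resp, std) / width


# "likely" is mapped to 55-80% in the example quoted in the abstract.
likely = (55, 80)

print(overlap_union((60, 70), likely))            # 40.0
print(overlap_within_response((60, 70), likely))  # 100.0
print(overlap_union((50, 70), likely))            # 50.0
print(overlap_within_response((50, 70), likely))  # 75.0
```

Under these assumed definitions, a participant who reads "likely" as 60-70% is scored at 40% agreement by the first measure despite being fully within the standard, and at 100% by the second. A response that strays outside the standard (50-70%) is penalized by both.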