Korb Kevin B, Nyberg Erik P, Oshni Alvandi Abraham, Thakur Shreshth, Ozmen Mehmet, Li Yang, Pearson Ross, Nicholson Ann E
Faculty of Information Technology, Monash University, Melbourne, VIC, Australia.
Department of Economics, University of Melbourne, Melbourne, VIC, Australia.
Front Psychol. 2020 Jun 18;11:1054. doi: 10.3389/fpsyg.2020.01054. eCollection 2020.
US intelligence analysts must weigh up relevant evidence to assess the probability of their conclusions, and express this reasoning clearly in written reports for decision-makers. Typically, they work alone with no special analytic tools, and sometimes succumb to common probabilistic and causal reasoning errors. So, the US government funded a major research program (CREATE) for four large academic teams to develop new structured, collaborative, software-based methods that might achieve better results. Our team's method (BARD) is the first to combine two key techniques: constructing causal Bayesian network models (BNs) to represent analyst knowledge, and small-group collaboration via the Delphi technique. BARD also incorporates compressed, high-quality online training allowing novices to use it, and checklist-inspired report templates with a rudimentary AI tool for generating text explanations from analysts' BNs. In two prior experiments, our team showed BARD's BN-building assists probabilistic reasoning when used by individuals, with a large effect (Glass' Δ 0.8) (Cruz et al., 2020), and even minimal Delphi-style interactions improve the BN structures individuals produce, with medium to very large effects (Glass' Δ 0.5-1.3) (Bolger et al., 2020). This experiment is the critical test of BARD as an integrated system and possible alternative to business-as-usual for intelligence analysis. Participants were asked to solve three probabilistic reasoning problems spread over 5 weeks, developed by our team to test both quantitative accuracy and susceptibility to tempting qualitative fallacies. Our 256 participants were randomly assigned to form 25 teams of 6-9 using BARD and 58 individuals using Google Suite and (if desired) the best pen-and-paper techniques. For each problem, BARD outperformed this control with very large to huge effects (Glass' Δ 1.4-2.2), greatly exceeding CREATE's initial target. 
We conclude that, for suitable problems, BARD already offers significant advantages over both business-as-usual and existing BN software. Our effect sizes also suggest that BARD's BN-building and collaboration combined beneficially and cumulatively, although implementation differences reduced performance relative to Cruz et al. (2020), so an interaction between the two components may also have contributed. BARD has enormous potential for further development and testing of specific components and on more complex problems, and many potential applications beyond intelligence analysis.
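For readers unfamiliar with the effect-size measure reported above, Glass' Δ is the difference between the treatment and control group means divided by the standard deviation of the control group alone. The following is a minimal sketch of that calculation, not the analysis code used in the study; the function name `glass_delta` and the scores are hypothetical, invented purely for illustration.

```python
from statistics import mean, stdev


def glass_delta(treatment, control):
    """Glass' delta: (mean of treatment - mean of control) / sample SD of control."""
    if len(control) < 2:
        raise ValueError("control group needs at least two scores to estimate an SD")
    return (mean(treatment) - mean(control)) / stdev(control)


# Hypothetical scores, NOT data from the experiment.
bard_scores = [0.9, 0.8, 0.7, 0.8]
control_scores = [0.5, 0.3, 0.4, 0.6, 0.2]

delta = glass_delta(bard_scores, control_scores)
print(f"Glass' delta = {delta:.2f}")
```

Scaling by the control group's standard deviation only, rather than a pooled one, keeps the yardstick unaffected by any change in variability that the treatment itself might cause.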