Cabitza Federico, Campagner Andrea, Sconfienza Luca Maria
Università degli Studi di Milano-Bicocca, Viale Sarca 336, 20126 Milan, Italy.
Department of Biomedical Sciences for Health, University of Milan, Milan, Italy.
Health Inf Sci Syst. 2021 Feb 5;9(1):8. doi: 10.1007/s13755-021-00138-8. eCollection 2021 Dec.
The integration of Artificial Intelligence into medical practice has recently been advocated for its promise to make that practice more efficient and effective. Nonetheless, little research has so far aimed at understanding the best human-AI interaction protocols for collaborative tasks, even in settings that are currently more viable, such as independent double-reading screening tasks.
To this end, we report on a retrospective case-control study involving 12 board-certified radiologists in the detection of knee lesions by means of Magnetic Resonance Imaging, in which we simulated the serial combination of two Deep Learning models with humans in eight double-reading protocols. Inspired by the so-called Kasparov's Laws, we investigate whether the combination of humans and AI models can achieve better performance than AI models alone, and whether weak readers, when supported by fit-for-use interaction protocols, can outperform stronger readers.
We discuss two main findings. First, groups of humans who perform significantly worse than a state-of-the-art AI can significantly outperform it if their judgments are aggregated by majority voting (in accordance with the first part of Kasparov's law). Second, small ensembles of significantly weaker readers can significantly outperform teams of stronger readers supported by the same computational tool, when the judgments of the former are combined within "fit-for-use" protocols (in accordance with the second part of Kasparov's law).
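The majority-voting aggregation mentioned in the first finding can be illustrated with a minimal sketch. This is not the study's code or data: the function names (`majority_vote`, `accuracy`), the three readers, and the six binary labels are all hypothetical toy values, chosen only to show how readers whose errors fall on different cases can be outvoted into a better joint decision.

```python
def majority_vote(votes):
    """Return 1 if strictly more than half of the binary votes are 1, else 0.

    Ties (possible only with an even number of voters) resolve to 0;
    this tie rule is an assumption of the sketch.
    """
    return int(2 * sum(votes) > len(votes))


def accuracy(predictions, truth):
    """Fraction of cases where the prediction matches the reference label."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


# Hypothetical toy data (NOT from the study): 1 = lesion present, 0 = absent.
truth = [1, 0, 1, 1, 0, 0]
readers = [
    [1, 0, 0, 1, 0, 1],  # reader A: wrong on cases 3 and 6
    [0, 0, 1, 1, 1, 0],  # reader B: wrong on cases 1 and 5
    [1, 1, 1, 0, 0, 0],  # reader C: wrong on cases 2 and 4
]

# zip(*readers) regroups the labels case by case, one tuple of votes per case.
ensemble = [majority_vote(case_votes) for case_votes in zip(*readers)]

for name, labels in zip("ABC", readers):
    print(f"reader {name}: accuracy = {accuracy(labels, truth):.2f}")
print(f"majority vote: accuracy = {accuracy(ensemble, truth):.2f}")
```

In this toy setting no two readers err on the same case, so the vote corrects every individual mistake. With correlated errors the gain would be smaller or absent, which is one reason the choice of interaction protocol matters.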
Our study shows that good interaction protocols can guarantee improved decision performance that easily surpasses the performance of individual agents, even of realistic super-human AI systems. This finding highlights the importance of focusing on how to guarantee better cooperation within human-AI teams, so as to enable safer and more human-sustainable care practices.