Is a score enough? Pitfalls and solutions for AI severity scores.

Author information

Bernstein Michael H, van Assen Marly, Bruno Michael A, Krupinski Elizabeth A, De Cecco Carlo, Baird Grayson L

Affiliations

Department of Diagnostic Imaging, Brown Radiology Human Factors Lab, Rhode Island Hospital, Warren Alpert School of Medicine of Brown University, Providence, RI, USA.

Department of Radiology and Imaging Sciences, Emory University, School of Medicine, Atlanta, GA, USA.

Publication information

Eur Radiol Exp. 2025 Jul 14;9(1):67. doi: 10.1186/s41747-025-00603-z.

Abstract

Severity scores, which often refer to the likelihood or probability of a pathology, are commonly provided by artificial intelligence (AI) tools in radiology. However, little attention has been given to the use of these AI scores, and there is a lack of transparency into how they are generated. In this comment, we draw on key principles from psychological science and statistics to elucidate six human factors limitations of AI scores that undermine their utility: (1) variability across AI systems; (2) variability within AI systems; (3) variability between radiologists; (4) variability within radiologists; (5) unknown distribution of AI scores; and (6) perceptual challenges. We hypothesize that these limitations can be mitigated by providing the false discovery rate and false omission rate for each score as a threshold. We discuss how this hypothesis could be empirically tested.

KEY POINTS: The radiologist-AI interaction has not been given sufficient attention. The utility of AI scores is limited by six key human factors limitations. We propose a hypothesis for how to mitigate these limitations by using false discovery rate and false omission rate.
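To make the proposed quantities concrete, the sketch below computes the false discovery rate, FP / (FP + TP), and the false omission rate, FN / (FN + TN), when a given AI score is used as the positivity threshold. It is an illustration only and is not taken from the article: the function name `fdr_and_for`, the convention that a score at or above the threshold counts as positive, and the toy scores and labels are all assumptions.

```python
from typing import Optional, Sequence, Tuple


def fdr_and_for(
    scores: Sequence[float],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (FDR, FOR) when cases with score >= threshold are called positive.

    labels hold the reference standard: 1 = pathology present, 0 = absent.
    A rate is None when its denominator is zero (no positive or no negative calls).
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        called_positive = score >= threshold  # assumed convention, not from the article
        if called_positive and label == 1:
            tp += 1
        elif called_positive and label == 0:
            fp += 1
        elif not called_positive and label == 1:
            fn += 1
        else:
            tn += 1

    # FDR: share of positive calls that are wrong.
    fdr = fp / (fp + tp) if (fp + tp) > 0 else None
    # FOR: share of negative calls that are wrong.
    false_omission = fn / (fn + tn) if (fn + tn) > 0 else None
    return fdr, false_omission


# Hypothetical validation data, invented for this example.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

# Treat each observed score as a candidate threshold and report both rates.
for t in sorted(set(scores)):
    print(t, fdr_and_for(scores, labels, t))
```

Reporting both rates for every score value, rather than the raw score alone, would tell a reader how often a positive or a negative call at that score turns out to be wrong in the validation sample. Both rates depend on disease prevalence in that sample, so they would need to be estimated on data resembling the intended population.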
