Is a score enough? Pitfalls and solutions for AI severity scores.

Author information

Bernstein Michael H, van Assen Marly, Bruno Michael A, Krupinski Elizabeth A, De Cecco Carlo, Baird Grayson L

Affiliations

Department of Diagnostic Imaging, Brown Radiology Human Factors Lab, Rhode Island Hospital, Warren Alpert School of Medicine of Brown University, Providence, RI, USA.

Department of Radiology and Imaging Sciences, Emory University, School of Medicine, Atlanta, GA, USA.

Publication information

Eur Radiol Exp. 2025 Jul 14;9(1):67. doi: 10.1186/s41747-025-00603-z.

DOI:10.1186/s41747-025-00603-z

PMID:40658189

Abstract

Severity scores, which often refer to the likelihood or probability of a pathology, are commonly provided by artificial intelligence (AI) tools in radiology. However, little attention has been given to the use of these AI scores, and there is a lack of transparency into how they are generated. In this comment, we draw on key principles from psychological science and statistics to elucidate six human factors limitations of AI scores that undermine their utility: (1) variability across AI systems; (2) variability within AI systems; (3) variability between radiologists; (4) variability within radiologists; (5) unknown distribution of AI scores; and (6) perceptual challenges. We hypothesize that these limitations can be mitigated by providing the false discovery rate and false omission rate for each score as a threshold. We discuss how this hypothesis could be empirically tested.

Key points

The radiologist-AI interaction has not been given sufficient attention.

The utility of AI scores is limited by six key human factors limitations.

We propose a hypothesis for how to mitigate these limitations by using false discovery rate and false omission rate.
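
The abstract does not spell out how the false discovery rate (FDR) and false omission rate (FOR) would be computed for each score. The sketch below is one possible reading, using the standard definitions: treating a given score as the threshold, FDR = FP / (FP + TP) among cases at or above it, and FOR = FN / (FN + TN) among cases below it. The function name `fdr_for_at_threshold`, the 0 to 100 score scale, and the example data are all hypothetical and are not taken from the article.

```python
from typing import List, Optional, Tuple


def fdr_for_at_threshold(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[Optional[float], Optional[float]]:
    """Return (FDR, FOR) when cases with score >= threshold are called positive.

    labels: 1 = pathology present (ground truth), 0 = absent.
    A rate is None when its denominator is zero (no calls on that side).
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        if score >= threshold:      # AI flags the case
            if label == 1:
                tp += 1
            else:
                fp += 1
        else:                       # AI does not flag the case
            if label == 1:
                fn += 1
            else:
                tn += 1

    fdr = fp / (fp + tp) if (fp + tp) else None          # flagged but actually negative
    omission = fn / (fn + tn) if (fn + tn) else None     # unflagged but actually positive
    return fdr, omission


# Made-up illustration data: ten cases with AI scores and ground truth.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

# Report both rates alongside each candidate threshold.
for threshold in (30, 60, 90):
    fdr, omission = fdr_for_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold}: FDR={fdr}, FOR={omission}")
```

Under this reading, a radiologist shown a score would also see, for that score used as a cut-off, what fraction of flagged cases are false alarms and what fraction of unflagged cases are misses. Both rates depend on disease prevalence in the validation set, so the numbers would need to come from data that resemble the local population.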
