

Preliminary experiments on interpretable ChatGPT-assisted diagnosis for breast ultrasound radiologists.

Authors

Sun Pengfei, Qian Linxue, Wang Zhixiang

Affiliations

Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing, China.

Department of Medical Imaging, Beijing Friendship Hospital, Capital Medical University, Beijing, China.

Publication Information

Quant Imaging Med Surg. 2024 Sep 1;14(9):6601-6612. doi: 10.21037/qims-24-141. Epub 2024 Aug 28.

Abstract

BACKGROUND

Ultrasound is essential for detecting breast lesions. The American College of Radiology's Breast Imaging Reporting and Data System (BI-RADS) classification system is widely used, but its subjectivity can lead to inconsistency in diagnostic outcomes. Artificial intelligence (AI) models, such as ChatGPT-3.5, may potentially enhance diagnostic accuracy and efficiency in medical settings. This study aimed to assess the utility of the ChatGPT-3.5 model in generating BI-RADS classifications for breast ultrasound reports and its ability to replicate the "chain of thought" (CoT) in clinical decision-making to improve model interpretability.

METHODS

Breast ultrasound reports were collected, and ChatGPT-3.5 was used to generate diagnoses and treatment plans. We evaluated GPT-4's performance by comparing its generated reports to those from doctors with various levels of experience. We also conducted a Turing test and a consistency analysis. To enhance the interpretability of the model, we applied the CoT method to deconstruct the decision-making chain of the GPT model.
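The chain-of-thought (CoT) prompting described above can be sketched in outline. Note that the prompt wording and the helper names `build_cot_prompt` and `extract_birads` are illustrative assumptions for this sketch, not the authors' actual implementation.

```python
import re
from typing import Optional

# Hypothetical CoT prompt builder: asks the model to reason step by step
# over standard lesion descriptors before committing to a BI-RADS category.
def build_cot_prompt(report_text: str) -> str:
    return (
        "You are a breast ultrasound radiologist.\n"
        "Reason step by step: (1) describe the lesion's shape, margin, "
        "orientation, echo pattern, and posterior features; (2) weigh "
        "benign versus malignant features; (3) end with a final line "
        "formatted exactly as 'BI-RADS: <category>'.\n\n"
        f"Report:\n{report_text}"
    )

# Parse the final BI-RADS category (e.g. '4A') out of a model reply.
def extract_birads(reply: str) -> Optional[str]:
    match = re.search(r"BI-RADS:?\s*([0-6][ABCabc]?)", reply)
    return match.group(1).upper() if match else None
```

In such a workflow, the built prompt would be sent to the chat model and the reply parsed with `extract_birads` to recover the structured classification alongside the free-text reasoning chain.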

RESULTS

A total of 131 patients were evaluated, with 57 doctors participating in the experiment. ChatGPT-3.5 showed promising performance in structure and organization (S&O), professional terminology and expression (PTE), treatment recommendations (TR), and clarity and comprehensibility (C&C). However, improvements are needed in BI-RADS classification, malignancy diagnosis (MD), likelihood of being written by a physician (LWBP), and ultrasound doctor artificial intelligence acceptance (UDAIA). Turing test results indicated that AI-generated reports convincingly resembled human-authored reports. Reproducibility experiments displayed consistent performance. Erroneous report analysis revealed issues related to incorrect diagnosis, inconsistencies, and overdiagnosis. The CoT investigation supports the potential of ChatGPT to replicate the clinical decision-making process and offers insights into AI interpretability.
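The reproducibility experiments above imply an agreement measure between repeated model runs (or between model and physician classifications). One common choice is Cohen's kappa; this sketch is illustrative only, as the abstract does not specify which statistic the authors used.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters (or two model runs)."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    n = len(ratings_a)
    # Observed agreement: fraction of identical category assignments.
    po = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement if the two raters assigned categories independently.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    pe = sum(freq_a[c] / n * freq_b[c] / n for c in freq_a)
    return 1.0 if pe == 1.0 else (po - pe) / (1 - pe)
```

Applied to BI-RADS categories, kappa near 1 would indicate the consistent run-to-run performance the results describe, while kappa near 0 would indicate agreement no better than chance.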

CONCLUSIONS

The ChatGPT-3.5 model holds potential as a valuable tool for assisting in the efficient determination of BI-RADS classifications and enhancing diagnostic performance.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4293/11400651/6dc0a3b71b65/qims-14-09-6601-f1.jpg

