Department of Ultrasound, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China; College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
Department of Ultrasound, Yunnan Kungang Hospital, The Seventh Affiliated Hospital of Dali University, Anning, Yunnan, China.
Ultrasound Med Biol. 2024 Nov;50(11):1697-1703. doi: 10.1016/j.ultrasmedbio.2024.07.007. Epub 2024 Aug 12.
To assess the ability of large language models (LLMs), namely OpenAI (GPT-4.0) and Microsoft Bing (GPT-4), to generate structured reports, Breast Imaging Reporting and Data System (BI-RADS) categories, and management recommendations from free-text breast ultrasound reports.
In this retrospective study, 100 free-text breast ultrasound reports were collected from 100 patients who underwent surgery between January and May 2023. The ability of OpenAI (GPT-4.0) and Microsoft Bing (GPT-4) to convert these unstructured reports into structured ultrasound reports was studied. Senior radiologists evaluated the quality of the structured reports, BI-RADS categories, and management recommendations generated by GPT-4.0 and Bing according to the guidelines.
OpenAI (GPT-4.0) outperformed Microsoft Bing (GPT-4) in generating structured reports (88% vs. 55%; p < 0.001), assigning correct BI-RADS categories (54% vs. 47%; p = 0.013), and providing reasonable management recommendations (81% vs. 63%; p < 0.001). In predicting benign versus malignant status, GPT-4.0 performed significantly better than Bing (AUC, 0.9317 vs. 0.8177; p < 0.001), while both performed significantly worse than senior radiologists (AUC, 0.9763; both p < 0.001).
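The AUC values summarize how well each reader's output ranks malignant lesions above benign ones. As a minimal sketch, assuming each report is reduced to an ordinal malignancy score (for example, the assigned BI-RADS category) and that a surgical reference standard supplies the benign/malignant label, the empirical AUC can be computed with the Mann-Whitney formulation below. The function name `auc_mann_whitney` and the toy scores are hypothetical illustrations, not the study's analysis code or data; the abstract does not state how scores were derived or which statistical test produced the p-values.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical ROC AUC (hypothetical helper, for illustration only).

    Equals the probability that a randomly chosen malignant case (label 1)
    receives a higher score than a randomly chosen benign case (label 0),
    with ties counted as one half. Ordinal scores such as BI-RADS
    categories produce many ties, so the tie rule matters.
    """
    malignant = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not malignant or not benign:
        raise ValueError("need at least one malignant and one benign case")

    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0  # malignant case correctly ranked higher
            elif m == b:
                wins += 0.5  # tie: half credit
    return wins / (len(malignant) * len(benign))


if __name__ == "__main__":
    # Toy data (not from the study): assumed ordinal BI-RADS-like scores
    # and assumed benign (0) / malignant (1) reference labels.
    scores = [3, 4, 5, 4, 3, 5]
    labels = [0, 0, 1, 1, 0, 1]
    print(f"{auc_mann_whitney(scores, labels):.3f}")  # prints 0.944
```

Comparing two correlated AUCs measured on the same patients, as in the GPT-4.0 versus Bing comparison, requires a paired test rather than comparing the point estimates alone.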
This study highlights the potential of LLMs, specifically OpenAI (GPT-4.0), to convert unstructured breast ultrasound reports into structured ones, offer accurate diagnoses, and provide reasonable recommendations.