Interpreting BI-RADS-Free Breast MRI Reports Using a Large Language Model: Automated BI-RADS Classification From Narrative Reports Using ChatGPT.

Author information

Tekcan Sanli Deniz Esin, Sanli Ahmet Necati, Ozmen Gizem, Ozmen Aycil, Cihan Irem, Kurt Atakan, Esmerer Emel

Affiliations

Department of Radiology, Faculty of Medicine, Gaziantep University, Gaziantep, Turkey (D.E.T.S., G.O., A.O., I.C., A.K.).

Department of General Surgery, Abdulkadir Yüksel State Hospital, Gaziantep, Turkey (A.N.S.).

Publication information

Acad Radiol. 2025 Sep 6. doi: 10.1016/j.acra.2025.08.026.

DOI: 10.1016/j.acra.2025.08.026
PMID: 40915935

Abstract

PURPOSE

This study aimed to evaluate the performance of ChatGPT (GPT-4o) in interpreting free-text breast magnetic resonance imaging (MRI) reports by assigning BI-RADS categories and recommending appropriate clinical management steps in the absence of explicitly stated BI-RADS classifications.

METHODS

This retrospective, single-center study included 352 full-text breast MRI reports dated between January 2024 and June 2025, each documenting at least one identifiable breast lesion with descriptive imaging findings. Reports that were incomplete because of technical limitations, reports describing only normal findings, and MRI examinations performed at external institutions were excluded. The first aim was to assess ChatGPT's ability to infer the correct BI-RADS category (2, 3, 4a, 4b, 4c, and 5, each evaluated separately) based solely on the narrative imaging findings. The second aim was to evaluate the model's ability to distinguish benign from suspicious/malignant imaging features for clinical decision-making. For this purpose, BI-RADS 2-3 categories were grouped as "benign" and BI-RADS 4-5 as "suspicious/malignant," in alignment with how BI-RADS categories are used to guide patient management, rather than to represent definitive diagnostic outcomes. Reports originally containing the term "BI-RADS" were manually de-identified by removing BI-RADS categories and clinical recommendations. Each narrative report was then processed through ChatGPT using two standardized prompts: (1) What is the most appropriate BI-RADS category based on the findings in the report? (2) What should be the next clinical step (e.g., follow-up, biopsy)? Responses were evaluated in real time by two experienced breast radiologists, and their consensus was used as the reference standard.
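The binary grouping and the two prompts described above can be sketched in Python as follows. This is only an illustration, not the authors' pipeline: the helper names (`normalize_category`, `management_group`, `build_messages`) are hypothetical, and the way the prompt and report text are joined is an assumption, since the abstract does not say how reports were submitted to ChatGPT.

```python
# The six categories evaluated in the study, as listed in the abstract.
BIRADS_CATEGORIES = ("2", "3", "4a", "4b", "4c", "5")

# The two standardized prompts quoted in the abstract.
PROMPTS = (
    "What is the most appropriate BI-RADS category based on the findings in the report?",
    "What should be the next clinical step (e.g., follow-up, biopsy)?",
)


def normalize_category(raw: str) -> str:
    """Normalize strings such as 'BI-RADS 4B' or ' 4b ' to '4b' (hypothetical helper)."""
    cleaned = raw.strip().lower().replace("bi-rads", "").strip()
    if cleaned not in BIRADS_CATEGORIES:
        raise ValueError(f"unsupported BI-RADS category: {raw!r}")
    return cleaned


def management_group(category: str) -> str:
    """Map BI-RADS 2-3 to 'benign' and BI-RADS 4-5 to 'suspicious/malignant'."""
    cat = normalize_category(category)
    return "benign" if cat in ("2", "3") else "suspicious/malignant"


def build_messages(report_text: str) -> list[str]:
    """Pair one de-identified report with each prompt (assumed layout)."""
    return [f"{prompt}\n\nReport:\n{report_text}" for prompt in PROMPTS]


if __name__ == "__main__":
    for cat in BIRADS_CATEGORIES:
        print(cat, "->", management_group(cat))
```

Keeping the grouping in one small function makes it explicit that the binary label reflects management (follow-up versus biopsy) rather than a confirmed diagnosis.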

RESULTS

ChatGPT demonstrated moderate agreement with the radiologists' consensus for BI-RADS classification (Cohen's kappa (κ) = 0.510, p < 0.001). Classification accuracy was highest for BI-RADS 5 reports (77.9%), whereas agreement was lower in intermediate categories such as BI-RADS 3 (52.4% correct) and 4b (29.4% correct). In the binary classification of reports as benign or malignant, ChatGPT achieved almost perfect agreement (κ = 0.843), correctly identifying 91.7% of benign and 93.2% of malignant reports. Notably, the model's management recommendations were 100% consistent with its assigned BI-RADS categories: it advised biopsy for all BI-RADS 4-5 cases and short-interval follow-up or conditional biopsy for BI-RADS 3 reports.
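For readers unfamiliar with the agreement statistic, Cohen's kappa is defined as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The minimal Python sketch below computes it from two label lists. The ten labels in the example are made-up toy data, not the study's data, and the verbal bands follow the commonly used Landis and Koch scale, which is assumed here because it matches the "moderate" and "almost perfect" wording; the abstract does not name the scale.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of labels")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa: float) -> str:
    """Verbal band for a kappa value on the Landis and Koch scale (assumed)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Toy data only: 10 hypothetical reports, reference standard vs. model output.
    reference = ["benign"] * 4 + ["malignant"] * 6
    model = ["benign"] * 3 + ["malignant"] * 6 + ["benign"]
    kappa = cohens_kappa(reference, model)
    print(f"kappa = {kappa:.3f} ({landis_koch_label(kappa)})")
```

On this scale the reported values of 0.510 and 0.843 fall in the "moderate" and "almost perfect" bands, respectively.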

CONCLUSION

ChatGPT accurately interprets unstructured breast MRI reports, particularly in benign/malignant discrimination and corresponding clinical recommendations. This technology holds potential as a decision support tool to standardize reporting and enhance clinical workflows, especially in settings with variable reporting practices. Prospective, multi-institutional studies are needed for further validation.
