Structured Report Generation for Breast Cancer Imaging Based on Large Language Modeling: A Comparative Analysis of GPT-4 and DeepSeek.

Author Information

Chen Kun, Hou Xuefeng, Li Xiaofeng, Xu Wengui, Yi Heqing

Affiliations

Department of Nuclear Medicine, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China (K.C., H.Y.).

Department of Medical Imaging, Tianjin Children's Hospital (Tianjin University Children's Hospital), Tianjin, China (X.H.).

Publication Information

Acad Radiol. 2025 Aug 7. doi: 10.1016/j.acra.2025.07.046.

Abstract

RATIONALE AND OBJECTIVES

The purpose of this study is to compare the performance of GPT-4 and DeepSeek large language models in generating structured breast cancer multimodality imaging integrated reports from free-text radiology reports including mammography, ultrasound, MRI, and PET/CT.

MATERIALS AND METHODS

A retrospective analysis was conducted on 1358 free-text reports from 501 breast cancer patients across two institutions. The study design involved synthesizing multimodal imaging data into structured reports with three components: primary lesion characteristics, metastatic lesions, and TNM staging. Input prompts were standardized for both models, with GPT-4 using predesigned instructions and DeepSeek requiring manual input. Reports were evaluated based on physician satisfaction using a Likert scale, descriptive accuracy including lesion localization, size, SUV, and metastasis assessment, and TNM staging correctness according to NCCN guidelines. Statistical analysis included McNemar tests for binary outcomes and correlation analysis for multiclass comparisons with a significance threshold of P < .05.

RESULTS

Physician satisfaction scores showed strong correlation between models with r-values of 0.665 and 0.558 and P-values below .001. Both models demonstrated high accuracy in data extraction and integration. The mean accuracy for primary lesion features was 91.7% for GPT-4 and 92.1% for DeepSeek, while feature synthesis accuracy was 93.4% for GPT-4 and 93.9% for DeepSeek. Metastatic lesion identification showed comparable overall accuracy at 93.5% for GPT-4 and 94.4% for DeepSeek. GPT-4 performed better in pleural lesion detection with 94.9% accuracy compared to 79.5% for DeepSeek, whereas DeepSeek achieved higher accuracy in mesenteric metastasis identification at 87.5% vs 43.8% for GPT-4. TNM staging accuracy exceeded 92% for T-stage and 94% for M-stage, with N-stage accuracy improving beyond 90% when supplemented with physical exam data.
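The model-vs-model accuracy comparisons above are paired binary outcomes (each report scored correct or incorrect under both models), which is why the methods specify McNemar tests: the exact test reduces to a binomial test on the discordant pairs only. A stdlib sketch, with illustrative counts rather than the study's data:

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value from discordant pair counts.

    b: reports model A scored correct but model B scored incorrect
    c: the reverse
    Concordant pairs (both correct or both incorrect) do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    # Under H0 the discordant pairs split 50/50, so sum the binomial
    # tail up to the smaller count and double it for two sides.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(2 * tail, 1.0)


# Example: a 5 vs. 15 discordant split is significant at P < .05
p = mcnemar_exact_p(5, 15)  # → p ≈ 0.041
```

This is a sketch of the standard exact form; the paper does not state whether the exact or the chi-square approximation was used.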

CONCLUSION

Both GPT-4 and DeepSeek effectively generate structured breast cancer imaging reports with high accuracy in data mining, integration, and TNM staging. Integrating these models into clinical practice is expected to enhance report standardization and physician productivity.

