Exploring the use of large language models for classification, clinical interpretation, and treatment recommendation in breast tumor patient records.

Authors

Miao Beibei, Sun Qian, Wang Peien, Shao Rongjun, Ding Yingying, Chen Yuanlong, Ying Rongbiao

Affiliations

Department of Thyroid and Breast Surgery, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), No. 50, Zhenxin Road, Xinhe Town, Wenling, 317502, Taizhou, China.

Department of Interventional and Minimally Invasive Surgery, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), No. 50, Zhenxin Road, Xinhe Town, Wenling, 317502, Taizhou, China.

Publication information

Sci Rep. 2025 Aug 26;15(1):31450. doi: 10.1038/s41598-025-16999-y.

DOI:10.1038/s41598-025-16999-y
PMID:40858846
Abstract

This study aims to investigate and compare the diagnostic performance, disease interpretation reliability, and treatment recommendation capabilities of multiple advanced large language models (GPT-4o, DeepSeek-R1, and DeepSeek-V3) in breast tumor cases. It retrospectively collected comprehensive clinical records of patients with breast tumors treated at Taizhou Cancer Hospital between January and April 2024. The study evaluated the accuracy of tumor classification (benign vs. malignant), the quality of disease interpretation, and the appropriateness of treatment recommendations generated by each model. To assess the clinical interpretability and utility of the models, a comprehensive performance analysis was conducted using statistical methods. A total of 45 patients with breast tumors were included, comprising 37 benign and 8 malignant cases (43 females, 2 males). GPT-4o achieved the highest area under the curve (AUC) for tumor classification (AUC = 0.848), outperforming DeepSeek-R1 (AUC = 0.736) and DeepSeek-V3 (AUC = 0.723). However, DeLong's test indicated that the differences in AUCs among the models were not statistically significant (p > 0.05). In addition, subjective evaluations by doctors indicated that DeepSeek-R1 received the highest scores for disease interpretation (4.73 ± 0.46) and treatment recommendations (4.70 ± 0.51), with consistent ratings.
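
To make the reported AUC figures easier to read, the sketch below shows one common way an AUC can be computed: as the fraction of malignant/benign pairs in which the malignant case receives the higher score, with ties counted as half. The scores are synthetic and are not the study's data; the function name `auc_mann_whitney` and the score distributions are assumptions made for illustration. Only the group sizes (8 malignant, 37 benign) come from the abstract, and the abstract does not say how the authors computed their AUCs.

```python
import random


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as P(positive score > negative score), counting ties as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Synthetic malignancy scores, NOT taken from the study.
# Only the group sizes (8 malignant, 37 benign) match the abstract.
rng = random.Random(0)
malignant = [rng.gauss(0.65, 0.2) for _ in range(8)]
benign = [rng.gauss(0.40, 0.2) for _ in range(37)]

auc = auc_mann_whitney(malignant, benign)
n_pairs = len(malignant) * len(benign)

print(f"AUC on synthetic scores: {auc:.3f}")
print(f"Malignant/benign pairs compared: {n_pairs}")
print(f"Change in AUC when one pair flips: {1 / n_pairs:.4f}")
```

With 8 malignant and 37 benign cases there are only 296 pairs, so each AUC estimate rests on few positive cases. That is consistent with, though not proof of, the abstract's finding that the AUC differences between models did not reach statistical significance.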
