Evaluating Large Language Model (LLM) Performance on Established Breast Classification Systems.

Author information

Haider Syed Ali, Pressman Sophia M, Borna Sahar, Gomez-Cabello Cesar A, Sehgal Ajai, Leibovich Bradley C, Forte Antonio Jorge

Affiliations

Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA.

Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA.

Publication information

Diagnostics (Basel). 2024 Jul 11;14(14):1491. doi: 10.3390/diagnostics14141491.

DOI: 10.3390/diagnostics14141491
PMID: 39061628
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11275570/

Abstract

Medical researchers are increasingly utilizing advanced LLMs like ChatGPT-4 and Gemini to enhance diagnostic processes in the medical field. This research focuses on their ability to comprehend and apply complex medical classification systems for breast conditions, which can significantly aid plastic surgeons in making informed decisions for diagnosis and treatment, ultimately leading to improved patient outcomes. Fifty clinical scenarios were created to evaluate the classification accuracy of each LLM across five established breast-related classification systems. Scores from 0 to 2 were assigned to LLM responses to denote incorrect, partially correct, or completely correct classifications. Descriptive statistics were employed to compare the performances of ChatGPT-4 and Gemini. Gemini exhibited superior overall performance, achieving 98% accuracy compared to ChatGPT-4's 71%. While both models performed well in the Baker classification for capsular contracture and UTSW classification for gynecomastia, Gemini consistently outperformed ChatGPT-4 in other systems, such as the Fischer Grade Classification for gender-affirming mastectomy, Kajava Classification for ectopic breast tissue, and Regnault Classification for breast ptosis. With further development, integrating LLMs into plastic surgery practice will likely enhance diagnostic support and decision making.
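The abstract does not state how the 0 to 2 scores were converted into an accuracy percentage. One plausible reading, assumed here and not confirmed by the source, is the summed score divided by the maximum attainable score. The sketch below illustrates that calculation; the function names and the example scores are hypothetical and are not the study's data.

```python
from collections import defaultdict

MAX_SCORE = 2  # 0 = incorrect, 1 = partially correct, 2 = completely correct


def percent_of_max(scores):
    """Return the summed score as a percentage of the maximum attainable."""
    if not scores:
        raise ValueError("no scores to summarize")
    if any(s not in (0, 1, 2) for s in scores):
        raise ValueError("scores must be 0, 1, or 2")
    return 100.0 * sum(scores) / (MAX_SCORE * len(scores))


def summarize(records):
    """Group (system, score) pairs; report percent-of-max per system and overall."""
    by_system = defaultdict(list)
    for system, score in records:
        by_system[system].append(score)
    summary = {name: percent_of_max(vals) for name, vals in by_system.items()}
    summary["overall"] = percent_of_max([score for _, score in records])
    return summary


# Hypothetical scores for illustration only; these are not the study's data.
example = [("Baker", 2), ("Baker", 2), ("Regnault", 1), ("Regnault", 0)]
result = summarize(example)
print(result)
```

Under this assumed formula, 50 scenarios give a maximum of 100 points per model, so each percentage point corresponds to one point of score.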


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b750/11275570/bc443b21358d/diagnostics-14-01491-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b750/11275570/1de0eb3bc026/diagnostics-14-01491-g001.jpg

