Evaluating Large Language Model (LLM) Performance on Established Breast Classification Systems.

Author information

Haider Syed Ali, Pressman Sophia M, Borna Sahar, Gomez-Cabello Cesar A, Sehgal Ajai, Leibovich Bradley C, Forte Antonio Jorge

Affiliations

Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA.

Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA.

Publication information

Diagnostics (Basel). 2024 Jul 11;14(14):1491. doi: 10.3390/diagnostics14141491.
PMID: 39061628
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11275570/

Abstract

Medical researchers are increasingly utilizing advanced LLMs like ChatGPT-4 and Gemini to enhance diagnostic processes in the medical field. This research focuses on their ability to comprehend and apply complex medical classification systems for breast conditions, which can significantly aid plastic surgeons in making informed decisions for diagnosis and treatment, ultimately leading to improved patient outcomes. Fifty clinical scenarios were created to evaluate the classification accuracy of each LLM across five established breast-related classification systems. Scores from 0 to 2 were assigned to LLM responses to denote incorrect, partially correct, or completely correct classifications. Descriptive statistics were employed to compare the performances of ChatGPT-4 and Gemini. Gemini exhibited superior overall performance, achieving 98% accuracy compared to ChatGPT-4's 71%. While both models performed well in the Baker classification for capsular contracture and UTSW classification for gynecomastia, Gemini consistently outperformed ChatGPT-4 in other systems, such as the Fischer Grade Classification for gender-affirming mastectomy, Kajava Classification for ectopic breast tissue, and Regnault Classification for breast ptosis. With further development, integrating LLMs into plastic surgery practice will likely enhance diagnostic support and decision making.
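The abstract describes a three-level rubric (0 = incorrect, 1 = partially correct, 2 = completely correct). It does not say how those scores become the reported accuracy percentages. The Python sketch below shows one way to aggregate such rubric scores per classification system and overall. Several parts are assumptions for illustration, not the authors' method: the function name `summarize`, the `(system, score)` record layout, and both accuracy definitions ("points" = earned points over maximum possible points; "strict" = share of completely correct responses). The toy scores are invented and are not study data.

```python
from collections import defaultdict

# Rubric described in the abstract: 0 = incorrect, 1 = partially correct,
# 2 = completely correct.
VALID_SCORES = {0, 1, 2}
MAX_SCORE = 2


def summarize(records):
    """Aggregate (system, score) pairs into accuracy figures.

    ASSUMPTION: the abstract does not state how 0-2 scores map to a
    percentage, so two hypothetical definitions are reported side by side:
      - "points": earned points / maximum possible points
      - "strict": share of responses scored 2 (completely correct)
    """
    earned = defaultdict(int)         # total rubric points per system
    fully_correct = defaultdict(int)  # number of responses scored 2 per system
    count = defaultdict(int)          # number of responses per system
    for system, score in records:
        if score not in VALID_SCORES:
            raise ValueError(f"score must be 0, 1 or 2, got {score!r}")
        earned[system] += score
        fully_correct[system] += 1 if score == MAX_SCORE else 0
        count[system] += 1
    if not count:
        raise ValueError("no records to summarize")

    per_system = {
        s: {
            "points": earned[s] / (MAX_SCORE * count[s]),
            "strict": fully_correct[s] / count[s],
        }
        for s in count
    }
    total = sum(count.values())
    overall = {
        "points": sum(earned.values()) / (MAX_SCORE * total),
        "strict": sum(fully_correct.values()) / total,
    }
    return per_system, overall


if __name__ == "__main__":
    # Hypothetical toy scores, NOT data from the study.
    toy = [("Baker", 2), ("Baker", 1), ("Regnault", 0), ("Regnault", 2)]
    per_system, overall = summarize(toy)
    print(per_system["Baker"])  # {'points': 0.75, 'strict': 0.5}
    print(overall)              # {'points': 0.625, 'strict': 0.5}
```

Reporting both definitions side by side shows how much partially correct answers contribute. Under "points", a response scored 1 counts as half correct. Under "strict", it counts as incorrect.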
