Large Language Model GPT-4 Compared to Endocrinologist Responses on Initial Choice of Glucose-Lowering Medication Under Conditions of Clinical Uncertainty.

Author information

Flory James H, Ancker Jessica S, Kim Scott Y H, Kuperman Gilad, Petrov Aleksandr, Vickers Andrew

Affiliations

Memorial Sloan Kettering Cancer Center, New York, NY.

Vanderbilt University Medical Center, Nashville, TN.

Publication information

Diabetes Care. 2025 Feb 1;48(2):185-192. doi: 10.2337/dc24-1067.

DOI: 10.2337/dc24-1067
PMID: 39250109
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11770168/
Abstract

OBJECTIVE

To explore how the commercially available large language model (LLM) GPT-4 compares to endocrinologists in addressing medical questions for which the best answer is uncertain.

RESEARCH DESIGN AND METHODS

This study compared responses from GPT-4 to responses from 31 endocrinologists using hypothetical clinical vignettes focused on diabetes, specifically examining the prescription of metformin versus alternative treatments. The primary outcome was the choice between metformin and other treatments.

RESULTS

With a simple prompt, GPT-4 chose metformin in 12% (95% CI 7.9-17%) of responses, compared with 31% (95% CI 23-39%) of endocrinologist responses. After modifying the prompt to encourage metformin use, the selection of metformin by GPT-4 increased to 25% (95% CI 22-28%). GPT-4 rarely selected metformin in patients with impaired kidney function, or a history of gastrointestinal distress (2.9% of responses, 95% CI 1.4-5.5%). In contrast, endocrinologists often prescribed metformin even in patients with a history of gastrointestinal distress (21% of responses, 95% CI 12-36%). GPT-4 responses showed low variability on repeated runs except at intermediate levels of kidney function.

CONCLUSIONS

In clinical scenarios with no single right answer, GPT-4's responses were reasonable, but differed from endocrinologists' responses in clinically important ways. Value judgments are needed to determine when these differences should be addressed by adjusting the model. We recommend against reliance on LLM output until it is shown to align not just with clinical guidelines but also with patient and clinician preferences, or it demonstrates improvement in clinical outcomes over standard of care.
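
The results above report each proportion with a 95% CI. As a rough illustration of how such an interval can be computed for a single proportion, the sketch below uses the Wilson score method. The abstract states neither the interval method the authors used nor the underlying response counts, so the function name `wilson_interval` and the counts (24 of 200) are hypothetical. The study's intervals may also account for repeated responses from the same clinician or model, which this sketch does not.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    Assumes independent responses, which may not hold for the study's data.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must be between 0 and total")

    p_hat = successes / total
    z2 = z * z
    denom = 1.0 + z2 / total
    # Center of the interval is pulled from p_hat toward 0.5.
    center = (p_hat + z2 / (2.0 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / total + z2 / (4.0 * total * total)) / denom
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts chosen only to give a 12% point estimate;
# they are NOT taken from the paper.
low, high = wilson_interval(successes=24, total=200)
print(f"point estimate: {24 / 200:.1%}, 95% CI: {low:.1%} to {high:.1%}")
```

The Wilson interval is used here instead of the simpler normal approximation because it stays within 0 to 1 and behaves better for small proportions such as the 2.9% figure reported above.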
