Performance of 3 Conversational Generative Artificial Intelligence Models for Computing Maximum Safe Doses of Local Anesthetics: Comparative Analysis.

Author information

Suppan Mélanie, Fubini Pietro Elias, Stefani Alexandra, Gisselbaek Mia, Samer Caroline Flora, Savoldelli Georges Louis

Affiliations

Division of Anaesthesiology, Department of Acute Care Medicine, Geneva University Hospitals, Rue Gabrielle-Perret-Gentil 4, Geneva, 1211, Switzerland.

Department of Anaesthesiology, Pharmacology, Intensive Care and Emergency Medicine, Faculty of Medicine, University of Geneva, Rue Michel-Servet 1, Geneva, 1211, Switzerland.

Publication information

JMIR AI. 2025 May 13;4:e66796. doi: 10.2196/66796.

Abstract

BACKGROUND

Generative artificial intelligence (AI) is showing great promise as a tool to optimize decision-making across various fields, including medicine. In anesthesiology, accurately calculating maximum safe doses of local anesthetics (LAs) is crucial to prevent complications such as local anesthetic systemic toxicity (LAST). Current methods for determining LA dosage are largely based on empirical guidelines and clinician experience, which can result in significant variability and dosing errors. AI models may offer a solution by processing multiple parameters simultaneously to suggest appropriate LA doses.
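
The abstract does not spell out the study's rule set. The following is a minimal sketch that assumes the simplest common form of a maximum-dose rule: a per-kilogram limit combined with an absolute ceiling, where the permitted dose is the lower of the two. `DoseRule`, `max_safe_dose_mg`, and all numbers are hypothetical placeholders. They are not clinical values and not the authors' rules. The sketch ignores the other patient factors that a real rule set would take into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoseRule:
    """Hypothetical weight-based rule: a per-kilogram limit plus an absolute ceiling."""
    mg_per_kg: float
    absolute_max_mg: float


def max_safe_dose_mg(rule: DoseRule, weight_kg: float) -> float:
    """Return the lower of the weight-based limit and the absolute ceiling."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    return min(rule.mg_per_kg * weight_kg, rule.absolute_max_mg)


# Placeholder numbers for illustration only; they are not clinical values.
example_rule = DoseRule(mg_per_kg=2.0, absolute_max_mg=150.0)
print(max_safe_dose_mg(example_rule, 60.0))   # 120.0 (weight-based limit applies)
print(max_safe_dose_mg(example_rule, 100.0))  # 150.0 (absolute ceiling applies)
```

The ceiling is important for heavier patients. Without it, a purely weight-based calculation keeps growing with body weight.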

OBJECTIVE

This study aimed to evaluate the efficacy and safety of 3 generative AI models, ChatGPT (OpenAI), Copilot (Microsoft Corporation), and Gemini (Google LLC), in calculating maximum safe LA doses, with the goal of determining their potential use in clinical practice.

METHODS

A comparative analysis was conducted using a 51-item questionnaire designed to assess LA dose calculation across 10 simulated clinical vignettes. The responses generated by ChatGPT, Copilot, and Gemini were compared with reference doses calculated using a scientifically validated set of rules. Quantitative evaluations involved comparing AI-generated doses to these reference doses, while qualitative assessments were conducted by independent reviewers using a 5-point Likert scale.
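
The quantitative comparison can be sketched as follows. The sketch makes three assumptions:

- A dose counts as unsafe when it exceeds the reference dose.
- The excess is expressed as a percentage of the reference dose.
- The mean and SD of the excess are computed over unsafe cases only.

The abstract does not state the exact formula, so these are assumptions. The function names and the sample dose pairs are hypothetical.

```python
import statistics


def percent_excess(ai_dose_mg: float, reference_dose_mg: float) -> float:
    """Percentage by which the AI dose exceeds the reference dose (negative if below)."""
    if reference_dose_mg <= 0:
        raise ValueError("reference_dose_mg must be positive")
    return (ai_dose_mg - reference_dose_mg) / reference_dose_mg * 100.0


def summarize(pairs: list[tuple[float, float]]) -> dict:
    """Summarize (ai_dose_mg, reference_dose_mg) pairs, one per vignette with a dose.

    A dose is counted as unsafe when it exceeds the reference dose; the mean and
    SD of the excess are computed over unsafe cases only (an assumption).
    """
    excesses = [percent_excess(ai, ref) for ai, ref in pairs if ai > ref]
    return {
        "unsafe": len(excesses),
        "total": len(pairs),
        "mean_excess_pct": statistics.mean(excesses) if excesses else 0.0,
        # statistics.stdev needs at least two values
        "sd_excess_pct": statistics.stdev(excesses) if len(excesses) > 1 else 0.0,
    }


# Hypothetical vignettes: (AI-suggested dose, reference dose) in mg.
result = summarize([(300.0, 100.0), (150.0, 100.0), (80.0, 100.0)])
print(result["unsafe"], result["total"], result["mean_excess_pct"])  # 2 3 125.0
```

Under these assumptions, a figure such as "unsafe in 90% (9/10) of cases" corresponds to `unsafe / total`. The reported percentage excess corresponds to `mean_excess_pct` with its SD.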

RESULTS

All 3 AI models (Gemini, ChatGPT, and Copilot) completed the questionnaire and generated responses aligned with LA dose calculation principles, but their performance in providing safe doses varied significantly. Gemini frequently avoided proposing any specific dose, instead recommending consultation with a specialist. When it did provide dose ranges, these often exceeded safe limits in cases involving mixtures, by a mean of 140% (SD 103%). ChatGPT provided unsafe doses in 90% (9/10) of cases, exceeding safe limits by a mean of 198% (SD 196%). Copilot's recommendations were unsafe in 67% (6/9) of cases, exceeding limits by a mean of 217% (SD 239%). Qualitative assessments rated Gemini as "fair" and both ChatGPT and Copilot as "poor."
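
The abstract does not say how the reference rules handle mixtures. One common convention, assumed here rather than taken from the study, treats the toxicity of the combined agents as additive:

- Each agent's planned dose is expressed as a fraction of that agent's own maximum safe dose.
- The mixture stays within limits only if these fractions sum to at most 1.

The helper names and numbers below are hypothetical placeholders, not clinical values.

```python
def mixture_fraction(components: list[tuple[float, float]]) -> float:
    """Sum of planned_dose / max_safe_dose over all agents in the mixture.

    Assumes additive toxicity across agents (a convention, not the study's stated rule).
    """
    total = 0.0
    for planned_mg, max_safe_mg in components:
        if max_safe_mg <= 0:
            raise ValueError("max_safe_mg must be positive")
        total += planned_mg / max_safe_mg
    return total


def mixture_within_limit(components: list[tuple[float, float]]) -> bool:
    """True if the combined fraction does not exceed 1 (100% of the allowance)."""
    return mixture_fraction(components) <= 1.0


# Placeholder (planned_mg, max_safe_mg) pairs for two agents; not clinical values.
print(mixture_fraction([(60.0, 120.0), (50.0, 200.0)]))       # 0.75
print(mixture_within_limit([(60.0, 120.0), (150.0, 200.0)]))  # False
```

In the second call, each dose is below its own maximum (50% and 75%), yet together they exceed the combined allowance. In this situation, checking each agent separately would be misleading.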

CONCLUSIONS

Generative AI models like Gemini, ChatGPT, and Copilot currently lack the accuracy and reliability needed for safe LA dose calculation. Their poor performance suggests that they should not be used as decision-making tools for this purpose. Until more reliable AI-driven solutions are developed and validated, clinicians should rely on their expertise, experience, and a careful assessment of individual patient factors to guide LA dosing and ensure patient safety.
