Performance of 3 Conversational Generative Artificial Intelligence Models for Computing Maximum Safe Doses of Local Anesthetics: Comparative Analysis.

Authors

Suppan Mélanie, Fubini Pietro Elias, Stefani Alexandra, Gisselbaek Mia, Samer Caroline Flora, Savoldelli Georges Louis

Affiliations

Division of Anaesthesiology, Department of Acute Care Medicine, Geneva University Hospitals, Rue Gabrielle-Perret-Gentil 4, Geneva, 1211, Switzerland.

Department of Anaesthesiology, Pharmacology, Intensive Care and Emergency Medicine, Faculty of Medicine, University of Geneva, Rue Michel-Servet 1, Geneva, 1211, Switzerland.

Publication information

JMIR AI. 2025 May 13;4:e66796. doi: 10.2196/66796.

DOI:10.2196/66796

PMID:40605845

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12223683/

Abstract

BACKGROUND

Generative artificial intelligence (AI) is showing great promise as a tool to optimize decision-making across various fields, including medicine. In anesthesiology, accurately calculating maximum safe doses of local anesthetics (LAs) is crucial to prevent complications such as local anesthetic systemic toxicity (LAST). Current methods for determining LA dosage are largely based on empirical guidelines and clinician experience, which can result in significant variability and dosing errors. AI models may offer a solution, by processing multiple parameters simultaneously to suggest adequate LA doses.

OBJECTIVE

This study aimed to evaluate the efficacy and safety of 3 generative AI models, ChatGPT (OpenAI), Copilot (Microsoft Corporation), and Gemini (Google LLC), in calculating maximum safe LA doses, with the goal of determining their potential use in clinical practice.

METHODS

A comparative analysis was conducted using a 51-item questionnaire designed to assess LA dose calculation across 10 simulated clinical vignettes. The responses generated by ChatGPT, Copilot, and Gemini were compared with reference doses calculated using a scientifically validated set of rules. Quantitative evaluations involved comparing AI-generated doses to these reference doses, while qualitative assessments were conducted by independent reviewers using a 5-point Likert scale.
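The quantitative comparison described above can be illustrated with a minimal Python sketch. It assumes that a response counts as unsafe when the proposed dose is above the reference dose, and that the excess is expressed as a percentage of the reference; the abstract does not spell out either definition. The function names and the vignette values are hypothetical and are not taken from the study.

```python
def percent_excess(proposed_mg: float, reference_mg: float) -> float:
    """Percentage by which a proposed dose exceeds the reference dose.

    Assumed definition (not confirmed by the abstract):
    (proposed - reference) / reference * 100.
    A negative result means the proposed dose is below the reference.
    """
    if reference_mg <= 0:
        raise ValueError("reference dose must be positive")
    return (proposed_mg - reference_mg) * 100.0 / reference_mg


def is_unsafe(proposed_mg: float, reference_mg: float) -> bool:
    """Assumed criterion: any dose above the reference is flagged as unsafe."""
    return proposed_mg > reference_mg


# Made-up illustration values, NOT clinical data and NOT from the study.
vignettes = [
    {"id": "A", "reference_mg": 100.0, "proposed_mg": 150.0},
    {"id": "B", "reference_mg": 200.0, "proposed_mg": 180.0},
]

for v in vignettes:
    excess = percent_excess(v["proposed_mg"], v["reference_mg"])
    flag = "unsafe" if is_unsafe(v["proposed_mg"], v["reference_mg"]) else "within limit"
    print(f"vignette {v['id']}: {excess:+.1f}% vs reference ({flag})")
```

The numbers serve only to exercise the arithmetic and must not be read as dosing guidance.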

RESULTS

All 3 AI models (Gemini, ChatGPT, and Copilot) completed the questionnaire and generated responses aligned with LA dose calculation principles, but their performance in providing safe doses varied significantly. Gemini frequently avoided proposing any specific dose, instead recommending consultation with a specialist. When it did provide dose ranges, they often exceeded safe limits by 140% (SD 103%) in cases involving mixtures. ChatGPT provided unsafe doses in 90% (9/10) of cases, exceeding safe limits by 198% (SD 196%). Copilot's recommendations were unsafe in 67% (6/9) of cases, exceeding limits by 217% (SD 239%). Qualitative assessments rated Gemini as "fair" and both ChatGPT and Copilot as "poor."
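Summary figures of this kind (a share of unsafe cases plus a mean and SD of the excess) could be derived along the following lines. This is a sketch under stated assumptions: the mean and SD are taken over the unsafe cases only, the SD is the sample SD, and cases without a usable dose are marked `None` and excluded from the denominator. The abstract does not confirm any of these choices, and `summarize` and its input values are hypothetical.

```python
from statistics import mean, stdev


def summarize(excess_pct):
    """Summarize per-case percent excess over the reference dose.

    excess_pct: list of floats, or None where no usable dose was given.
    Assumptions (not confirmed by the abstract): a case is unsafe when its
    excess is > 0, and mean/SD are computed over unsafe cases only.
    """
    answered = [x for x in excess_pct if x is not None]
    unsafe = [x for x in answered if x > 0]
    return {
        "answered": len(answered),
        "unsafe": len(unsafe),
        "unsafe_share_pct": 100.0 * len(unsafe) / len(answered) if answered else None,
        "mean_excess_pct": mean(unsafe) if unsafe else None,
        # Sample SD needs at least two values.
        "sd_excess_pct": stdev(unsafe) if len(unsafe) > 1 else None,
    }


# Made-up values for illustration only, NOT study data.
print(summarize([50.0, None, -10.0, 150.0]))
```

With the same formula for the share, 9 unsafe cases out of 10 give 90%, and 6 out of 9 give 66.7%, which rounds to the 67% reported above.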

CONCLUSIONS

Generative AI models like Gemini, ChatGPT, and Copilot currently lack the accuracy and reliability needed for safe LA dose calculation. Their poor performance suggests that they should not be used as decision-making tools for this purpose. Until more reliable AI-driven solutions are developed and validated, clinicians should rely on their expertise, experience, and a careful assessment of individual patient factors to guide LA dosing and ensure patient safety.
