Suppan Mélanie, Fubini Pietro Elias, Stefani Alexandra, Gisselbaek Mia, Samer Caroline Flora, Savoldelli Georges Louis
Division of Anaesthesiology, Department of Acute Care Medicine, Geneva University Hospitals, Rue Gabrielle-Perret-Gentil 4, Geneva, 1211, Switzerland.
Department of Anaesthesiology, Pharmacology, Intensive Care and Emergency Medicine, Faculty of Medicine, University of Geneva, Rue Michel-Servet 1, Geneva, 1211, Switzerland.
JMIR AI. 2025 May 13;4:e66796. doi: 10.2196/66796.
Generative artificial intelligence (AI) is showing great promise as a tool to optimize decision-making across various fields, including medicine. In anesthesiology, accurately calculating maximum safe doses of local anesthetics (LAs) is crucial to prevent complications such as local anesthetic systemic toxicity (LAST). Current methods for determining LA dosage are largely based on empirical guidelines and clinician experience, which can result in significant variability and dosing errors. AI models may offer a solution by processing multiple parameters simultaneously to suggest appropriate LA doses.
This study aimed to evaluate the efficacy and safety of 3 generative AI models, ChatGPT (OpenAI), Copilot (Microsoft Corporation), and Gemini (Google LLC), in calculating maximum safe LA doses, with the goal of determining their potential use in clinical practice.
A comparative analysis was conducted using a 51-item questionnaire designed to assess LA dose calculation across 10 simulated clinical vignettes. The responses generated by ChatGPT, Copilot, and Gemini were compared with reference doses calculated using a scientifically validated set of rules. Quantitative evaluations involved comparing AI-generated doses to these reference doses, while qualitative assessments were conducted by independent reviewers using a 5-point Likert scale.
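The kind of rule-based reference calculation described above, weight-based per-drug maxima combined with an additive-toxicity check for mixtures, can be sketched as follows. The mg/kg limits and the additivity assumption below are common textbook values used purely for illustration; they are not the validated rule set applied in this study and must not be used for clinical dosing.

```python
# Illustrative sketch of a rule-based maximum safe dose check for local
# anesthetics (LAs). The per-drug mg/kg limits are typical textbook
# values (plain solutions, without epinephrine) and are ASSUMPTIONS for
# illustration only -- not the study's validated rules, not clinical advice.

MAX_DOSE_MG_PER_KG = {
    "lidocaine": 4.5,
    "bupivacaine": 2.0,
    "ropivacaine": 3.0,
}


def max_safe_dose_mg(drug: str, weight_kg: float) -> float:
    """Weight-based maximum dose in mg for a single LA."""
    return MAX_DOSE_MG_PER_KG[drug] * weight_kg


def mixture_fraction_used(doses_mg: dict, weight_kg: float) -> float:
    """For an LA mixture, sum each drug's fraction of its own maximum.

    Toxicity is assumed additive, so the combined total must stay <= 1.0
    for the mixture to remain within the safe limit.
    """
    return sum(
        dose / max_safe_dose_mg(drug, weight_kg)
        for drug, dose in doses_mg.items()
    )


# Example: a 70 kg patient receiving 100 mg lidocaine + 50 mg bupivacaine.
used = mixture_fraction_used({"lidocaine": 100, "bupivacaine": 50}, 70)
print(f"Fraction of combined maximum used: {used:.2f}")
```

A value below 1.0 indicates the mixture is within the combined limit; the additive-fraction rule is one common convention for mixtures, chosen here because the study's results specifically flag mixture cases as error-prone for the AI models.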
All 3 AI models (Gemini, ChatGPT, and Copilot) completed the questionnaire and generated responses aligned with LA dose calculation principles, but their performance in providing safe doses varied significantly. Gemini frequently declined to propose any specific dose, instead recommending consultation with a specialist. When it did provide dose ranges, they often exceeded safe limits, by a mean of 140% (SD 103%) in cases involving mixtures. ChatGPT provided unsafe doses in 90% (9/10) of cases, exceeding safe limits by a mean of 198% (SD 196%). Copilot's recommendations were unsafe in 67% (6/9) of cases, exceeding limits by a mean of 217% (SD 239%). Qualitative assessments rated Gemini as "fair" and both ChatGPT and Copilot as "poor."
Generative AI models like Gemini, ChatGPT, and Copilot currently lack the accuracy and reliability needed for safe LA dose calculation. Their poor performance suggests that they should not be used as decision-making tools for this purpose. Until more reliable AI-driven solutions are developed and validated, clinicians should rely on their expertise, experience, and a careful assessment of individual patient factors to guide LA dosing and ensure patient safety.