Levin Chedva, Orkaby Brurya, Kerner Erika, Saban Mor
Faculty of School of Life and Health Sciences, Nursing Department, The Jerusalem College of Technology-Lev Academic Center, Jerusalem, Israel.
The Department of Vascular Surgery, The Chaim Sheba Medical Center, Tel Hashomer, Ramat Gan, Tel Aviv, Israel.
Pediatr Res. 2025 Mar 8. doi: 10.1038/s41390-025-03980-8.
Medication errors in pediatric care remain a significant healthcare challenge despite technological advancements, necessitating innovative approaches. This study aims to evaluate Large Language Models' (LLMs) potential in reducing pediatric medication dosage calculation errors compared to experienced nurses.
This cross-sectional study (June-August 2024) involved 101 nurses from pediatric and neonatal departments and three LLMs (ChatGPT-4o, Claude-3.0, Llama 3 8B). Participants completed a nine-question survey on pediatric medication calculations. Primary outcomes were accuracy and response time. Secondary measures were the effects of seniority and group membership on accuracy.
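To make the accuracy outcome concrete, the following is a minimal sketch of how a nine-question numeric survey could be scored as a percentage. It is not the authors' scoring procedure: the function name `percent_accuracy`, the `tolerance` parameter, and every value in the answer key are assumptions invented for illustration.

```python
def percent_accuracy(responses, answer_key, tolerance=0.0):
    """Return the percentage of numeric answers that match the key.

    An answer counts as correct when it lies within `tolerance` of the
    expected value; a missing answer (None) counts as incorrect.
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    correct = sum(
        1
        for given, expected in zip(responses, answer_key)
        if given is not None and abs(given - expected) <= tolerance
    )
    return 100.0 * correct / len(answer_key)


# Hypothetical nine-item answer key (values invented for illustration only).
answer_key = [2.5, 10.0, 0.4, 125.0, 7.5, 1.2, 30.0, 0.05, 15.0]

# A respondent who answers every item correctly.
all_correct = list(answer_key)

# A respondent who answers the first six items correctly and the last three wrongly.
six_correct = answer_key[:6] + [0.0, 0.0, 0.0]

print(percent_accuracy(all_correct, answer_key))            # 100.0
print(round(percent_accuracy(six_correct, answer_key), 1))  # 66.7
```

With nine items, each question contributes roughly 11.1 percentage points, so a single miscalculation moves a respondent's score noticeably.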
Nurses and LLMs differed significantly (P < 0.001). Nurses averaged 93.14 ± 9.39% accuracy. Claude-3.0 and ChatGPT-4o achieved 100% accuracy, while Llama 3 8B was 66% accurate. LLMs were faster (15.7-75.12 s) than nurses (1621.2 ± 8379.3 s). The Generalized Linear Model analysis showed that task performance was significantly influenced by duration (Wald χ² = 27,881.261, P < 0.001) and by the interaction between relative seniority and group membership (Wald χ² = 3,938.250, P < 0.001); participants achieved a mean total grade of 91.03 (SD = 13.87).
Claude-3.0 and ChatGPT-4o demonstrated perfect accuracy and rapid calculation capabilities, showing promise in reducing pediatric medication dosage errors. Further research is needed to explore their integration into practice.
Key Message: Large Language Models (LLMs) like ChatGPT-4o and Claude-3.0 demonstrate perfect accuracy and significantly faster response times in pediatric medication dosage calculations, showing potential to reduce errors and save time.
Addition to Existing Literature: This study provides novel insights by quantitatively comparing LLM performance with experienced nurses, contributing to the understanding of AI's role in improving medication safety.
Impact: The findings emphasize the value of LLMs as supplemental tools in healthcare, particularly in high-stakes pediatric care, where they can reduce calculation errors and improve clinical efficiency.