Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study.

Authors

Pan Wei, Zhang Shuman, Wang Yonghong, Quan Zhenglin, Zhu Yanxia, Fang Zhicheng, Yang Xianyi

Affiliations

Department of Emergency Medicine, Taihe Hospital, Hubei University of Medicine, Shiyan, Hubei, China.

The Intensive Care Unit, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan, Guangdong, China.

Publication

J Med Internet Res. 2025 Jun 4;27:e67489. doi: 10.2196/67489.
PMID: 40466102
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12177424/

Abstract

BACKGROUND: Wasp stings are a significant public health concern in many parts of the world, particularly in tropical and subtropical regions. Wasp venom contains a variety of bioactive compounds that can cause a wide range of clinical effects, from mild localized pain and swelling to severe, life-threatening allergic reactions such as anaphylaxis. With the rapid development of artificial intelligence (AI) technologies, large language models (LLMs) are increasingly being used in health care, including emergency medicine and toxicology. These models have the potential to help health care professionals make fast, informed clinical decisions. This study aimed to assess the performance of 4 leading LLMs, namely ERNIE Bot 3.5 (Baidu), ERNIE Bot 4.0 (Baidu), Claude Pro (Anthropic), and ChatGPT 4.0, in managing wasp sting cases, with a focus on their accuracy, comprehensiveness, and decision-making abilities.

OBJECTIVE: The objective of this research was to systematically evaluate and compare the capabilities of the 4 LLMs in wasp sting management. This involved analyzing their responses to a series of standardized questions and real-world clinical scenarios. The study aimed to determine which LLMs provided the most accurate, complete, and clinically relevant information for the management of wasp stings.

METHODS: This study used a cross-sectional design, creating 50 standardized questions covering 10 key domains in the management of wasp stings, along with 20 real-world clinical case scenarios. Responses from the 4 LLMs were independently evaluated by 8 domain experts, who rated them on a 5-point Likert scale for accuracy, completeness, and usefulness in clinical decision-making. Statistical comparisons between the models were made using the Wilcoxon signed-rank test, and the consistency of expert ratings was assessed using the Kendall coefficient of concordance.

RESULTS: Claude Pro achieved the highest average score of 4.7 (SD 0.603) out of 5, followed closely by ChatGPT 4.0 with a score of 4.5. ERNIE Bot 4.0 and ERNIE Bot 3.5 received average scores of 4 (SD 0.600) and 3.8, respectively. In the analysis of the 20 complex clinical cases, Claude Pro significantly outperformed ERNIE Bot 3.5, particularly in managing complications and assessing the severity of reactions (P<.001). The expert ratings showed moderate agreement (Kendall W=0.67), indicating that the assessments were consistently reliable.

CONCLUSIONS: The results of this study suggest that Claude Pro and ChatGPT 4.0 are highly capable of providing accurate and comprehensive support for the clinical management of wasp stings, particularly in complex decision-making scenarios. These findings support the increasing role of AI in emergency and toxicological medicine and suggest that the choice of AI tool should be based on the specific needs of the clinical situation, so that the most appropriate model is selected for each health care application.
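The abstract reports agreement among the 8 expert raters as a Kendall coefficient of concordance (W). As a rough illustration of what that statistic measures, the sketch below computes W = 12S / (m^2(n^3 - n) - m·ΣT) for m raters scoring n items, where S is the sum of squared deviations of each item's rank total from its mean and ΣT is the tie correction. The tie correction matters for Likert data because many scores coincide. The function names and the rating matrix are hypothetical, and the abstract does not say how the authors arranged raters and items or which software they used, so this is not a reproduction of their analysis.

```python
from typing import Dict, List, Sequence


def average_ranks(scores: Sequence[float]) -> List[float]:
    """Rank scores from 1 to n (ascending); tied scores share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda idx: scores[idx])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of sorted positions holding the same score.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; assign their mean.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def tie_correction(scores: Sequence[float]) -> int:
    """Sum of (t^3 - t) over each group of t tied scores given by one rater."""
    counts: Dict[float, int] = {}
    for s in scores:
        counts[s] = counts.get(s, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def kendalls_w(ratings: Sequence[Sequence[float]]) -> float:
    """Kendall's W (tie-corrected) for ratings[r][i] = score rater r gave item i."""
    m = len(ratings)     # number of raters
    n = len(ratings[0])  # number of rated items
    if n < 2 or any(len(row) != n for row in ratings):
        raise ValueError("need at least 2 items and equal-length rows")
    rank_rows = [average_ranks(row) for row in ratings]
    # R_i: total rank each item received across all raters.
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    ties = sum(tie_correction(row) for row in ratings)
    denominator = m ** 2 * (n ** 3 - n) - m * ties
    if denominator == 0:
        raise ValueError("W is undefined when every rater gives all items the same score")
    return 12 * s / denominator


# Hypothetical 1-5 Likert scores: 3 raters (rows) x 4 model answers (columns).
example = [[5, 4, 4, 3], [5, 4, 3, 3], [4, 5, 4, 2]]
print(round(kendalls_w(example), 3))  # prints 0.741
```

W runs from 0 (no agreement in how raters order the items) to 1 (identical orderings). Using average ranks keeps each rater's rank total at n(n+1)/2 even when scores tie, which is why the mean rank total m(n+1)/2 still holds.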
