

Evaluation of Generative Artificial Intelligence Models in Predicting Pediatric Emergency Severity Index Levels

Authors

Ho Brandon, Lu Meng, Wang Xuan, Butler Russell, Park Joshua, Ren Dennis

Affiliations

Department of Computer Science, Virginia Tech, Falls Church, VA.

University of California Davis School of Medicine, Sacramento, CA.

Publication

Pediatr Emerg Care. 2025 Apr 1;41(4):251-255. doi: 10.1097/PEC.0000000000003315. Epub 2025 Jan 7.

DOI: 10.1097/PEC.0000000000003315
PMID: 39761573
Abstract

OBJECTIVE

Evaluate the accuracy and reliability of various generative artificial intelligence (AI) models (ChatGPT-3.5, ChatGPT-4.0, T5, Llama-2, Mistral-Large, and Claude-3 Opus) in predicting Emergency Severity Index (ESI) levels for pediatric emergency department patients and assess the impact of medically oriented fine-tuning.

METHODS

Seventy pediatric clinical vignettes from the ESI Handbook version 4 were used as the gold standard. Each AI model predicted the ESI level for each vignette. Performance metrics, including sensitivity, specificity, and F1 score, were calculated. Reliability was assessed by repeating the tests and measuring the interrater reliability using Fleiss kappa. Paired t tests were used to compare the models before and after fine-tuning.
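The metrics named above can be illustrated with a short one-vs-rest sketch in Python. This is not the authors' code, and the vignette labels below are invented; it only shows how sensitivity, specificity, and F1 are derived for a single ESI level treated as the positive class.

```python
# A minimal sketch (not the authors' code): sensitivity, specificity,
# and F1 for one ESI level treated as the positive class (one-vs-rest).

def binary_metrics(gold, pred, positive):
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    tn = sum(g != positive and p != positive for g, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return sensitivity, specificity, f1

# Invented toy labels: gold-standard ESI levels vs. one model's predictions.
gold = [1, 2, 2, 3, 3, 3, 4, 5]
pred = [1, 2, 3, 3, 3, 2, 4, 5]
sens, spec, f1 = binary_metrics(gold, pred, positive=3)
```

In a multi-level setting like ESI (levels 1-5), such per-level scores are typically averaged across levels to yield the aggregate figures reported in the Results.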

RESULTS

Claude-3 Opus achieved the highest performance among the untrained models, with a sensitivity of 80.6% (95% confidence interval [CI]: 63.6-90.7), specificity of 91.3% (95% CI: 83.8-99), and an F1 score of 73.9% (95% CI: 58.9-90.7). After fine-tuning, the GPT-4.0 model showed statistically significant improvement, with a sensitivity of 77.1% (95% CI: 60.1-86.5), specificity of 92.5% (95% CI: 89.5-97.4), and an F1 score of 74.6% (95% CI: 63.9-83.8, P < 0.04). Reliability analysis revealed high agreement for Claude-3 Opus (Fleiss κ: 0.85), followed by Mistral-Large (Fleiss κ: 0.79) and trained GPT-4.0 (Fleiss κ: 0.67). Training improved the reliability of GPT models (P < 0.001).
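The Fleiss κ values reported above measure agreement across repeated runs of the same model. A minimal sketch of the statistic in Python follows; this is not the authors' implementation, and the run data are invented for illustration.

```python
# A minimal sketch (not the authors' code) of Fleiss' kappa for
# agreement across repeated model runs on the same vignettes.

def fleiss_kappa(label_rows, categories):
    """label_rows: one list of labels per vignette, all the same length
    (one label per repeated run); categories: the possible ESI levels."""
    n = len(label_rows[0])  # ratings (runs) per vignette
    N = len(label_rows)     # number of vignettes
    counts = [[row.count(c) for c in categories] for row in label_rows]
    # Marginal proportion of each category over all ratings.
    p_j = [sum(col) / (N * n) for col in zip(*counts)]
    # Observed per-vignette agreement.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N             # mean observed agreement
    P_e = sum(p * p for p in p_j)    # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)

# Toy example: one model's ESI predictions over 3 repeated runs on 4 vignettes.
runs = [[2, 2, 2], [3, 3, 4], [1, 1, 1], [4, 4, 4]]
kappa = fleiss_kappa(runs, categories=[1, 2, 3, 4, 5])
```

κ of 1 means the model returns identical ESI levels on every run; values near 0 mean agreement no better than chance, which is why the reported κ of 0.85 for Claude-3 Opus indicates high run-to-run stability.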

CONCLUSIONS

Generative AI models demonstrate promising accuracy in predicting pediatric ESI levels, with fine-tuning significantly enhancing their performance and reliability. These findings suggest that AI could serve as a valuable tool in pediatric triage.


Similar Articles

1. Evaluation of Generative Artificial Intelligence Models in Predicting Pediatric Emergency Severity Index Levels.
Pediatr Emerg Care. 2025 Apr 1;41(4):251-255. doi: 10.1097/PEC.0000000000003315. Epub 2025 Jan 7.
2. Evaluating LLM-based generative AI tools in emergency triage: A comparative study of ChatGPT Plus, Copilot Pro, and triage nurses.
Am J Emerg Med. 2025 Mar;89:174-181. doi: 10.1016/j.ajem.2024.12.024. Epub 2024 Dec 19.
3. Chat-GPT in triage: Still far from surpassing human expertise - An observational study.
Am J Emerg Med. 2025 Jun;92:165-171. doi: 10.1016/j.ajem.2025.03.028. Epub 2025 Mar 18.
4. Comparative analysis of ChatGPT, Gemini and emergency medicine specialist in ESI triage assessment.
Am J Emerg Med. 2024 Jul;81:146-150. doi: 10.1016/j.ajem.2024.05.001. Epub 2024 May 3.
5. Assessing the precision of artificial intelligence in ED triage decisions: Insights from a study with ChatGPT.
Am J Emerg Med. 2024 Apr;78:170-175. doi: 10.1016/j.ajem.2024.01.037. Epub 2024 Jan 24.
6. Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study.
J Med Internet Res. 2024 Jun 14;26:e53297. doi: 10.2196/53297.
7. Claude 3 Opus and ChatGPT With GPT-4 in Dermoscopic Image Analysis for Melanoma Diagnosis: Comparative Performance Analysis.
JMIR Med Inform. 2024 Aug 6;12:e59273. doi: 10.2196/59273.
8. Emergency department triaging using ChatGPT based on emergency severity index principles: a cross-sectional study.
Sci Rep. 2024 Sep 27;14(1):22106. doi: 10.1038/s41598-024-73229-7.
9. Large Language Models for Simplified Interventional Radiology Reports: A Comparative Analysis.
Acad Radiol. 2025 Feb;32(2):888-898. doi: 10.1016/j.acra.2024.09.041. Epub 2024 Sep 30.
10. Assessing the feasibility of ChatGPT-4o and Claude 3-Opus in thyroid nodule classification based on ultrasound images.
Endocrine. 2025 Mar;87(3):1041-1049. doi: 10.1007/s12020-024-04066-x. Epub 2024 Oct 11.

Cited By

1. Artificial Intelligence Outperforms Physicians in General Medical Knowledge, Except in the Paediatrics Domain: A Cross-Sectional Study.
Bioengineering (Basel). 2025 Jun 14;12(6):653. doi: 10.3390/bioengineering12060653.