The Triage and Diagnostic Accuracy of Frontier Large Language Models: Updated Comparison to Physician Performance.

Authors

Sorich Michael Joseph, Mangoni Arduino Aleksander, Bacchi Stephen, Menz Bradley Douglas, Hopkins Ashley Mark

Affiliations

College of Medicine and Public Health, Flinders University, Adelaide, Australia.

Department of Clinical Pharmacology, Southern Adelaide Local Health Network, Adelaide, Australia.

Publication information

J Med Internet Res. 2024 Dec 6;26:e67409. doi: 10.2196/67409.

DOI:10.2196/67409

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11662182/

Abstract

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aff4/11662182/5980f613f4c3/jmir_v26i1e67409_fig1.jpg

Similar articles

1

The Triage and Diagnostic Accuracy of Frontier Large Language Models: Updated Comparison to Physician Performance.

J Med Internet Res. 2024 Dec 6;26:e67409. doi: 10.2196/67409.

2

Evaluating LLM-based generative AI tools in emergency triage: A comparative study of ChatGPT Plus, Copilot Pro, and triage nurses.

Am J Emerg Med. 2025 Mar;89:174-181. doi: 10.1016/j.ajem.2024.12.024. Epub 2024 Dec 19.

3

Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.

JMIR Med Educ. 2024 Feb 13;10:e51391. doi: 10.2196/51391.

4

Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study.

J Med Internet Res. 2024 Jun 14;26:e53297. doi: 10.2196/53297.

5

A Comparative Analysis of the Performance of Large Language Models and Human Respondents in Dermatology.

Indian Dermatol Online J. 2025 Feb 27;16(2):241-247. doi: 10.4103/idoj.idoj_221_24. eCollection 2025 Mar-Apr.

6

Accuracy of a Commercial Large Language Model (ChatGPT) to Perform Disaster Triage of Simulated Patients Using the Simple Triage and Rapid Treatment (START) Protocol: Gage Repeatability and Reproducibility Study.

J Med Internet Res. 2024 Sep 30;26:e55648. doi: 10.2196/55648.

7

Comparing Diagnostic Accuracy of Clinical Professionals and Large Language Models: Systematic Review and Meta-Analysis.

JMIR Med Inform. 2025 Apr 25;13:e64963. doi: 10.2196/64963.

8

Proficiency, Clarity, and Objectivity of Large Language Models Versus Specialists' Knowledge on COVID-19's Impacts in Pregnancy: Cross-Sectional Pilot Study.

JMIR Form Res. 2025 Feb 5;9:e56126. doi: 10.2196/56126.

9

Examining the Role of Large Language Models in Orthopedics: Systematic Review.

J Med Internet Res. 2024 Nov 15;26:e59607. doi: 10.2196/59607.

10

Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study.

J Med Internet Res. 2023 Dec 28;25:e51580. doi: 10.2196/51580.

References cited by this article

1

The diagnostic and triage accuracy of the GPT-3 artificial intelligence model: an observational study.

Lancet Digit Health. 2024 Aug;6(8):e555-e561. doi: 10.1016/S2589-7500(24)00097-9.

2

Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study.

J Med Internet Res. 2024 Jun 14;26:e53297. doi: 10.2196/53297.

3

Use of a Large Language Model to Assess Clinical Acuity of Adults in the Emergency Department.

JAMA Netw Open. 2024 May 1;7(5):e248895. doi: 10.1001/jamanetworkopen.2024.8895.

4

Quality and safety of artificial intelligence generated health information.

BMJ. 2024 Mar 20;384:q596. doi: 10.1136/bmj.q596.