Large Language Model Influence on Management Reasoning: A Randomized Controlled Trial.

Author information

Goh Ethan, Gallo Robert, Strong Eric, Weng Yingjie, Kerman Hannah, Freed Jason, Cool Joséphine A, Kanjee Zahir, Lane Kathleen P, Parsons Andrew S, Ahuja Neera, Horvitz Eric, Yang Daniel, Milstein Arnold, Olson Andrew P J, Hom Jason, Chen Jonathan H, Rodman Adam

Affiliations

Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA.

Stanford Clinical Excellence Research Center, Stanford University, Stanford, CA.

Publication information

medRxiv. 2024 Aug 7:2024.08.05.24311485. doi: 10.1101/2024.08.05.24311485.

Abstract

IMPORTANCE

Large language model (LLM) artificial intelligence (AI) systems have shown promise in diagnostic reasoning, but their utility in management reasoning with no clear right answers is unknown.

OBJECTIVE

To determine whether LLM assistance improves physician performance on open-ended management reasoning tasks compared to conventional resources.

DESIGN

Prospective, randomized controlled trial conducted from 30 November 2023 to 21 April 2024.

SETTING

Multi-institutional study from Stanford University, Beth Israel Deaconess Medical Center, and the University of Virginia involving physicians from across the United States.

PARTICIPANTS

92 practicing attending physicians and residents with training in internal medicine, family medicine, or emergency medicine.

INTERVENTION

Five expert-developed clinical case vignettes were presented with multiple open-ended management questions and scoring rubrics created through a Delphi process. Physicians were randomized to use either GPT-4 via ChatGPT Plus in addition to conventional resources (e.g., UpToDate, Google), or conventional resources alone.

MAIN OUTCOMES AND MEASURES

The primary outcome was difference in total score between groups on expert-developed scoring rubrics. Secondary outcomes included domain-specific scores and time spent per case.

RESULTS

Physicians using the LLM scored higher than those using conventional resources (mean difference 6.5%, 95% CI 2.7-10.2, p<0.001). Significant improvements were seen in the management decisions (6.1%, 95% CI 2.5-9.7, p=0.001), diagnostic decisions (12.1%, 95% CI 3.1-21.0, p=0.009), and case-specific (6.2%, 95% CI 2.4-9.9, p=0.002) domains. GPT-4 users spent more time per case (mean difference 119.3 seconds, 95% CI 17.4-221.2, p=0.02). There was no significant difference between GPT-4-augmented physicians and GPT-4 alone (-0.9%, 95% CI -9.0 to 7.2, p=0.8).

CONCLUSIONS AND RELEVANCE

LLM assistance improved physician management reasoning compared to conventional resources, with particular gains in contextual and patient-specific decision-making. These findings indicate that LLMs can augment management decision-making in complex cases.

TRIAL REGISTRATION

ClinicalTrials.gov Identifier: NCT06208423; https://classic.clinicaltrials.gov/ct2/show/NCT06208423.
