Can the large language model ChatGPT-4omni predict outcomes in adult patients with status epilepticus?

Authors

Amacher Simon A, Baumann Sira M, Berger Sebastian, Arpagaus Armon, Egli Simon B, Grzonka Pascale, Kliem Paulina S C, Hunziker Sabina, Fisch Urs, Gebhard Caroline E, Sutter Raoul

Affiliations

Clinic for Intensive Care Medicine, Department of Acute Care, University Hospital Basel, Basel, Switzerland.

Department of Anesthesiology and Intensive Care Medicine, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Germany.

Publication information

Epilepsia. 2025 Mar;66(3):674-685. doi: 10.1111/epi.18215. Epub 2024 Dec 26.

Abstract

OBJECTIVE

Large language models (LLMs) have recently gained attention for clinical decision-making and diagnosis. This study evaluates the performance of the recently updated LLM Chat Generative Pre-Trained Transformer-4omni (ChatGPT-4o) in predicting clinical outcomes in patients with status epilepticus and compares its prognostic performance to the Status Epilepticus Severity Score (STESS).

METHODS

This retrospective single-center cohort study was performed at the University Hospital Basel (a tertiary academic medical center) from January 2005 to December 2022. It included consecutive adult patients (≥18 years of age) with a diagnosis of status epilepticus. The primary outcome was survival at hospital discharge, and the secondary outcome was return to premorbid neurological function at hospital discharge. The performance characteristics of ChatGPT-4o (sensitivity, specificity, Youden Index) were evaluated and compared to those of the STESS.

RESULTS

Of 760 patients, 689 (90.7%) survived to discharge, and 317 survivors (41.7% of all patients) regained their premorbid neurological function at discharge. ChatGPT-4o predicted survival in 567 of 760 patients (74.6%), of whom 45 died. ChatGPT-4o predicted death in 193 of 760 patients (25.4%), of whom 167 survived, resulting in a sensitivity of 75.8% and a specificity of 36.6% (Youden Index .12, 95% confidence interval [CI] 0-.28) for predicting survival. ChatGPT-4o predicted return to premorbid neurological function in 249 of 760 patients (32.8%), of whom 112 did not return to their premorbid neurological function. ChatGPT-4o predicted no return to premorbid function in 511 of 760 patients (67.2%), of whom 180 returned to their premorbid function, resulting in a sensitivity of 43.2% and a specificity of 74.7% (Youden Index .12, 95% CI .08-.28) for predicting return to premorbid neurological function. There was no difference in prognostic performance between ChatGPT-4o and the STESS. A second round of prompting did not increase the predictive performance of ChatGPT-4o.
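The sensitivity, specificity, and Youden Index quoted above can be recomputed from the reported counts. The following is a minimal sketch, not the authors' analysis code: the class `ConfusionCounts` and the helper `from_predictions` are hypothetical names introduced here, and the "positive" class is assumed to be the favorable outcome (survival, or return to premorbid function). Confidence intervals are not reproduced because the abstract does not state how they were derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table for a binary prediction (hypothetical helper type)."""
    true_pos: int   # predicted positive, outcome positive
    false_pos: int  # predicted positive, outcome negative
    false_neg: int  # predicted negative, outcome positive
    true_neg: int   # predicted negative, outcome negative

    @property
    def sensitivity(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        return self.true_neg / (self.true_neg + self.false_pos)

    @property
    def youden_index(self) -> float:
        # Youden's J = sensitivity + specificity - 1
        return self.sensitivity + self.specificity - 1.0


def from_predictions(pred_pos: int, pred_pos_wrong: int,
                     pred_neg: int, pred_neg_wrong: int) -> ConfusionCounts:
    """Build the table from 'N predicted X, of whom M had the other outcome'."""
    return ConfusionCounts(
        true_pos=pred_pos - pred_pos_wrong,
        false_pos=pred_pos_wrong,
        false_neg=pred_neg_wrong,
        true_neg=pred_neg - pred_neg_wrong,
    )


# Counts taken from the Results paragraph.
survival = from_predictions(pred_pos=567, pred_pos_wrong=45,
                            pred_neg=193, pred_neg_wrong=167)
recovery = from_predictions(pred_pos=249, pred_pos_wrong=112,
                            pred_neg=511, pred_neg_wrong=180)

for label, counts in (("survival", survival), ("recovery", recovery)):
    print(f"{label}: sensitivity={counts.sensitivity:.1%}, "
          f"specificity={counts.specificity:.1%}, "
          f"Youden={counts.youden_index:.2f}")
```

For survival, this reproduces the reported 75.8%, 36.6%, and a Youden Index of about .12. For return to premorbid function, it reproduces 43.2% and 74.7%, but the resulting Youden Index is about .18 rather than the .12 printed above, so that figure may be worth checking against the published article.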

SIGNIFICANCE

ChatGPT-4o does not reliably predict outcomes in patients with status epilepticus. Clinicians should refrain from using ChatGPT-4o for prognostication in these patients.
