Department of Neurology, Stanford University School of Medicine, Palo Alto, CA, United States of America.
Department of Neurology, University of Texas at Austin, Austin, TX, United States of America.
J Neurol Sci. 2023 Oct 15;453:120804. doi: 10.1016/j.jns.2023.120804. Epub 2023 Sep 15.
This is an observational study of the performance of an artificial intelligence-powered chatbot tasked with solving unknown neurologic case vignettes. The primary objective of the study is to assess the current capabilities of widely accessible artificial intelligence within the field of clinical neurology, to determine how this technology can be deployed in clinical practice, and to identify what insights can be learned from its performance and translated to clinical education.
This observational study tested the accuracy of GPT-4, an artificial intelligence-powered chatbot, at appropriately localizing and generating a differential diagnosis for a series of 29 clinical case vignettes. The cases were drawn from previously published educational material prepared for learners. No case required more than text input, a current limitation of GPT-4. The primary outcome measures were ranked accuracy of localization and differential diagnosis based on clinical history and examination alone and again after ancillary clinical data were provided. Secondary outcome measures included a comparison of accuracy by case difficulty.
GPT-4 identified the correct localization less than 50% of the time and performed worse when provided ancillary testing. GPT-4 was more accurate with localization and diagnosis of easier versus harder cases. Diagnostic accuracy was independent of its ability to localize the lesion.
GPT-4 did not perform as well on neurology clinical vignettes as its reported accuracy on clinical vignettes from other medical specialties. Incorporation of an AI chatbot into the practice of clinical neurology will require neurology-focused teaching.