Generative artificial intelligence versus clinicians: Who diagnoses multiple sclerosis faster and with greater accuracy?

Affiliations

The University of Texas Southwestern Medical Center, Department of Neurology, Neuroinnovation Program, Multiple Sclerosis & Neuroimmunology Imaging Program, Dallas, TX, USA; The University of Texas Southwestern Medical Center, Peter O'Donnell Jr. Brain Institute, Dallas, Texas, USA.

The University of Texas Southwestern Medical Center, School of Medicine, Dallas, Texas, USA.

Publication information

Mult Scler Relat Disord. 2024 Oct;90:105791. doi: 10.1016/j.msard.2024.105791. Epub 2024 Aug 6.

Abstract

BACKGROUND

People diagnosed with multiple sclerosis (MS) over the next ten years will predominantly belong to Generation Z (Gen Z). Recent observations within our clinic suggest that younger people with MS use online generative artificial intelligence (AI) platforms for personalized medical advice before their first visit with a specialist in neuroimmunology. Use of such platforms is expected to increase given Gen Z's technology-driven habits, desire for instant communication, and cost-consciousness. Our objective was to determine whether ChatGPT (Generative Pre-trained Transformer) could diagnose MS in individuals earlier than their clinical timeline, and to assess whether its accuracy differed by age, sex, and race/ethnicity.

METHODS

People with MS between 18 and 59 years of age were studied. The clinical timeline of each person diagnosed with MS was retrospectively identified and simulated using ChatGPT-3.5 (GPT-3.5). Chats were conducted using both the participants' actual age, sex, and race/ethnicity and derived variations of these characteristics to test diagnostic accuracy. A Kaplan-Meier survival curve was estimated for time to diagnosis, clustered by subject. Differences in time to diagnosis were tested using a general Wilcoxon test. Logistic regression with a subject-specific intercept was used to capture intra-subject correlation when testing accuracy before and after the inclusion of MRI data.
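As a rough illustration of the Kaplan-Meier step described above, the following minimal Python sketch estimates a survival curve and reads off the median time to diagnosis. It is not the authors' analysis: it ignores the clustering by subject and the Wilcoxon comparison, and the function names (`kaplan_meier`, `median_time`) and the example times are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    times  -- follow-up time for each subject (e.g. years to diagnosis)
    events -- 1 if the diagnosis was observed at that time, 0 if censored
    """
    observed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)          # subjects leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit update: multiply by the fraction still undiagnosed.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_time(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Made-up illustrative data, not taken from the study.
example_times = [1, 2, 3, 4]
example_events = [1, 0, 1, 1]   # the second subject is censored
example_median = median_time(kaplan_meier(example_times, example_events))
```

In this sketch "survival" means "not yet diagnosed", so the median is the time by which half of the subjects are estimated to have received a diagnosis.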

RESULTS

The study cohort included 100 unique people with MS. Of those, 50 were members of Gen Z (38 female; 22 White; mean age at first symptom 20.6 years (y) (standard deviation (SD)=2.2y)), and 50 were non-Gen Z (34 female; 27 White; mean age at first symptom 37.0y (SD=10.4y)). In addition, 529 digital simulations of the original cohort of 100 people were generated (333 female; 166 White; 136 Black/African American; 107 Asian; 120 Hispanic; mean age at first symptom 31.6y (SD=12.4y)), allowing 629 scripted conversations to be analyzed. The estimated median time to diagnosis was significantly longer in clinic, at 0.35y (95% CI=[0.28, 0.48]), than with ChatGPT, at 0.08y (95% CI=[0.04, 0.24]) (p<0.0001). Before the inclusion of MRI data, diagnostic accuracy did not differ by age or by race/ethnicity. However, before the MRI data were included, males were 47% less likely than females to receive a correct diagnosis (p=0.05). After MRI data were included within GPT-3.5, the odds of an accurate diagnosis were 4.0-fold greater for Gen Z participants than for non-Gen Z participants (p=0.01), while diagnostic accuracy was 68% lower in males than in females (p=0.009) and 75% lower for White subjects than for non-White subjects (p=0.0004).
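To make the effect sizes above easier to interpret, the sketch below shows how an odds ratio changes a probability. It rests on two assumptions that the abstract does not confirm: that each "X% less" figure is a reduction in odds (so 68% less corresponds to an odds ratio of about 0.32), and a baseline accuracy that is purely hypothetical. The function names are illustrative only.

```python
def percent_lower_to_odds_ratio(percent_lower):
    """Assumed reading: 'X% less' means the odds are multiplied by (1 - X/100)."""
    return 1 - percent_lower / 100


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Convert a probability to odds, scale by the odds ratio, convert back."""
    odds = baseline_probability / (1 - baseline_probability)
    scaled = odds * odds_ratio
    return scaled / (1 + scaled)


# Hypothetical baseline accuracy of 50% (not a figure from the study).
baseline = 0.5
gen_z_probability = apply_odds_ratio(baseline, 4.0)
male_probability = apply_odds_ratio(baseline, percent_lower_to_odds_ratio(68))
```

Under these assumptions, a 4.0-fold increase in odds moves a 50% baseline to 80%, which shows why an odds ratio should not be read as a direct multiplier on the probability itself.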

CONCLUSION

Although generative AI platforms enable rapid information access and are not principally designed for use in healthcare, an increase in use by Gen Z is anticipated. However, the obtained responses may not be generalizable to all users and bias may exist in select groups.
