A retrospective evaluation of the potential of ChatGPT in the accurate diagnosis of acute stroke.

Author information

Kuzan Beyza Nur, Meşe İsmail, Yaşar Servan, Kuzan Taha Yusuf

Affiliations

Kartal Dr. Lütfi Kırdar City Hospital, Clinic of Radiology, İstanbul, Türkiye.

Üsküdar State Hospital, Clinic of Radiology, İstanbul, Türkiye.

Publication information

Diagn Interv Radiol. 2025 Apr 28;31(3):187-195. doi: 10.4274/dir.2024.242892. Epub 2024 Sep 2.

Abstract

PURPOSE

Stroke is a neurological emergency in which rapid, accurate diagnosis is crucial for reducing morbidity and mortality. Artificial intelligence (AI) diagnostic support tools, such as Chat Generative Pre-trained Transformer (ChatGPT), may speed up diagnosis. This study assesses the accuracy of ChatGPT in interpreting diffusion-weighted imaging (DWI) for the diagnosis of acute stroke.

METHODS

This retrospective analysis used DWI and apparent diffusion coefficient (ADC) map images to identify the presence of stroke. Patients aged >18 years who exhibited diffusion restriction and had a clinically explainable condition were included. Patients whose images had artifacts affecting homogeneity, accuracy, or clarity, and those with previous surgery or a history of stroke, were excluded. ChatGPT-4V was asked four consecutive questions: identify the magnetic resonance imaging (MRI) sequence; once the sequence was recognized, determine whether the ADC map showed diffusion restriction; identify the affected hemisphere; and identify the specific lobe involved. Each question was repeated 10 times to check consistency. Senior radiologists then classified each response as correct or incorrect; partially correct responses and responses suggesting multiple answers were counted as incorrect. Cases in which ChatGPT-4V gave no answer were recorded as non-responses. Performance was assessed by the number and percentage of correct responses, incorrect responses, and non-responses across all images and questions, with the percentage of correct responses reported as "accuracy." ChatGPT-4V was considered successful if it answered ≥80% of the examples correctly.
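The scoring rule described in the methods can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the label strings, the function name `summarize`, and the choice to keep non-responses in the denominator are assumptions. Only the three response categories and the 80% success threshold come from the abstract.

```python
from collections import Counter

# Response categories named in the abstract; the label strings are hypothetical.
CATEGORIES = ("correct", "incorrect", "no_response")

# From the abstract: success means >= 80% of examples answered correctly.
SUCCESS_THRESHOLD = 0.80


def summarize(labels):
    """Tally graded responses and apply the success threshold.

    `labels` holds one category string per graded response. Partially
    correct or multi-answer responses are assumed to have been labeled
    "incorrect" by the reviewers already.
    """
    if not labels:
        raise ValueError("no responses to summarize")
    counts = Counter(labels)
    total = len(labels)
    # (count, percentage) per category; non-responses stay in the
    # denominator, which is an assumption about the original analysis.
    summary = {
        category: (counts[category], 100 * counts[category] / total)
        for category in CATEGORIES
    }
    accuracy = counts["correct"] / total
    return summary, accuracy >= SUCCESS_THRESHOLD


# Hypothetical gradings for one question repeated 10 times.
example = ["correct"] * 8 + ["incorrect", "no_response"]
summary, successful = summarize(example)
print(summary, successful)
```

Keeping non-responses as their own category, rather than folding them into "incorrect", mirrors the separate reporting of the three counts described above.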

RESULTS

A total of 530 diffusion MRI images were evaluated: 266 stroke images and 264 normal images. For the initial query on MRI sequence type, ChatGPT-4V's accuracy was 88.3% for stroke images and 90.1% for normal images. For detecting diffusion restriction, its accuracy was 79.5% for stroke images, with a 15% false-positive rate for normal images. ChatGPT-4V correctly identified the involved cerebral or cerebellar hemisphere in 26.2% of stroke images and the specific cerebral lobe or cerebellar area in 20.4%. For acute stroke, its diagnostic sensitivity was 79.57%, specificity 84.87%, positive predictive value 83.86%, negative predictive value 80.80%, and diagnostic odds ratio 21.86.
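As a consistency check, the diagnostic odds ratio can be recomputed from the reported sensitivity and specificity using the standard definition (positive likelihood ratio divided by negative likelihood ratio). The sketch below is not the authors' code, and the function name is hypothetical. With the rounded inputs from the abstract it gives about 21.85, in line with the reported 21.86; the small gap is expected because the published percentages are rounded.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Return the diagnostic odds ratio (LR+ / LR-) for a binary test.

    Both arguments are proportions strictly between 0 and 1.
    """
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must be in (0, 1)")
    lr_positive = sensitivity / (1 - specificity)   # positive likelihood ratio
    lr_negative = (1 - sensitivity) / specificity   # negative likelihood ratio
    return lr_positive / lr_negative


# Values as reported in the abstract (79.57% and 84.87%).
dor = diagnostic_odds_ratio(0.7957, 0.8487)
print(f"Diagnostic odds ratio: {dor:.2f}")
```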

CONCLUSION

Despite limitations, ChatGPT shows potential as a supportive tool for healthcare professionals in interpreting diffusion examinations in stroke cases, where timely diagnosis is critical.

CLINICAL SIGNIFICANCE

ChatGPT can play an important role in various aspects of stroke cases, such as risk assessment, early diagnosis, and treatment planning.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d323/12057523/9b424a42a2e0/DiagnIntervRadiol-31-3-187-figure-1.jpg
