Evaluating multimodal ChatGPT for emergency decision-making of ocular trauma cases.

Author information

Xue Jiezheng, Wang Zhouqian, Chen Nuo, Wu Yue, Shen Zhaomeng, Shao Yi, Zhou Heding, Li Zhongwen

Affiliations

Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo, China.

Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Disease, Shanghai, China.

Publication information

Front Cell Dev Biol. 2025 Mar 27;13:1564054. doi: 10.3389/fcell.2025.1564054. eCollection 2025.

DOI:10.3389/fcell.2025.1564054

PMID:40213397

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11983629/

Abstract

PURPOSE

This study aimed to evaluate the potential of ChatGPT in diagnosing ocular trauma cases in emergency settings and determining the necessity for surgical intervention.

METHODS

This retrospective observational study analyzed 52 ocular trauma cases from Ningbo Eye Hospital. Each case was input into GPT-3.5 turbo and GPT-4.0 turbo in Chinese and English. Ocular surface photographs were independently incorporated into the input to assess ChatGPT's multimodal performance. Six senior ophthalmologists evaluated the image descriptions generated by GPT-4.0 turbo.

RESULTS

With text-only input, diagnostic accuracy was 80.77%-88.46% for GPT-3.5 turbo and 94.23%-98.08% for GPT-4.0 turbo. When the examination information was replaced with photographs, GPT-4.0 turbo's diagnostic accuracy decreased to 63.46%. In the image understanding evaluation, mean completeness scores ranged from 3.59 ± 0.94 to 3.69 ± 0.90, and mean correctness scores ranged from 3.21 ± 1.04 to 3.38 ± 1.00.
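As a rough sanity check on these figures, the reported accuracy percentages can be converted back into counts of correctly diagnosed cases out of the 52 cases stated in METHODS. The sketch below is illustrative only: the abstract does not report these counts, and the function name `implied_count` is an assumption introduced here.

```python
# Illustrative arithmetic only: the abstract reports percentages, not counts.
# The counts below are inferred under the assumption that each accuracy
# figure is (correct cases / 52), rounded to two decimal places.
TOTAL_CASES = 52  # stated in METHODS

# Accuracy percentages quoted in RESULTS.
REPORTED = [80.77, 88.46, 94.23, 98.08, 63.46]


def implied_count(percent, total=TOTAL_CASES):
    """Return the integer k for which k/total matches `percent` to two
    decimals, or None if no integer count reproduces the percentage."""
    k = round(percent / 100 * total)
    # Accept k only if it reproduces the percentage within rounding error.
    if abs(100 * k / total - percent) < 0.005:
        return k
    return None


if __name__ == "__main__":
    for p in REPORTED:
        print(f"{p:.2f}% -> {implied_count(p)} of {TOTAL_CASES} cases")
```

Under that assumption, every reported percentage corresponds to a whole number of cases, which is consistent with a denominator of 52 throughout.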

CONCLUSION

This study demonstrates that ChatGPT has the potential to help emergency physicians assess and triage ocular trauma patients appropriately and promptly. However, its ability to understand clinical images needs further improvement.
