Xue Jiezheng, Wang Zhouqian, Chen Nuo, Wu Yue, Shen Zhaomeng, Shao Yi, Zhou Heding, Li Zhongwen
Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo, China.
Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Disease, Shanghai, China.
Front Cell Dev Biol. 2025 Mar 27;13:1564054. doi: 10.3389/fcell.2025.1564054. eCollection 2025.
This study aimed to evaluate the potential of ChatGPT in diagnosing ocular trauma cases in emergency settings and determining the necessity for surgical intervention.
This retrospective observational study analyzed 52 ocular trauma cases from Ningbo Eye Hospital. Each case was input into GPT-3.5 turbo and GPT-4.0 turbo in Chinese and English. Ocular surface photographs were independently incorporated into the input to assess ChatGPT's multimodal performance. Six senior ophthalmologists evaluated the image descriptions generated by GPT-4.0 turbo.
With text-only input, diagnostic accuracy was 80.77%-88.46% for GPT-3.5 turbo and 94.23%-98.08% for GPT-4.0 turbo. When photographs replaced the examination information, GPT-4.0 turbo's diagnostic accuracy fell to 63.46%. In the image understanding evaluation, mean completeness scores ranged from 3.59 ± 0.94 to 3.69 ± 0.90, and mean correctness scores ranged from 3.21 ± 1.04 to 3.38 ± 1.00.
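As a rough reading aid, the reported percentages can be converted back into case counts. The sketch below assumes each accuracy figure is simply the number of correctly diagnosed cases divided by the 52 cases in the study; the abstract does not state the counts, so the labels and the helper `implied_correct` are illustrative only.

```python
# Assumption: each reported accuracy = (correct diagnoses) / 52 cases.
# The counts below are inferred from the percentages, not taken from the paper.
N_CASES = 52

# Accuracy figures as reported in the abstract (percent).
reported = {
    "GPT-3.5 turbo, text only (low)": 80.77,
    "GPT-3.5 turbo, text only (high)": 88.46,
    "GPT-4.0 turbo, text only (low)": 94.23,
    "GPT-4.0 turbo, text only (high)": 98.08,
    "GPT-4.0 turbo, photographs": 63.46,
}


def implied_correct(pct, n=N_CASES):
    """Return the whole number of correct cases closest to pct percent of n."""
    return round(pct / 100 * n)


for label, pct in reported.items():
    k = implied_correct(pct)
    print(f"{label}: {pct:.2f}% is about {k}/{N_CASES} cases")
```

Under that assumption, the drop from text-only input to photographs for GPT-4.0 turbo corresponds to a difference of at least 16 cases out of 52.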
This study demonstrates that ChatGPT has the potential to help emergency physicians assess and triage ocular trauma patients accurately and promptly. However, its ability to interpret clinical images needs further improvement.