Evaluating the strengths and limitations of multimodal ChatGPT-4 in detecting glaucoma using fundus images.

Author information

AlRyalat Saif Aldeen, Musleh Ayman Mohammed, Kahook Malik Y

Affiliations

Department of Ophthalmology, The University of Jordan, Amman, Jordan.

Department of Ophthalmology, Houston Methodist Hospital, Houston, TX, United States.

Publication information

Front Ophthalmol (Lausanne). 2024 Jun 7;4:1387190. doi: 10.3389/fopht.2024.1387190. eCollection 2024.

DOI:10.3389/fopht.2024.1387190

PMID:38984105

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11182172/

Abstract

OVERVIEW

This study evaluates the diagnostic accuracy of a multimodal large language model (LLM), ChatGPT-4, in recognizing glaucoma using color fundus photographs (CFPs) with a benchmark dataset and without prior training or fine tuning.

METHODS

The publicly accessible Retinal Fundus Glaucoma Challenge "REFUGE" dataset was utilized for analyses. The input data consisted of the entire 400 image testing set. The task involved classifying fundus images into either 'Likely Glaucomatous' or 'Likely Non-Glaucomatous'. We constructed a confusion matrix to visualize the results of predictions from ChatGPT-4, focusing on accuracy of binary classifications (glaucoma vs non-glaucoma).
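As a minimal sketch of the confusion-matrix step described above, the following tallies a 2x2 matrix from ground-truth and predicted labels. The function name `confusion_matrix` and the toy label lists are hypothetical and are not taken from the study; only the two class names come from the abstract.

```python
from collections import Counter

# Class names as given in the abstract.
POSITIVE = "Likely Glaucomatous"
NEGATIVE = "Likely Non-Glaucomatous"


def confusion_matrix(truth, predicted, positive=POSITIVE):
    """Tally TP, FN, FP and TN for a binary classification task."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    counts = Counter()
    for t, p in zip(truth, predicted):
        if t == positive:
            # Ground truth is glaucoma: a hit is a TP, a miss is an FN.
            counts["TP" if p == positive else "FN"] += 1
        else:
            # Ground truth is non-glaucoma: a false alarm is an FP.
            counts["FP" if p == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FN", "FP", "TN")}


# Toy labels for illustration only; these are NOT study data.
truth = [POSITIVE, POSITIVE, NEGATIVE, NEGATIVE, NEGATIVE]
predicted = [POSITIVE, NEGATIVE, NEGATIVE, POSITIVE, NEGATIVE]

matrix = confusion_matrix(truth, predicted)
print(matrix)
```

Keeping the four cells as named counts makes it straightforward to derive accuracy, sensitivity, specificity and precision from the same tally.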

RESULTS

ChatGPT-4 demonstrated an accuracy of 90% with a 95% confidence interval (CI) of 87.06%-92.94%. The sensitivity was found to be 50% (95% CI: 34.51%-65.49%), while the specificity was 94.44% (95% CI: 92.08%-96.81%). The precision was recorded at 50% (95% CI: 34.51%-65.49%), and the F1 Score was 0.50.
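The reported percentages are consistent with a confusion matrix of 20 true positives, 20 false negatives, 20 false positives and 340 true negatives, assuming the 400-image test set contains 40 glaucoma and 360 non-glaucoma images. The abstract does not state these counts or the interval method; both are inferences, and the sketch below assumes a normal-approximation (Wald) interval. Under those assumptions it should reproduce the reported values to two decimal places.

```python
import math
from statistics import NormalDist

# Two-sided 95% normal quantile (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def wald_ci(successes, total, z=Z_95):
    """Return (proportion, lower, upper) using the Wald interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


# ASSUMED counts, inferred from the reported percentages
# (not stated in the abstract).
TP, FN, FP, TN = 20, 20, 20, 340

metrics = {
    "accuracy": wald_ci(TP + TN, TP + FN + FP + TN),
    "sensitivity": wald_ci(TP, TP + FN),
    "specificity": wald_ci(TN, TN + FP),
    "precision": wald_ci(TP, TP + FP),
}

# F1 is the harmonic mean of precision and sensitivity (recall).
precision = metrics["precision"][0]
recall = metrics["sensitivity"][0]
f1 = 2 * precision * recall / (precision + recall)

for name, (p, lower, upper) in metrics.items():
    print(f"{name}: {100 * p:.2f}% "
          f"(95% CI {100 * lower:.2f}%-{100 * upper:.2f}%)")
print(f"F1: {f1:.2f}")
```

With only 40 positive cases in the assumed split, the sensitivity and precision intervals are far wider than the accuracy interval, which is why the 90% accuracy figure should be read alongside the 50% sensitivity.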

CONCLUSION

ChatGPT-4 achieved relatively high diagnostic accuracy without prior fine tuning on CFPs. Considering the scarcity of data in specialized medical fields, including ophthalmology, the use of advanced AI techniques, such as LLMs, might require less data for training compared to other forms of AI with potential savings in time and financial resources. It may also pave the way for the development of innovative tools to support specialized medical care, particularly those dependent on multimodal data for diagnosis and follow-up, irrespective of resource constraints.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bad2/11182172/fbadf13edcd0/fopht-04-1387190-g001.jpg

Similar articles

Claude 3 Opus and ChatGPT With GPT-4 in Dermoscopic Image Analysis for Melanoma Diagnosis: Comparative Performance Analysis.

JMIR Med Inform. 2024 Aug 6;12:e59273. doi: 10.2196/59273.

Evaluating ChatGPT-4's Diagnostic Accuracy: Impact of Visual Data Integration.

JMIR Med Inform. 2024 Apr 9;12:e55627. doi: 10.2196/55627.

Diagnosing Glaucoma Based on the Ocular Hypertension Treatment Study Dataset Using Chat Generative Pre-Trained Transformer as a Large Language Model.

Ophthalmol Sci. 2024 Aug 22;5(1):100599. doi: 10.1016/j.xops.2024.100599. eCollection 2025 Jan-Feb.

Exploring the Potential of ChatGPT-4 in Predicting Refractive Surgery Categorizations: Comparative Study.

JMIR Form Res. 2023 Dec 28;7:e51798. doi: 10.2196/51798.

Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.

JMIR Med Educ. 2024 Feb 13;10:e51391. doi: 10.2196/51391.

A Deep Learning-Based Algorithm Identifies Glaucomatous Discs Using Monoscopic Fundus Photographs.

Ophthalmol Glaucoma. 2018 Jul-Aug;1(1):15-22. doi: 10.1016/j.ogla.2018.04.002. Epub 2018 Jun 5.

Deep learning-based automated detection of glaucomatous optic neuropathy on color fundus photographs.

Graefes Arch Clin Exp Ophthalmol. 2020 Apr;258(4):851-867. doi: 10.1007/s00417-020-04609-8. Epub 2020 Jan 27.

A User-friendly Approach for the Diagnosis of Diabetic Retinopathy Using ChatGPT and Automated Machine Learning.

Ophthalmol Sci. 2024 Feb 21;4(4):100495. doi: 10.1016/j.xops.2024.100495. eCollection 2024 Jul-Aug.

Utilizing human intelligence in artificial intelligence for detecting glaucomatous fundus images using human-in-the-loop machine learning.

Indian J Ophthalmol. 2022 Apr;70(4):1131-1138. doi: 10.4103/ijo.IJO_2583_21.

Cited by

Assessing the Diagnostic Capabilities of ChatGPT-4 Omni in Grading Diabetic Retinopathy Fundoscopy Using Color Fundus Photographs.

Clin Ophthalmol. 2025 Aug 31;19:3103-3112. doi: 10.2147/OPTH.S517238. eCollection 2025.

What will the future role for large language models (LLMs) be in managing patients with glaucoma?

Expert Rev Ophthalmol. 2025 Jun;20(3):123-126. doi: 10.1080/17469899.2025.2487532. Epub 2025 Apr 1.

Multiple large language models versus experienced physicians in diagnosing challenging cases with gastrointestinal symptoms.

NPJ Digit Med. 2025 Feb 5;8(1):85. doi: 10.1038/s41746-025-01486-5.

Glaucoma Detection and Feature Identification via GPT-4V Fundus Image Analysis.

Ophthalmol Sci. 2024 Nov 29;5(2):100667. doi: 10.1016/j.xops.2024.100667. eCollection 2025 Mar-Apr.

Application of large language models in disease diagnosis and treatment.

Chin Med J (Engl). 2025 Jan 20;138(2):130-142. doi: 10.1097/CM9.0000000000003456. Epub 2024 Dec 26.
