
Glaucoma Detection and Feature Identification via GPT-4V Fundus Image Analysis.

Authors

Jalili Jalil, Jiravarnsirikul Anuwat, Bowd Christopher, Chuter Benton, Belghith Akram, Goldbaum Michael H, Baxter Sally L, Weinreb Robert N, Zangwill Linda M, Christopher Mark

Affiliations

Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California, San Diego, La Jolla, California.

Hamilton Glaucoma Center, Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California, San Diego, La Jolla, California.

Publication information

Ophthalmol Sci. 2024 Nov 29;5(2):100667. doi: 10.1016/j.xops.2024.100667. eCollection 2025 Mar-Apr.

DOI:10.1016/j.xops.2024.100667
PMID:39877464
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11773068/
Abstract

PURPOSE

The aim is to assess GPT-4V's (OpenAI) diagnostic accuracy and its capability to identify glaucoma-related features compared to expert evaluations.

DESIGN

Evaluation of multimodal large language models for reviewing fundus images in glaucoma.

SUBJECTS

A total of 300 fundus images from 3 public datasets (ACRIMA, ORIGA, and RIM-ONE v3), comprising 139 glaucomatous and 161 nonglaucomatous cases, were analyzed.

METHODS

Preprocessing ensured each image was centered on the optic disc. GPT-4's vision-preview model (GPT-4V) assessed each image for various glaucoma-related criteria: image quality, image gradability, cup-to-disc ratio, peripapillary atrophy, disc hemorrhages, rim thinning (by quadrant and clock hour), glaucoma status, and estimated probability of glaucoma. Each image was analyzed twice by GPT-4V to evaluate consistency in its predictions. Two expert graders independently evaluated the same images using identical criteria. Comparisons between GPT-4V's assessments, expert evaluations, and dataset labels were made to determine accuracy, sensitivity, specificity, and Cohen kappa.
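The four agreement statistics named above can be computed from paired binary labels. The following is a minimal sketch, not the authors' analysis code: the function name `binary_metrics`, the 1 = glaucoma / 0 = nonglaucoma encoding, and the toy labels are assumptions made for illustration only.

```python
from typing import Dict, Sequence


def binary_metrics(truth: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and Cohen kappa for binary labels.

    Assumed encoding (hypothetical): 1 = glaucoma, 0 = nonglaucoma.
    """
    if len(truth) != len(pred) or len(truth) == 0:
        raise ValueError("truth and pred must be non-empty and of equal length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")

    # Cohen kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_chance = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else float("nan")

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Toy example (made-up labels, not study data).
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(reference, model)
```

The same function applies whether the reference is a dataset label or an expert grader's call; kappa is symmetric in its two inputs, whereas sensitivity and specificity depend on which rater is treated as the reference.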

MAIN OUTCOME MEASURES

The main parameters measured were the accuracy, sensitivity, specificity, and Cohen kappa of GPT-4V in detecting glaucoma compared with expert evaluations.

RESULTS

GPT-4V provided glaucoma assessments for all 300 fundus images, although approximately 35% required multiple prompt submissions. Across the ACRIMA, ORIGA, and RIM-ONE datasets, GPT-4V's overall accuracy in glaucoma detection (0.68, 0.70, and 0.81, respectively) was slightly lower than that of expert grader 1 (0.78, 0.80, and 0.88) and expert grader 2 (0.72, 0.78, and 0.87). In glaucoma detection, agreement between GPT-4V and the expert graders varied by dataset and grader, with Cohen kappa values ranging from 0.08 to 0.72. In feature detection, GPT-4V demonstrated high consistency (repeatability) in image gradability, with an agreement accuracy of ≥89%, and substantial agreement in rim thinning and cup-to-disc ratio assessments, although kappas were generally lower than expert-to-expert agreement.
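Repeatability of the kind reported for image gradability can be expressed as the fraction of images that receive the same label on both runs. This is a minimal sketch under that assumption; the helper name `percent_agreement` and the example labels are hypothetical and are not taken from the study.

```python
from typing import Hashable, Sequence


def percent_agreement(run_a: Sequence[Hashable], run_b: Sequence[Hashable]) -> float:
    """Fraction of items given the same label in two repeated runs."""
    if len(run_a) != len(run_b) or len(run_a) == 0:
        raise ValueError("runs must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(run_a, run_b) if a == b)
    return matches / len(run_a)


# Made-up gradability labels for ten images, assessed twice.
first_run = ["gradable"] * 9 + ["ungradable"]
second_run = ["gradable"] * 8 + ["ungradable"] * 2
repeatability = percent_agreement(first_run, second_run)
```

Raw percent agreement does not correct for chance, so it can be high even when one label dominates; this is one reason a kappa statistic is usually reported alongside it.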

CONCLUSIONS

GPT-4V shows promise as a tool in glaucoma screening and detection through fundus image analysis, demonstrating generally high agreement with expert evaluations of key diagnostic features, although agreement did vary substantially across datasets.

FINANCIAL DISCLOSURES

Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65ed/11773068/9af9e0be4591/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65ed/11773068/5329ff9f1870/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65ed/11773068/1040c539cf5f/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65ed/11773068/919ad4b408e3/gr3.jpg
