Retinal Imaging Analysis Performed by ChatGPT-4o and Gemini Advanced: The Turning Point of the Revolution?

Author information

Carlà Matteo Mario, Crincoli Emanuele, Rizzo Stanislao

Affiliations

Ophthalmology Department, "Fondazione Policlinico Universitario A. Gemelli, IRCCS", Rome, Italy.

Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy.

Publication information

Retina. 2025 Apr 1;45(4):694-702. doi: 10.1097/IAE.0000000000004351.

DOI:10.1097/IAE.0000000000004351

PMID:39715322

Abstract

PURPOSE

To assess the diagnostic capabilities of the most recent chatbot releases, GPT-4o and Gemini Advanced, when presented with different retinal diseases.

METHODS

Exploratory analysis of 50 cases with different surgical (n = 27) and medical (n = 23) retinal pathologies, whose optical coherence tomography/angiography scans were dragged into the ChatGPT and Gemini interfaces. The authors then asked "Please describe this image" and classified each response as: 1) Correct; 2) Partially correct; 3) Wrong; 4) Unable to assess exam type; or 5) Diagnosis not given.

RESULTS

ChatGPT indicated the correct diagnosis in 31 of 50 cases (62%), significantly more than Gemini Advanced, which did so in 16 of 50 cases (32%; P = 0.0048). In 24% of cases, Gemini Advanced was not able to produce any answer, stating "That's not something I'm able to do yet." For both chatbots, the most common misdiagnosis was macular edema, given erroneously in 16% and 14% of cases, respectively. ChatGPT-4o showed higher rates of correct diagnoses in both surgical (52% vs. 30%) and medical retina (78% vs. 43%). Notably, when optical coherence tomography angiography scans were presented without the corresponding structural image, Gemini failed to recognize them in every case, mistaking the images for artworks.
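The headline comparison (31 of 50 vs. 16 of 50 correct diagnoses) can be rechecked with a short script. The sketch below is only illustrative: the abstract does not say which statistical test produced the reported P value, so the two-sided Fisher exact test used here is an assumption, and the function name `fisher_exact_two_sided` is hypothetical. The table is built from "correct" versus "all other outcomes" for each chatbot.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Counts taken from the abstract: correct diagnoses out of 50 cases each.
chatgpt_correct, gemini_correct, n_cases = 31, 16, 50

chatgpt_rate = chatgpt_correct / n_cases
gemini_rate = gemini_correct / n_cases

# Assumption: "not correct" pools partially correct, wrong, and no-answer cases.
p_value = fisher_exact_two_sided(
    chatgpt_correct, n_cases - chatgpt_correct,
    gemini_correct, n_cases - gemini_correct,
)

print(f"ChatGPT-4o correct: {chatgpt_rate:.0%}")
print(f"Gemini Advanced correct: {gemini_rate:.0%}")
print(f"Two-sided Fisher exact P (assumed test): {p_value:.4f}")
```

A different test (for example a chi-square test with or without continuity correction) would give a somewhat different P value, so any mismatch with the published figure should not be read as an error in the study.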

CONCLUSION

ChatGPT-4o outperformed Gemini Advanced in diagnostic accuracy on optical coherence tomography/angiography images, although the range of diagnoses is still limited.
