Carlà Matteo Mario, Crincoli Emanuele, Rizzo Stanislao
Ophthalmology Department, "Fondazione Policlinico Universitario A. Gemelli, IRCCS", Rome, Italy.
Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy.
Retina. 2025 Apr 1;45(4):694-702. doi: 10.1097/IAE.0000000000004351.
To assess the diagnostic capabilities of the most recent chatbot releases, GPT-4o and Gemini Advanced, across different retinal diseases.
Exploratory analysis of 50 cases of surgical (n = 27) and medical (n = 23) retinal pathologies, whose optical coherence tomography/angiography scans were dragged into the ChatGPT and Gemini interfaces. The authors then asked "Please describe this image" and classified each diagnosis as: 1) Correct; 2) Partially correct; 3) Wrong; 4) Unable to assess exam type; or 5) Diagnosis not given.
ChatGPT indicated the correct diagnosis in 31 of 50 cases (62%), significantly more often than Gemini Advanced, which did so in 16 of 50 cases (32%; P = 0.0048). In 24% of cases, Gemini Advanced was unable to produce any answer, stating "That's not something I'm able to do yet." For both chatbots, the most common misdiagnosis was macular edema, given erroneously in 16% and 14% of cases, respectively. ChatGPT-4o showed higher rates of correct diagnosis in both surgical (52% vs. 30%) and medical retina (78% vs. 43%). Notably, when optical coherence tomography angiography scans were presented without the corresponding structural image, Gemini was unable to recognize them in any case, mistaking the images for artworks.
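The headline comparison (31 of 50 vs. 16 of 50 correct diagnoses) can be checked with a short calculation. The abstract does not state which statistical test produced P = 0.0048, so the sketch below is an illustration only: it assumes a two-sided Fisher exact test on a 2x2 table that treats the two chatbots as independent groups. Because the same 50 cases were shown to both chatbots, a paired test such as McNemar's may have been used instead, and that would require per-case data not given here. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of every table no more likely than the observed one
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Counts taken from the abstract; the independence assumption is ours
n_cases = 50
correct_gpt, correct_gemini = 31, 16

p_value = fisher_exact_two_sided(
    correct_gpt, n_cases - correct_gpt,
    correct_gemini, n_cases - correct_gemini,
)
print(f"GPT-4o: {correct_gpt / n_cases:.0%} correct")
print(f"Gemini Advanced: {correct_gemini / n_cases:.0%} correct")
print(f"Two-sided Fisher exact p = {p_value:.4f}")
```

Under these assumptions the difference is significant at the conventional 0.05 level, in line with the reported result; an exact match to the published P value should not be expected unless the authors used the same test.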
ChatGPT-4o outperformed Gemini Advanced in diagnostic accuracy on optical coherence tomography/angiography images, although the range of diagnoses is still limited.