Artificial Versus Human Intelligence in the Diagnostic Approach of Ophthalmic Case Scenarios: A Qualitative Evaluation of Performance and Consistency

Authors

Mandalos Achilleas, Tsouris Dimitrios

Affiliations

Ophthalmology, General Hospital of Karditsa, Karditsa, GRC.

Ophthalmology, General University Hospital of Larissa, Larissa, GRC.

Publication information

Cureus. 2024 Jun 16;16(6):e62471. doi: 10.7759/cureus.62471. eCollection 2024 Jun.

DOI:10.7759/cureus.62471

PMID:39015855

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11251728/

Abstract

PURPOSE

To evaluate the efficiency of three artificial intelligence (AI) chatbots (ChatGPT-3.5 (OpenAI, San Francisco, California, United States), Bing Copilot (Microsoft Corporation, Redmond, Washington, United States), Google Gemini (Google LLC, Mountain View, California, United States)) in assisting the ophthalmologist in the diagnostic approach and management of challenging ophthalmic cases and compare their performance with that of a practicing human ophthalmic specialist. The secondary aim was to assess the short- and medium-term consistency of ChatGPT's responses.

METHODS

Eleven ophthalmic case scenarios of variable complexity were presented to the AI chatbots and to an ophthalmic specialist in a stepwise fashion. Each was asked for the initial differential diagnosis, the final diagnosis, further investigation, and management. One month later, the same process was repeated twice on the same day for ChatGPT only.

RESULTS

The individual diagnostic performance of all three AI chatbots was inferior to that of the ophthalmic specialist; however, they provided useful complementary input in the diagnostic algorithm. This was especially true for ChatGPT and Bing Copilot. ChatGPT exhibited reasonable short- and medium-term consistency, with the mean Jaccard similarity coefficient of responses varying between 0.58 and 0.76.
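
For readers unfamiliar with the metric, the Jaccard similarity coefficient of two sets is the size of their intersection divided by the size of their union, so 1 means identical sets and 0 means no overlap. The sketch below is only an illustration of that formula, not the authors' procedure: the abstract does not say how responses were broken into comparable items, so the function name `jaccard`, the case-insensitive matching, and the two example lists of diagnoses are all assumptions made here.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of labels."""
    # Assumption: labels are compared case-insensitively after trimming spaces.
    set_a = {item.strip().lower() for item in a}
    set_b = {item.strip().lower() for item in b}
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical differential diagnoses from two runs of the same case
# (illustrative values only, not data from the study).
run_1 = ["optic neuritis", "ischemic optic neuropathy", "papilledema"]
run_2 = ["Optic neuritis", "papilledema", "compressive optic neuropathy"]

score = jaccard(run_1, run_2)
print(score)  # 2 shared items out of 4 distinct items -> 0.5
```

Averaging such scores over all cases would give a mean coefficient comparable in form to the 0.58 to 0.76 range reported above, under the assumptions stated.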

CONCLUSION

AI chatbots may act as useful assisting tools in the diagnosis and management of challenging ophthalmic cases; however, their responses should be scrutinized for potential inaccuracies, and by no means can they replace consultation with an ophthalmic specialist.

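Abstract (translated from the Chinese)

Purpose

To evaluate the efficiency of three artificial intelligence (AI) chatbots (ChatGPT-3.5 (OpenAI, San Francisco, California, United States), Bing Copilot (Microsoft Corporation, Redmond, Washington, United States), and Google Gemini (Google LLC, Mountain View, California, United States)) in assisting ophthalmologists with the diagnosis and management of challenging ophthalmic cases, and to compare their performance with that of a practicing ophthalmic specialist. The secondary aim was to assess the short- and medium-term consistency of ChatGPT's responses.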