Artificial Versus Human Intelligence in the Diagnostic Approach of Ophthalmic Case Scenarios: A Qualitative Evaluation of Performance and Consistency.

Author Information

Mandalos Achilleas, Tsouris Dimitrios

Affiliations

Ophthalmology, General Hospital of Karditsa, Karditsa, GRC.

Ophthalmology, General University Hospital of Larissa, Larissa, GRC.

Publication Information

Cureus. 2024 Jun 16;16(6):e62471. doi: 10.7759/cureus.62471. eCollection 2024 Jun.

Abstract

PURPOSE

To evaluate the efficiency of three artificial intelligence (AI) chatbots (ChatGPT-3.5 (OpenAI, San Francisco, California, United States), Bing Copilot (Microsoft Corporation, Redmond, Washington, United States), Google Gemini (Google LLC, Mountain View, California, United States)) in assisting the ophthalmologist in the diagnostic approach and management of challenging ophthalmic cases and compare their performance with that of a practicing human ophthalmic specialist. The secondary aim was to assess the short- and medium-term consistency of ChatGPT's responses.

METHODS

Eleven ophthalmic case scenarios of variable complexity were presented to the AI chatbots and to an ophthalmic specialist in a stepwise fashion. Each was asked for advice regarding the initial differential diagnosis, the final diagnosis, further investigation, and management. One month later, the same process was repeated twice on the same day for ChatGPT only.

RESULTS

The individual diagnostic performance of all three AI chatbots was inferior to that of the ophthalmic specialist; however, they provided useful complementary input in the diagnostic algorithm. This was especially true for ChatGPT and Bing Copilot. ChatGPT exhibited reasonable short- and medium-term consistency, with the mean Jaccard similarity coefficient of responses varying between 0.58 and 0.76.
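The consistency figures above are expressed as Jaccard similarity coefficients, i.e. the overlap between two sets of responses, |A ∩ B| / |A ∪ B|. A minimal illustrative sketch of how such a score could be computed over two response sets (the code and the diagnosis names are hypothetical, not the authors' actual method or data):

```python
def jaccard(a, b):
    """Jaccard similarity between two sets: |A ∩ B| / |A ∪ B|.
    Returns 1.0 for two empty sets (identical by convention)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical differential-diagnosis lists from two ChatGPT sessions:
first = {"uveitis", "scleritis", "endophthalmitis", "keratitis"}
second = {"uveitis", "scleritis", "conjunctivitis", "keratitis"}
print(jaccard(first, second))  # 3 shared of 5 total -> 0.6
```

A score of 1.0 would mean the two sessions proposed identical diagnosis sets; values between 0.58 and 0.76, as reported, indicate substantial but imperfect overlap between repeated responses.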

CONCLUSION

AI chatbots may act as useful assisting tools in the diagnosis and management of challenging ophthalmic cases; however, their responses should be scrutinized for potential inaccuracies, and by no means can they replace consultation with an ophthalmic specialist.
