Sensoy Eyupcan, Citirik Mehmet
Department of Ophthalmology, Ankara Etlik City Hospital, Ankara, Turkey.
Taiwan J Ophthalmol. 2024 Sep 13;14(3):409-413. doi: 10.4103/tjo.TJO-D-23-00166. eCollection 2024 Jul-Sep.
The purpose of the study was to evaluate the knowledge level of the Chat Generative Pretrained Transformer (ChatGPT), Bard, and Bing artificial intelligence (AI) chatbots regarding ocular inflammation, uveal diseases, and treatment modalities, and to compare their performance with one another.
Thirty-six questions related to ocular inflammation, uveal diseases, and treatment modalities were posed to the ChatGPT, Bard, and Bing AI chatbots, and each response was recorded as correct or incorrect. Accuracy rates were compared using the Chi-squared test.
ChatGPT answered 52.8% of the questions correctly, Bard 38.9%, and Bing 44.4%. All three chatbots gave identical responses to 20 (55.6%) of the questions; of these shared responses, 45% were correct and 55% incorrect. The difference in accuracy among the three chatbots was not statistically significant (P = 0.654).
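For readers who wish to retrace the statistical comparison, a minimal sketch follows. The correct-answer counts are reconstructed from the reported percentages (52.8%, 38.9%, and 44.4% of 36 questions, i.e., approximately 19, 14, and 16 correct answers) and are assumptions rather than the authors' raw data; the authors' exact counts and test configuration may differ, so the computed P-value need not match the reported 0.654 exactly.

    # A minimal sketch of the accuracy comparison, not the authors' code.
    # Correct-answer counts are inferred from the reported percentages
    # (52.8%, 38.9%, 44.4% of 36 questions -> approximately 19, 14, 16)
    # and are therefore assumptions.
    from scipy.stats import chi2_contingency

    N_QUESTIONS = 36
    correct_counts = {"ChatGPT": 19, "Bard": 14, "Bing": 16}

    # 3x2 contingency table: one row per chatbot, columns = (correct, incorrect).
    table = [[n, N_QUESTIONS - n] for n in correct_counts.values()]

    chi2, p, dof, _expected = chi2_contingency(table)
    print(f"chi-squared = {chi2:.3f}, df = {dof}, P = {p:.3f}")

With three chatbots and two outcomes, the test of independence has 2 degrees of freedom, and chi2_contingency applies no continuity correction for tables larger than 2x2.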
AI chatbots should be developed to provide widespread access to accurate information about ocular inflammation, uveal diseases, and treatment modalities. Future research could explore ways to enhance the performance of these chatbots.