Assessing chatbots' ability to produce leaflets on cataract surgery: Bing AI, ChatGPT 3.5, ChatGPT 4o, ChatSonic, Google Bard, Perplexity, and Pi.

Author Information

Thompson Polly, Thornton Richard, Ramsden Conor M

Affiliations

From the West of England Eye Unit, Royal Devon University Hospital NHS Foundation Trust, Exeter, United Kingdom (Thompson, Ramsden); Royal Eye Infirmary, Derriford Hospital, University Hospitals Plymouth NHS Foundation Trust, Plymouth, United Kingdom (Thornton); The University of Exeter, Exeter, United Kingdom (Ramsden).

Publication Information

J Cataract Refract Surg. 2025 May 1;51(5):371-375. doi: 10.1097/j.jcrs.0000000000001622.

Abstract

PURPOSE

To evaluate leaflets on cataract surgery produced by 7 common free chatbots.

SETTING

UK-based ophthalmologists carrying out online research.

DESIGN

Data were collected from the responses of 7 freely available online chatbots.

METHODS

Answers given by 7 chatbots (Bing AI, ChatGPT 3.5, ChatGPT 4o, ChatSonic, Google Bard, Perplexity, and Pi), each prompted to "make a patient information leaflet on cataract surgery," were analyzed. Answers were evaluated using the DISCERN instrument, the Patient Education Materials Assessment Tool (PEMAT), the presence of misinformation, the Flesch-Kincaid Grade Level readability score, and material reliability.
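For context, the Flesch-Kincaid Grade Level cited above is a standard readability formula based on average sentence length and average syllables per word; it estimates the US school grade needed to read the text, so higher values indicate harder reading. The sketch below is a minimal, self-contained approximation of that formula, not the tool used by the authors; the vowel-group syllable counter is an illustrative assumption, so scores may differ slightly from dedicated readability software.

```python
import re

def flesch_kincaid_grade(text: str) -> float:
    """Approximate Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0

    def count_syllables(word: str) -> int:
        # Rough heuristic: count vowel groups, discount a silent final "e".
        groups = re.findall(r"[aeiouy]+", word.lower())
        n = len(groups)
        if word.lower().endswith("e") and n > 1:
            n -= 1
        return max(1, n)

    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

# Example: score one sentence of hypothetical leaflet text.
print(round(flesch_kincaid_grade(
    "Cataract surgery replaces the cloudy lens in your eye with a clear artificial lens."
), 1))
```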

RESULTS

The highest-scoring response overall was from ChatSonic, followed by Bing AI and then Perplexity. The lowest-scoring response was from ChatGPT 3.5. ChatSonic achieved the highest DISCERN and PEMAT scores and had the highest Flesch-Kincaid Grade Level. The lowest DISCERN and PEMAT scores were for Pi. Only ChatGPT 3.5 included some misinformation in its response. Bing AI, ChatSonic, and Perplexity included reliable references; the other chatbots provided no references.

CONCLUSIONS

This study demonstrates the range of answers chatbots give when asked to create a cataract surgery leaflet, suggesting variation in their development and reliability. ChatGPT 3.5 scored most poorly, whereas ChatSonic showed promise for how this technology may be used to assist information giving in ophthalmology.

