Still Using Only ChatGPT? The Comparison of Five Different Artificial Intelligence Chatbots' Answers to the Most Common Questions About Kidney Stones.

Author affiliations

Faculty of Medicine Department of Urology, Tekirdağ Namık Kemal University, Tekirdag, Turkey.

Department of Urology, Bursa State Hospital, Nilufer, Turkey.

Publication information

J Endourol. 2024 Nov;38(11):1172-1177. doi: 10.1089/end.2024.0474. Epub 2024 Sep 6.

Abstract

This study evaluated and compared the quality and comprehensibility of answers produced by five artificial intelligence (AI) chatbots (GPT-4, Claude, Mistral, Google PaLM, and Grok) in response to the most frequently searched questions about kidney stones (KS). Google Trends was used to identify relevant search terms related to KS. Each AI chatbot was provided with a unique sequence of 25 commonly searched phrases as input. The responses were assessed using DISCERN, the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), the Flesch-Kincaid Grade Level (FKGL), and the Flesch-Kincaid Reading Ease (FKRE) criteria. The three most frequently searched terms were "stone in kidney," "kidney stone pain," and "kidney pain." Nepal, India, and Trinidad and Tobago were the countries with the most searches about KS. None of the AI chatbots attained the requisite level of comprehensibility. Grok had the highest FKRE (55.6 ± 7.1) and the lowest FKGL (10.0 ± 1.1) scores (p = 0.001), whereas Claude outperformed the other chatbots in DISCERN score (47.6 ± 1.2) (p = 0.001). PEMAT-P understandability was lowest for GPT-4 (53.2 ± 2.0), and actionability was highest for Claude (61.8 ± 3.5) (p = 0.001). GPT-4 had the most complex language structure of the five chatbots, making it the most difficult to read and comprehend, whereas Grok was the simplest. Claude had the best KS text quality. Chatbot technology can improve healthcare materials and make them easier to understand.
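For readers unfamiliar with the two readability metrics, the following is a minimal sketch of how they are commonly computed, using the standard published Flesch formulas. The abstract does not say which tool the authors used, so the function names (`count_syllables`, `readability`) and the vowel-group syllable heuristic are illustrative assumptions, and scores from dedicated readability software may differ slightly.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption, not the authors' method):
    count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (reading ease, grade level) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    # Reading ease: higher scores mean easier text.
    fkre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Grade level: approximate US school grade needed to follow the text.
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fkre, fkgl


if __name__ == "__main__":
    sample = "Kidney stones can cause severe pain. Drink plenty of water."
    ease, grade = readability(sample)
    print(f"Reading ease: {ease:.1f}, grade level: {grade:.1f}")
```

Because longer sentences and longer words lower the reading-ease score and raise the grade level, the two metrics move in opposite directions. This matches the abstract, where Grok has both the highest FKRE and the lowest FKGL.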
