Still Using Only ChatGPT? The Comparison of Five Different Artificial Intelligence Chatbots' Answers to the Most Common Questions About Kidney Stones.

Affiliations

Faculty of Medicine Department of Urology, Tekirdağ Namık Kemal University, Tekirdag, Turkey.

Department of Urology, Bursa State Hospital, Nilufer, Turkey.

Publication information

J Endourol. 2024 Nov;38(11):1172-1177. doi: 10.1089/end.2024.0474. Epub 2024 Sep 6.

DOI:10.1089/end.2024.0474

PMID:39212674

Abstract

To evaluate and compare the quality and comprehensibility of answers produced by five distinct artificial intelligence (AI) chatbots (GPT-4, Claude, Mistral, Google PaLM, and Grok) in response to the most frequently searched questions about kidney stones (KS). Google Trends was used to identify relevant KS-related search terms. Each AI chatbot was given a unique sequence of 25 commonly searched phrases as input. The responses were assessed with DISCERN, the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), the Flesch-Kincaid Grade Level (FKGL), and the Flesch-Kincaid Reading Ease (FKRE). The three most frequently searched terms were "stone in kidney," "kidney stone pain," and "kidney pain." Nepal, India, and Trinidad and Tobago were the countries with the most KS searches. None of the AI chatbots reached the required level of comprehensibility. Grok had the highest FKRE (55.6 ± 7.1) and the lowest FKGL (10.0 ± 1.1) (p = 0.001), whereas Claude outperformed the other chatbots on DISCERN (47.6 ± 1.2) (p = 0.001). PEMAT-P understandability was lowest for GPT-4 (53.2 ± 2.0), and PEMAT-P actionability was highest for Claude (61.8 ± 3.5) (p = 0.001). GPT-4 used the most complex language of the five chatbots, making its answers the hardest to read and understand, whereas Grok's were the simplest. Claude produced the highest-quality KS text. Chatbot technology can improve healthcare material and make it easier to grasp.
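FKRE and FKGL are both computed from the same two ratios: average words per sentence and average syllables per word. A higher FKRE means easier text, while a lower FKGL means a lower school-grade reading level, which is why Grok's highest FKRE together with its lowest FKGL marks it as the simplest of the five. The Python sketch below applies the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas. The vowel-group syllable counter and punctuation-based sentence splitter are simplifying assumptions of this example (the abstract does not say which software the authors used), so its scores will differ somewhat from dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Crude syllable estimate: one syllable per run of vowels (a, e, i, o, u, y)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final silent 'e' ("stone") as non-syllabic, but keep "-le" endings ("simple").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for English text."""
    # Sentences are split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Reading ease: higher score = easier text.
    fre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Grade level: approximate US school grade needed; lower = easier text.
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fre, fkgl


fre, fkgl = readability("The cat sat.")
print(round(fre, 2), round(fkgl, 2))  # 119.19 -2.62
```

Scoring each chatbot answer with a function like the hypothetical `readability()` helper and then averaging per chatbot would yield per-model mean ± SD values of the kind reported above.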

Similar articles

Accuracy and Readability of Artificial Intelligence Chatbot Responses to Vasectomy-Related Questions: Public Beware.

Cureus. 2024 Aug 28;16(8):e67996. doi: 10.7759/cureus.67996. eCollection 2024 Aug.

Assessment of Artificial Intelligence Chatbot Responses to Top Searched Queries About Cancer.

JAMA Oncol. 2023 Oct 1;9(10):1437-1440. doi: 10.1001/jamaoncol.2023.2947.

Performance of Artificial Intelligence Chatbots in Responding to Patient Queries Related to Traumatic Dental Injuries: A Comparative Study.

Dent Traumatol. 2025 Jun;41(3):338-347. doi: 10.1111/edt.13020. Epub 2024 Nov 22.

Assessing the quality and readability of patient education materials on chemotherapy cardiotoxicity from artificial intelligence chatbots: An observational cross-sectional study.

Medicine (Baltimore). 2025 Apr 11;104(15):e42135. doi: 10.1097/MD.0000000000042135.

How Useful are Current Chatbots Regarding Urology Patient Information? Comparison of the Ten Most Popular Chatbots' Responses About Female Urinary Incontinence.

J Med Syst. 2024 Nov 13;48(1):102. doi: 10.1007/s10916-024-02125-4.

Responses of Five Different Artificial Intelligence Chatbots to the Top Searched Queries About Erectile Dysfunction: A Comparative Analysis.

J Med Syst. 2024 Apr 3;48(1):38. doi: 10.1007/s10916-024-02056-0.

Evaluating the Quality and Readability of Generative Artificial Intelligence (AI) Chatbot Responses in the Management of Achilles Tendon Rupture.

Cureus. 2025 Jan 31;17(1):e78313. doi: 10.7759/cureus.78313. eCollection 2025 Jan.

Evaluating AI Chatbot Responses to Postkidney Transplant Inquiries.

Transplant Proc. 2025 Mar;57(2):394-405. doi: 10.1016/j.transproceed.2024.12.028. Epub 2025 Jan 14.

Quality of Information About Kidney Stones from Artificial Intelligence Chatbots.

J Endourol. 2024 Oct;38(10):1056-1061. doi: 10.1089/end.2023.0484. Epub 2024 Jul 29.

Cited by

Use of Artificial Intelligence Methods for Improved Diagnosis of Urinary Tract Infections and Urinary Stone Disease.

J Clin Med. 2025 Jul 12;14(14):4942. doi: 10.3390/jcm14144942.

A structured evaluation of LLM-generated step-by-step instructions in cadaveric brachial plexus dissection.

BMC Med Educ. 2025 Jul 1;25(1):903. doi: 10.1186/s12909-025-07493-0.

What is the role of large language models in the management of urolithiasis?: a review.

Urolithiasis. 2025 May 15;53(1):92. doi: 10.1007/s00240-025-01761-w.

Chatgpt vs traditional pedagogy: a comparative study in urological learning.

World J Urol. 2025 May 8;43(1):286. doi: 10.1007/s00345-025-05654-w.

Patient-facing chatbots: Enhancing healthcare accessibility while navigating digital literacy challenges and isolation risks-a mixed-methods study.

Digit Health. 2025 Apr 28;11:20552076251337321. doi: 10.1177/20552076251337321. eCollection 2025 Jan-Dec.

Young Adult Perspectives on Artificial Intelligence-Based Medication Counseling in China: Discrete Choice Experiment.

J Med Internet Res. 2025 Apr 9;27:e67744. doi: 10.2196/67744.

Comment on: "Benchmarking the performance of large language models in uveitis: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, Google Gemini, and Anthropic Claude3".

Eye (Lond). 2025 May;39(7):1432. doi: 10.1038/s41433-025-03736-y. Epub 2025 Feb 26.

Artificial intelligence and patient education.

Curr Opin Urol. 2025 May 1;35(3):219-223. doi: 10.1097/MOU.0000000000001267. Epub 2025 Feb 12.

Opportunities and Challenges of Chatbots in Ophthalmology: A Narrative Review.

J Pers Med. 2024 Dec 21;14(12):1165. doi: 10.3390/jpm14121165.

Assessing the Quality of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study.

Cureus. 2024 Sep 23;16(9):e69996. doi: 10.7759/cureus.69996. eCollection 2024 Sep.
