
Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care.

Affiliations

Clinic of Anesthesiology and Critical Care, Sincan Education and Research Hospital, Ankara, Turkey.

Clinic of Internal Medicine and Critical Care, Dr. Ismail Fehmi Cumalioğlu City Hospital, Tekirdağ, Turkey.

Publication Information

Medicine (Baltimore). 2024 Aug 16;103(33):e39305. doi: 10.1097/MD.0000000000039305.

Abstract

There is no study that comprehensively evaluates the readability and quality of "palliative care" information provided by the artificial intelligence (AI) chatbots ChatGPT®, Bard®, Gemini®, Copilot®, and Perplexity®. Our study is an observational, cross-sectional original research study. Each of the 5 AI chatbots (ChatGPT®, Bard®, Gemini®, Copilot®, and Perplexity®) was asked to answer the 100 questions most frequently asked by patients about palliative care, and the responses of each chatbot were analyzed separately. The study did not involve any human participants. The results revealed significant differences among the readability assessments of the responses of the 5 AI chatbots (P < .05). When the different readability indexes were evaluated holistically, the readability of the chatbot responses, ordered from easiest to most difficult, was Bard®, Copilot®, Perplexity®, ChatGPT®, Gemini® (P < .05). The median readability indexes of the responses of each of the 5 chatbots were also compared with the "recommended" 6th-grade reading level; statistically significant differences were observed for all formulas (P < .001), and the answers of all 5 chatbots were written at an educational level well above the 6th grade. The modified DISCERN and Journal of the American Medical Association (JAMA) scores were highest for Perplexity® (P < .001), while Gemini® responses had the highest Global Quality Scale score (P < .001). It is emphasized that patient education materials should be written at a 6th-grade reading level. The current answers of the 5 AI chatbots evaluated (Bard®, Copilot®, Perplexity®, ChatGPT®, Gemini®) were well above the recommended readability levels, and their text-content quality assessment scores were also low. Both the quality and the readability of these texts should be brought within the recommended limits.
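The abstract does not name the specific readability formulas applied; a widely used one for patient-facing text is the Flesch-Kincaid Grade Level (FKGL), which maps a text to a US school-grade level. The minimal Python sketch below is an illustration, not the authors' code: the vowel-group syllable counter is a rough heuristic, and the sample answer is hypothetical. It shows how a chatbot response could be scored against the recommended 6th-grade threshold.

    import re

    def count_syllables(word: str) -> int:
        # Rough heuristic: count vowel groups; drop a trailing silent "e".
        groups = re.findall(r"[aeiouy]+", word.lower())
        n = len(groups)
        if word.lower().endswith("e") and n > 1:
            n -= 1
        return max(n, 1)

    def flesch_kincaid_grade(text: str) -> float:
        # FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        return (0.39 * len(words) / len(sentences)
                + 11.8 * syllables / len(words) - 15.59)

    # Hypothetical chatbot answer scored against the 6th-grade target.
    answer = ("Palliative care is specialized medical care that focuses on "
              "relieving the symptoms and stress of a serious illness.")
    print(f"FKGL: {flesch_kincaid_grade(answer):.1f} (target: <= 6.0)")

In practice, a library such as textstat implements this and the other common indexes (Flesch Reading Ease, Gunning Fog, SMOG) with more careful syllable estimation.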

