
Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care.

Affiliations

Clinic of Anesthesiology and Critical Care, Sincan Education and Research Hospital, Ankara, Turkey.

Clinic of Internal Medicine and Critical Care, Dr. Ismail Fehmi Cumalioğlu City Hospital, Tekirdağ, Turkey.

Publication Information

Medicine (Baltimore). 2024 Aug 16;103(33):e39305. doi: 10.1097/MD.0000000000039305.

Abstract

There is no study that comprehensively evaluates the readability and quality of "palliative care" information provided by the artificial intelligence (AI) chatbots ChatGPT®, Bard®, Gemini®, Copilot®, and Perplexity®. Our study is an observational, cross-sectional original research study. Each of the 5 AI chatbots (ChatGPT®, Bard®, Gemini®, Copilot®, and Perplexity®) was asked to answer the 100 questions most frequently asked by patients about palliative care, and the responses of each chatbot were analyzed separately. The study did not involve any human participants. The results revealed significant differences among the readability assessments of the responses of the 5 AI chatbots (P < .05). When the different readability indexes were evaluated holistically, the readability of the chatbot responses, ordered from easiest to most difficult, was Bard®, Copilot®, Perplexity®, ChatGPT®, Gemini® (P < .05). The median readability indexes of the responses of each of the 5 chatbots were also compared with the "recommended" 6th-grade reading level; statistically significant differences were observed for all formulas (P < .001), and the answers of all 5 chatbots were written at an educational level well above the 6th grade. The modified DISCERN and Journal of the American Medical Association (JAMA) scores were highest for Perplexity® (P < .001), while Gemini® responses had the highest Global Quality Scale score (P < .001). It is emphasized that patient education materials should be written at a 6th-grade reading level. The current answers of the 5 AI chatbots evaluated (Bard®, Copilot®, Perplexity®, ChatGPT®, Gemini®) were well above the recommended readability levels, and their text-content quality assessment scores were also low. Both the quality and the readability of these texts should be brought within the recommended limits.
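The abstract does not name the specific readability formulas applied; a widely used one for patient-facing text is the Flesch-Kincaid Grade Level (FKGL), which maps a text to a US school-grade level. The minimal Python sketch below is an illustration, not the authors' code: the vowel-group syllable counter is a rough heuristic, and the sample answer is hypothetical. It shows how a chatbot response could be scored against the recommended 6th-grade threshold.

    import re

    def count_syllables(word: str) -> int:
        # Rough heuristic: count vowel groups; drop a trailing silent "e".
        groups = re.findall(r"[aeiouy]+", word.lower())
        n = len(groups)
        if word.lower().endswith("e") and n > 1:
            n -= 1
        return max(n, 1)

    def flesch_kincaid_grade(text: str) -> float:
        # FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        return (0.39 * len(words) / len(sentences)
                + 11.8 * syllables / len(words) - 15.59)

    # Hypothetical chatbot answer scored against the 6th-grade target.
    answer = ("Palliative care is specialized medical care that focuses on "
              "relieving the symptoms and stress of a serious illness.")
    print(f"FKGL: {flesch_kincaid_grade(answer):.1f} (target: <= 6.0)")

In practice, a library such as textstat implements this and the other common indexes (Flesch Reading Ease, Gunning Fog, SMOG) with more careful syllable estimation.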

