

Readability, reliability and quality of responses generated by ChatGPT, Gemini, and Perplexity for the most frequently asked questions about pain.

Author Information

Ozduran Erkan, Akkoc Ibrahim, Büyükçoban Sibel, Erkin Yüksel, Hanci Volkan

Affiliations

Sivas Numune Hospital, Physical Medicine and Rehabilitation, Pain Medicine, Sivas, Turkey.

University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Anesthesiology and Reanimation, Istanbul, Turkey.

Publication Information

Medicine (Baltimore). 2025 Mar 14;104(11):e41780. doi: 10.1097/MD.0000000000041780.

Abstract

It is clear that artificial intelligence-based chatbots will become popular applications in healthcare in the near future. More than 30% of the world's population suffers from chronic pain, and individuals often try to access the health information they need through online platforms before presenting to a hospital. This study aimed to examine the readability, reliability, and quality of the responses given by 3 different artificial intelligence chatbots (ChatGPT, Gemini, and Perplexity) to frequently asked questions about pain. The 25 most frequently searched keywords related to pain were identified using Google Trends and posed to each of the 3 artificial intelligence chatbots. The readability of the response texts was assessed with the Flesch Reading Ease Score (FRES), Simple Measure of Gobbledygook (SMOG), Gunning Fog, and Flesch-Kincaid Grade Level scores. Reliability was assessed with the Journal of the American Medical Association (JAMA) benchmark criteria and the DISCERN instrument. The Global Quality Score (GQS) and the Ensuring Quality Information for Patients (EQIP) score were used for quality assessment. The top 3 keywords returned by the Google Trends search were "back pain," "stomach pain," and "chest pain." The responses of all 3 artificial intelligence applications required a reading level above the recommended 6th-grade level (P < .001). In the readability evaluation, the order from easiest to most difficult was Google Gemini, ChatGPT, and Perplexity. Gemini had higher GQS scores than the other chatbots (P = .008). Perplexity had higher JAMA, DISCERN, and EQIP scores than the other chatbots (P < .001, P < .001, and P < .05, respectively). The responses given by ChatGPT, Gemini, and Perplexity to pain-related questions were found to be difficult to read, and their reliability and quality were low. These artificial intelligence chatbots therefore cannot replace a comprehensive medical consultation. For artificial intelligence applications, it is recommended to make text content easier to read, to generate texts containing reliable references, and to have outputs reviewed by a supervisory team of experts.
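The four readability indices named in the abstract are standard published formulas based on word, sentence, and syllable counts. As a rough illustration of how such scoring works, the sketch below computes all four from plain text. The heuristic syllable counter, the tokenization, and the sample text are illustrative assumptions, not the instruments or tooling used in the study itself.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, subtract a trailing silent 'e'."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Polysyllabic ("complex") words: 3 or more syllables
    poly = sum(1 for w in words if count_syllables(w) >= 3)
    W, S = len(words), max(len(sentences), 1)

    return {
        # Flesch Reading Ease: higher = easier (60-70 is roughly plain English)
        "FRES": 206.835 - 1.015 * (W / S) - 84.6 * (syllables / W),
        # Flesch-Kincaid Grade Level: maps to a US school grade
        "FKGL": 0.39 * (W / S) + 11.8 * (syllables / W) - 15.59,
        # Gunning Fog: grade level from sentence length and complex-word share
        "GunningFog": 0.4 * ((W / S) + 100 * poly / W),
        # SMOG grade: polysyllable density normalized to 30 sentences
        "SMOG": 1.043 * (poly * 30 / S) ** 0.5 + 3.1291,
    }

if __name__ == "__main__":
    sample = ("Chronic pain affects a large share of the population. "
              "Management usually combines medication, exercise, and education.")
    for name, score in readability(sample).items():
        print(f"{name}: {score:.1f}")
```

A chatbot answer scoring above roughly grade 6 on the grade-level indices (FKGL, Gunning Fog, SMOG), or below about 80 on FRES, exceeds the readability level commonly recommended for patient-facing health material, which is the threshold the study compares against.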


Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c0f/11922396/862907540254/medi-104-e41780-g001.jpg
