Evaluating the readability, quality, and reliability of responses generated by ChatGPT, Gemini, and Perplexity on the most commonly asked questions about Ankylosing spondylitis.

Author Information

Kara Mete, Ozduran Erkan, Kara Müge Mercan, Özbek İlhan Celil, Hancı Volkan

Affiliations

Izmir City Hospital, Internal Medicine, Rheumatology, Izmir, Turkey.

Sivas Numune Hospital, Physical Medicine and Rehabilitation, Pain Medicine, Sivas, Turkey.

Publication Information

PLoS One. 2025 Jun 18;20(6):e0326351. doi: 10.1371/journal.pone.0326351. eCollection 2025.


DOI: 10.1371/journal.pone.0326351
PMID: 40531978
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12176213/
Abstract

Ankylosing spondylitis (AS), which usually presents in the second and third decades of life, is associated with chronic pain, limitation of mobility, and severe decreases in quality of life. This study comparatively evaluated the readability, information accuracy, and quality of the answers that artificial intelligence (AI)-based chatbots (ChatGPT, Perplexity, and Gemini), which have become popular routes of access to medical information, give to user questions about AS, a chronic inflammatory joint disease. The 25 AS-related keywords most frequently queried according to Google Trends were directed to each of the three AI-based chatbots. The readability of the resulting responses was evaluated with readability indices: the Gunning Fog index (GFOG), the Flesch Reading Ease Score (FRES), and the Simple Measure of Gobbledygook (SMOG). Response quality was measured with the Ensuring Quality Information for Patients (EQIP) instrument and the Global Quality Score (GQS), and reliability with the modified DISCERN (mDISCERN) and Journal of the American Medical Association (JAMA) scales. According to Google Trends data, the most frequently searched AS-related keywords were "Ankylosing spondylitis pain", "Ankylosing spondylitis symptoms", and "Ankylosing spondylitis disease", respectively. The readability of the answers produced by the AI-based chatbots was above the 6th-grade level, with statistically significant differences among the chatbots (p < 0.001). In the EQIP, JAMA, mDISCERN, and GQS evaluations, Perplexity stood out for information quality and reliability, scoring higher than the other chatbots (p < 0.05). The answers given by AI chatbots to AS-related questions exceed the recommended readability level, and some low reliability and quality scores raise concerns. With an audit mechanism in place, future AI chatbots could achieve sufficient quality, reliability, and appropriate readability.
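
The three readability indices named in the abstract are standard published formulas. The sketch below is a minimal illustration of how they are computed, not the authors' pipeline: the study does not specify its text-analysis tooling, and the vowel-group syllable counter and sample sentence here are hypothetical stand-ins.

```python
# Minimal sketch of the three readability indices named in the abstract
# (Gunning Fog, Flesch Reading Ease, SMOG), using their standard formulas.
# The syllable counter is a rough heuristic, not the study's actual tool.
import re

def count_syllables(word: str) -> int:
    """Rough English syllable count: number of vowel groups, minus a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)

    # Flesch Reading Ease: higher is easier; 60-70 is roughly plain English.
    fres = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)
    # Gunning Fog: approximate years of schooling needed to follow the text.
    gfog = 0.4 * ((n_words / sentences) + 100 * (complex_words / n_words))
    # SMOG: grade level from polysyllabic-word density (formally defined
    # over a 30-sentence sample; applied here to whatever text is given).
    smog = 1.043 * (complex_words * (30 / sentences)) ** 0.5 + 3.1291
    return {"FRES": round(fres, 1), "GFOG": round(gfog, 1), "SMOG": round(smog, 1)}

print(readability("Ankylosing spondylitis is a chronic inflammatory disease of the spine."))
```

A 6th-grade target, as referenced in the abstract, corresponds roughly to GFOG and SMOG values near 6 and FRES values above about 80; chatbot answers scoring well beyond those thresholds are harder than recommended for patient materials.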

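The abstract reports between-chatbot differences only as p-values and does not name the statistical test used. As a hedged illustration, a nonparametric three-group comparison such as the Kruskal-Wallis test is a common choice for ordinal rating scales like the 1-5 GQS; the scores below are invented placeholders, not the study's data.

```python
# Hypothetical three-group comparison of ordinal quality scores; the study's
# actual test and data are not given in the abstract. Each list holds
# made-up GQS ratings (1-5), one value per rated chatbot response.
from scipy.stats import kruskal

gqs_scores = {
    "ChatGPT":    [3, 4, 3, 4, 3, 4, 3],
    "Gemini":     [3, 3, 4, 3, 3, 3, 4],
    "Perplexity": [4, 5, 4, 4, 5, 4, 4],
}

stat, p = kruskal(*gqs_scores.values())
print(f"Kruskal-Wallis H = {stat:.2f}, p = {p:.4f}")
# p < 0.05 would indicate a difference somewhere among the three chatbots;
# pairwise post-hoc tests would then be needed to locate it.
```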

Figures (from PMC):
Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c171/12176213/2f2f13e9313e/pone.0326351.g001.jpg
Fig 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c171/12176213/e11cdce80a8e/pone.0326351.g002.jpg

Similar Articles

[1] Evaluating the readability, quality, and reliability of responses generated by ChatGPT, Gemini, and Perplexity on the most commonly asked questions about Ankylosing spondylitis. PLoS One. 2025 Jun 18.
[2] Assessing the readability, quality and reliability of responses produced by ChatGPT, Gemini, and Perplexity regarding most frequently asked keywords about low back pain. PeerJ. 2025 Jan 22.
[3] Readability, reliability and quality of responses generated by ChatGPT, Gemini, and Perplexity for the most frequently asked questions about pain. Medicine (Baltimore). 2025 Mar 14.
[4] Enhancing the Readability of Online Patient Education Materials Using Large Language Models: Cross-Sectional Study. J Med Internet Res. 2025 Jun 4.
[5] Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care. Medicine (Baltimore). 2024 Aug 16.
[6] AI Chatbots as Sources of STD Information: A Study on Reliability and Readability. J Med Syst. 2025 Apr 3.
[7] Assessing the readability, reliability, and quality of artificial intelligence chatbot responses to the 100 most searched queries about cardiopulmonary resuscitation: An observational study. Medicine (Baltimore). 2024 May 31.
[8] The performance of ChatGPT-4 and Bing Chat in frequently asked questions about glaucoma. Eur J Ophthalmol. 2025 Jul.
[9] Evaluating ChatGPT as a patient resource for frequently asked questions about lung cancer surgery-a pilot study. J Thorac Cardiovasc Surg. 2025 Apr.
[10] Quality Assessment of Web-Based Information Related to Diet During Pregnancy in Pregnant Women: Cross-Sectional Descriptive Study. JMIR Form Res. 2025 Jun 3.

