Evaluating the readability, quality, and reliability of responses generated by ChatGPT, Gemini, and Perplexity on the most commonly asked questions about Ankylosing spondylitis.

Author Information

Kara Mete, Ozduran Erkan, Kara Müge Mercan, Özbek İlhan Celil, Hancı Volkan

Affiliations

Izmir City Hospital, Internal Medicine, Rheumatology, Izmir, Turkey.

Sivas Numune Hospital, Physical Medicine and Rehabilitation, Pain Medicine, Sivas, Turkey.

Publication Information

PLoS One. 2025 Jun 18;20(6):e0326351. doi: 10.1371/journal.pone.0326351. eCollection 2025.


DOI: 10.1371/journal.pone.0326351
PMID: 40531978
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12176213/
Abstract

Ankylosing spondylitis (AS), which usually presents in the second and third decades of life, is associated with chronic pain, limitation of mobility, and severe decreases in quality of life. This study comparatively evaluated the readability, information accuracy, and quality of the answers that artificial intelligence (AI)-based chatbots (ChatGPT, Perplexity, and Gemini), which have become popular routes of access to medical information, give to user questions about AS, a chronic inflammatory joint disease. The 25 AS-related keywords most frequently queried according to Google Trends were directed to each of the three AI-based chatbots. The readability of the resulting responses was evaluated with readability indices: the Gunning Fog index (GFOG), the Flesch Reading Ease Score (FRES), and the Simple Measure of Gobbledygook (SMOG). Response quality was measured with the Ensuring Quality Information for Patients (EQIP) instrument and the Global Quality Score (GQS), and reliability with the modified DISCERN (mDISCERN) and Journal of the American Medical Association (JAMA) scales. According to Google Trends data, the most frequently searched AS-related keywords were "Ankylosing spondylitis pain", "Ankylosing spondylitis symptoms", and "Ankylosing spondylitis disease", respectively. The readability of the answers produced by the AI-based chatbots was above the 6th-grade level, with statistically significant differences among the chatbots (p < 0.001). In the EQIP, JAMA, mDISCERN, and GQS evaluations, Perplexity stood out for information quality and reliability, scoring higher than the other chatbots (p < 0.05). The answers given by AI chatbots to AS-related questions exceed the recommended readability level, and some low reliability and quality scores raise concerns. With an audit mechanism in place, future AI chatbots could achieve sufficient quality, reliability, and appropriate readability.
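
The three readability indices named in the abstract are standard published formulas. The sketch below is a minimal illustration of how they are computed, not the authors' pipeline: the study does not specify its text-analysis tooling, and the vowel-group syllable counter and sample sentence here are hypothetical stand-ins.

```python
# Minimal sketch of the three readability indices named in the abstract
# (Gunning Fog, Flesch Reading Ease, SMOG), using their standard formulas.
# The syllable counter is a rough heuristic, not the study's actual tool.
import re

def count_syllables(word: str) -> int:
    """Rough English syllable count: number of vowel groups, minus a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)

    # Flesch Reading Ease: higher is easier; 60-70 is roughly plain English.
    fres = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)
    # Gunning Fog: approximate years of schooling needed to follow the text.
    gfog = 0.4 * ((n_words / sentences) + 100 * (complex_words / n_words))
    # SMOG: grade level from polysyllabic-word density (formally defined
    # over a 30-sentence sample; applied here to whatever text is given).
    smog = 1.043 * (complex_words * (30 / sentences)) ** 0.5 + 3.1291
    return {"FRES": round(fres, 1), "GFOG": round(gfog, 1), "SMOG": round(smog, 1)}

print(readability("Ankylosing spondylitis is a chronic inflammatory disease of the spine."))
```

A 6th-grade target, as referenced in the abstract, corresponds roughly to GFOG and SMOG values near 6 and FRES values above about 80; chatbot answers scoring well beyond those thresholds are harder than recommended for patient materials.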

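The abstract reports between-chatbot differences only as p-values and does not name the statistical test used. As a hedged illustration, a nonparametric three-group comparison such as the Kruskal-Wallis test is a common choice for ordinal rating scales like the 1-5 GQS; the scores below are invented placeholders, not the study's data.

```python
# Hypothetical three-group comparison of ordinal quality scores; the study's
# actual test and data are not given in the abstract. Each list holds
# made-up GQS ratings (1-5), one value per rated chatbot response.
from scipy.stats import kruskal

gqs_scores = {
    "ChatGPT":    [3, 4, 3, 4, 3, 4, 3],
    "Gemini":     [3, 3, 4, 3, 3, 3, 4],
    "Perplexity": [4, 5, 4, 4, 5, 4, 4],
}

stat, p = kruskal(*gqs_scores.values())
print(f"Kruskal-Wallis H = {stat:.2f}, p = {p:.4f}")
# p < 0.05 would indicate a difference somewhere among the three chatbots;
# pairwise post-hoc tests would then be needed to locate it.
```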

Figures (from PMC):
Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c171/12176213/2f2f13e9313e/pone.0326351.g001.jpg
Fig 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c171/12176213/e11cdce80a8e/pone.0326351.g002.jpg

Similar Articles

[1] Evaluating the readability, quality, and reliability of responses generated by ChatGPT, Gemini, and Perplexity on the most commonly asked questions about Ankylosing spondylitis. PLoS One. 2025 Jun 18.
[2] Assessing the readability, quality and reliability of responses produced by ChatGPT, Gemini, and Perplexity regarding most frequently asked keywords about low back pain. PeerJ. 2025 Jan 22.
[3] Readability, reliability and quality of responses generated by ChatGPT, Gemini, and Perplexity for the most frequently asked questions about pain. Medicine (Baltimore). 2025 Mar 14.
[4] Enhancing the Readability of Online Patient Education Materials Using Large Language Models: Cross-Sectional Study. J Med Internet Res. 2025 Jun 4.
[5] Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care. Medicine (Baltimore). 2024 Aug 16.
[6] AI Chatbots as Sources of STD Information: A Study on Reliability and Readability. J Med Syst. 2025 Apr 3.
[7] Assessing the readability, reliability, and quality of artificial intelligence chatbot responses to the 100 most searched queries about cardiopulmonary resuscitation: An observational study. Medicine (Baltimore). 2024 May 31.
[8] The performance of ChatGPT-4 and Bing Chat in frequently asked questions about glaucoma. Eur J Ophthalmol. 2025 Jul.
[9] Evaluating ChatGPT as a patient resource for frequently asked questions about lung cancer surgery-a pilot study. J Thorac Cardiovasc Surg. 2025 Apr.
[10] Quality Assessment of Web-Based Information Related to Diet During Pregnancy in Pregnant Women: Cross-Sectional Descriptive Study. JMIR Form Res. 2025 Jun 3.

