Kara Mete, Ozduran Erkan, Kara Müge Mercan, Özbek İlhan Celil, Hancı Volkan
Izmir City Hospital, Internal Medicine, Rheumatology, Izmir, Turkey.
Sivas Numune Hospital, Physical Medicine and Rehabilitation, Pain Medicine, Sivas, Turkey.
PLoS One. 2025 Jun 18;20(6):e0326351. doi: 10.1371/journal.pone.0326351. eCollection 2025.
Ankylosing spondylitis (AS), which usually presents in the second and third decades of life, is associated with chronic pain, limited mobility, and severe reductions in quality of life. This study aimed to comparatively evaluate the readability, information accuracy, and quality of the answers given by artificial intelligence (AI)-based chatbots (ChatGPT, Perplexity, and Gemini), which have become popular with the widespread use of online medical information, to user questions about AS, a chronic inflammatory joint disease. The 25 keywords most frequently queried in relation to AS, identified through Google Trends, were submitted to each of the three AI-based chatbots. The readability of the resulting responses was evaluated using the Gunning Fog index (GFOG), Flesch Reading Ease Score (FRES), and Simple Measure of Gobbledygook (SMOG). Response quality was measured with the Ensuring Quality Information for Patients (EQIP) instrument and the Global Quality Score (GQS), and reliability with the modified DISCERN (mDISCERN) and Journal of the American Medical Association (JAMA) scales. According to Google Trends data, the most frequently searched AS-related keywords were "Ankylosing spondylitis pain", "Ankylosing spondylitis symptoms", and "Ankylosing spondylitis disease", in that order. The readability levels of the chatbot answers were above the 6th-grade level and differed significantly between chatbots (p < 0.001). In the EQIP, JAMA, mDISCERN, and GQS evaluations, Perplexity stood out for information quality and reliability, scoring higher than the other chatbots (p < 0.05). Overall, the chatbot answers to AS-related questions exceeded the recommended readability level, and several low reliability and quality scores raise concerns.
With an audit mechanism in place, future AI chatbots could achieve sufficient quality, reliability, and appropriate readability levels.
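The readability indices named above (FRES, SMOG, Gunning Fog) have standard published formulas based on sentence length and syllable counts. A minimal Python sketch follows, using a naive vowel-group heuristic for syllable counting (the study does not specify its tooling; real analyses typically use a dedicated library such as textstat, whose syllable counts will differ slightly):

```python
import math
import re

def count_syllables(word):
    # Naive heuristic: one syllable per contiguous vowel group,
    # with a minimum of one syllable per word.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    n_sent, n_words = len(sentences), len(words)

    # Flesch Reading Ease: higher scores mean easier text
    # (60-70 roughly corresponds to the recommended 6th-8th grade band).
    fres = 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (sum(syllables) / n_words)

    # SMOG grade: driven by polysyllabic words (3+ syllables),
    # normalized to a 30-sentence sample.
    poly = sum(1 for s in syllables if s >= 3)
    smog = 1.0430 * math.sqrt(poly * (30 / n_sent)) + 3.1291

    # Gunning Fog: estimates years of schooling needed to follow the text.
    fog = 0.4 * ((n_words / n_sent) + 100 * (poly / n_words))

    return {"FRES": round(fres, 1), "SMOG": round(smog, 1), "GFOG": round(fog, 1)}
```

A higher FRES with lower SMOG/GFOG values indicates text closer to the 6th-grade target that the study uses as its benchmark for patient-facing material.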