Li Kai, Peng Yunfei, Li Luyi, Liu Bo, Huang Zhijian
School of Journalism and Communication, Guangxi University, No. 100 Daxue East Road, Nanning 530004, China. Phone: +86 13367611322.
Department of Rheumatology and Immunology, The Second Affiliated Hospital of Guangxi Medical University, Nanning, China.
JMIR Form Res. 2025 Aug 28;9:e76458. doi: 10.2196/76458.
Systemic lupus erythematosus (SLE) is a life-threatening, multisystem autoimmune disease, and biologic therapy is a promising treatment for it. However, public understanding of this therapy remains insufficient, and the quality of related information on the internet varies widely, which affects patients' acceptance of the treatment. The effectiveness of artificial intelligence technologies, such as ChatGPT (OpenAI), in disseminating knowledge within health care has attracted significant attention. Research on ChatGPT's utility in answering questions about biologic therapy for SLE could support the dissemination of reliable information about this treatment.
This study aimed to evaluate ChatGPT's utility as a tool for users to obtain health information about biologic therapy for SLE.
This study extracted 20 common questions related to biologic therapy for SLE, their corresponding answers, and the sources of these answers from both Google Web Search and ChatGPT-4o (OpenAI). Then, based on Rothwell's classification, the questions were categorized into 3 main types: fact, policy, and value. The sources of the answers were classified into 5 categories: commercial, academic, medical practice, government, and social media. The accuracy and completeness of the answers were assessed using Likert scales. The readability of the answers was evaluated using the Flesch Reading Ease and Flesch-Kincaid Grade Level (FKGL) scores.
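The two readability metrics used in the methods are standard published formulas: Flesch Reading Ease = 206.835 − 1.015 × (words/sentence) − 84.6 × (syllables/word), and FKGL = 0.39 × (words/sentence) + 11.8 × (syllables/word) − 15.59. As a minimal sketch of how such scores are computed, the following Python uses a naive vowel-group syllable counter (production tools such as dictionary-based scorers are more accurate, so exact values will differ):

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: each run of consecutive vowels counts as one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    fre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fre, fkgl
```

Lower Flesch Reading Ease scores and higher FKGL scores both indicate harder text; FKGL maps roughly onto US school grade levels, which is why scores near 16-20 correspond to college graduate-level reading.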
In terms of question types, fact questions made up the largest share of ChatGPT-4o's set (10/20), followed by policy (7/20) and value (3/20); for Google Web Search, fact questions likewise dominated (12/20), followed by value (5/20) and policy (3/20). In terms of sources, ChatGPT-4o's answers drew on 48 sources, the majority academic (29/48), while Google Web Search provided answers from 20 sources distributed evenly across the 5 categories. For accuracy, ChatGPT-4o's mean score of 5.83 (SD 0.49) was higher than that of Google Web Search (mean 4.75, SD 0.94), a mean difference of 1.08 (95% CI 0.61-1.54). For completeness, ChatGPT-4o's mean score of 2.88 (SD 0.32) was higher than that of Google Web Search (mean 1.68, SD 0.69), a mean difference of 1.2 (95% CI 0.96-1.44). For readability, the Flesch Reading Ease scores for ChatGPT-4o and Google Web Search were 11.7 and 14.9, and the FKGL scores were 16.2 and 20, respectively, indicating that both sets of answers were difficult to read, demanding college graduate-level reading proficiency. When ChatGPT was asked to respond at a sixth-grade level, the readability of its answers improved significantly.
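As an illustration of how a mean difference with a 95% CI of this kind can be computed, the sketch below builds a Welch-style interval for two independent means with a normal critical value. This is an assumption about the analysis: the paper scores the same 20 questions for both tools, so its actual CIs may come from a paired or exact-t procedure and need not match this unpaired approximation:

```python
import math

def mean_diff_ci(mean1: float, sd1: float, n1: int,
                 mean2: float, sd2: float, n2: int,
                 z: float = 1.96) -> tuple[float, float, float]:
    """Approximate 95% CI for the difference of two independent group means.

    Uses the Welch standard error sqrt(sd1^2/n1 + sd2^2/n2) and a normal
    critical value (z = 1.96) in place of the exact t quantile.
    """
    diff = mean1 - mean2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff, diff - z * se, diff + z * se
```

For the accuracy scores above (5.83, SD 0.49 vs 4.75, SD 0.94, n = 20 each), this yields a difference of 1.08 with an interval of roughly 0.62 to 1.54, close to the reported 0.61-1.54; a paired analysis would generally give a narrower interval.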
ChatGPT's answers were accurate, rigorous, and comprehensive, were backed by professional supporting materials, and demonstrated humanistic care. However, the readability of the generated text was low, requiring users to have a college education. Given the study's limitations in question scope, comparison dimensions, research perspective, and language coverage, further in-depth comparative research is recommended.