

Assessing the performance of AI chatbots in answering patients' common questions about low back pain.

Author Information

Scaff Simone P S, Reis Felipe J J, Ferreira Giovanni E, Jacob Maria Fernanda, Saragiotto Bruno T

Affiliations

Masters and Doctoral Programs in Physical Therapy, Universidade Cidade de Sao Paulo, Sao Paulo, Brazil.

Physical Therapy Department, Instituto Federal do Rio de Janeiro, Rio de Janeiro, Brazil; Department of Physiotherapy, Human Physiology and Anatomy, Vrije Universiteit Brussel, Brussel, Belgium.

Publication Information

Ann Rheum Dis. 2025 Jan;84(1):143-149. doi: 10.1136/ard-2024-226202. Epub 2025 Jan 2.

DOI: 10.1136/ard-2024-226202
PMID: 39874229
Abstract

OBJECTIVES

The aim of this study was to assess the accuracy and readability of the answers generated by large language model (LLM)-chatbots to common patient questions about low back pain (LBP).

METHODS

This cross-sectional study analysed responses to 30 LBP-related questions, covering self-management, risk factors and treatment. The questions were developed by experienced clinicians and researchers and were piloted with a group of consumer representatives with lived experience of LBP. The inquiries were inputted in prompt form into ChatGPT 3.5, Bing, Bard (Gemini) and ChatGPT 4.0. Responses were evaluated in relation to their accuracy, readability and presence of disclaimers about health advice. The accuracy was assessed by comparing the recommendations generated with the main guidelines for LBP. The responses were analysed by two independent reviewers and classified as accurate, inaccurate or unclear. Readability was measured with the Flesch Reading Ease Score (FRES).
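As a concrete illustration of the readability metric named above, the Flesch Reading Ease Score is computed as 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words); higher scores mean easier text, and the study's mean of 50.94 falls in the harder-to-read range. A minimal Python sketch follows — the vowel-group syllable counter is a naive heuristic for illustration only, not the tooling used in the study:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: one syllable per run of consecutive vowels (min 1).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Split sentences on terminal punctuation; extract alphabetic words.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

On this scale, scores of roughly 50-60 are conventionally labelled difficult for a general audience, which is the band the chatbot responses fell into.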

RESULTS

Out of 120 responses yielding 1069 recommendations, 55.8% were accurate, 42.1% inaccurate and 1.9% unclear. Treatment and self-management domains showed the highest accuracy, while risk factors had the most inaccuracies. Overall, LLM-chatbots provided answers that were 'reasonably difficult' to read, with a mean (SD) FRES of 50.94 (3.06). Disclaimers about health advice were present in around 70%-100% of the responses produced.

CONCLUSIONS

The use of LLM-chatbots as tools for patient education and counselling in LBP shows promising but variable results. These chatbots generally provide moderately accurate recommendations. However, accuracy may vary depending on the topic of each question. The readability level of the answers was inadequate, potentially affecting patients' ability to comprehend the information.


Similar Articles

1. Assessing the performance of AI chatbots in answering patients' common questions about low back pain.
   Ann Rheum Dis. 2025 Jan;84(1):143-149. doi: 10.1136/ard-2024-226202. Epub 2025 Jan 2.
2. Assessing the readability, quality and reliability of responses produced by ChatGPT, Gemini, and Perplexity regarding most frequently asked keywords about low back pain.
   PeerJ. 2025 Jan 22;13:e18847. doi: 10.7717/peerj.18847. eCollection 2025.
3. Assessing the quality and readability of patient education materials on chemotherapy cardiotoxicity from artificial intelligence chatbots: An observational cross-sectional study.
   Medicine (Baltimore). 2025 Apr 11;104(15):e42135. doi: 10.1097/MD.0000000000042135.
4. Accuracy and Readability of Artificial Intelligence Chatbot Responses to Vasectomy-Related Questions: Public Beware.
   Cureus. 2024 Aug 28;16(8):e67996. doi: 10.7759/cureus.67996. eCollection 2024 Aug.
5. Assessing the Readability of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study.
   Cureus. 2024 Jul 4;16(7):e63865. doi: 10.7759/cureus.63865. eCollection 2024 Jul.
6. Generative artificial intelligence chatbots may provide appropriate informational responses to common vascular surgery questions by patients.
   Vascular. 2025 Feb;33(1):229-237. doi: 10.1177/17085381241240550. Epub 2024 Mar 18.
7. Talking technology: exploring chatbots as a tool for cataract patient education.
   Clin Exp Optom. 2025 Jan;108(1):56-64. doi: 10.1080/08164622.2023.2298812. Epub 2024 Jan 9.
8. The promising role of chatbots in keratorefractive surgery patient education.
   J Fr Ophtalmol. 2025 Feb;48(2):104381. doi: 10.1016/j.jfo.2024.104381. Epub 2024 Dec 13.
9. Readability, quality and accuracy of generative artificial intelligence chatbots for commonly asked questions about labor epidurals: a comparison of ChatGPT and Bard.
   Int J Obstet Anesth. 2025 Feb;61:104317. doi: 10.1016/j.ijoa.2024.104317. Epub 2024 Dec 20.
10. Appropriateness and readability of Google Bard and ChatGPT-3.5 generated responses for surgical treatment of glaucoma.
   Rom J Ophthalmol. 2024 Jul-Sep;68(3):243-248. doi: 10.22336/rjo.2024.45.

Cited By

1. Evaluating artificial intelligence chatbots' responses to gynecomastia inquiries: Comparative study of information quality, readability, and guideline consistency.
   Digit Health. 2025 Aug 26;11:20552076251367645. doi: 10.1177/20552076251367645. eCollection 2025 Jan-Dec.
2. Adoption and perception of LLM-based chatbots in health care: an exploratory cross-sectional survey of individuals with rheumatic diseases.
   Rheumatol Adv Pract. 2025 Jul 12;9(3):rkaf083. doi: 10.1093/rap/rkaf083. eCollection 2025.
3. Assessing the adherence of large language models to clinical practice guidelines in Chinese medicine: a content analysis.
   Front Pharmacol. 2025 Jul 25;16:1649041. doi: 10.3389/fphar.2025.1649041. eCollection 2025.
4. Artificial intelligence in personalized rehabilitation: current applications and a SWOT analysis.
   Front Digit Health. 2025 Jul 24;7:1606088. doi: 10.3389/fdgth.2025.1606088. eCollection 2025.
5. Evaluating the Performance of State-of-the-Art Artificial Intelligence Chatbots Based on the WHO Global Guidelines for the Prevention of Surgical Site Infection: Cross-Sectional Study.
   J Med Internet Res. 2025 Jul 31;27:e75567. doi: 10.2196/75567.
6. Comparison of Multiple State-of-the-Art Large Language Models for Patient Education Prior to CT and MRI Examinations.
   J Pers Med. 2025 Jun 5;15(6):235. doi: 10.3390/jpm15060235.
7. The global burden of low back pain attributable high body mass index over the period 1990-2021 and projections up to 2035.
   Front Nutr. 2025 Jun 6;12:1568015. doi: 10.3389/fnut.2025.1568015. eCollection 2025.
8. The Role of Artificial Intelligence Large Language Models in Personalized Rehabilitation Programs for Knee Osteoarthritis: An Observational Study.
   J Med Syst. 2025 Jun 3;49(1):73. doi: 10.1007/s10916-025-02207-x.
9. Assessment of ChatGPT's adherence to evidence-based clinical practice guidelines for plantar fasciitis management.
   J Orthop Surg Res. 2025 Apr 30;20(1):434. doi: 10.1186/s13018-025-05831-y.
10. Assessing the readability, quality and reliability of responses produced by ChatGPT, Gemini, and Perplexity regarding most frequently asked keywords about low back pain.
   PeerJ. 2025 Jan 22;13:e18847. doi: 10.7717/peerj.18847. eCollection 2025.