Doctor Versus Artificial Intelligence: Patient and Physician Evaluation of Large Language Model Responses to Rheumatology Patient Questions in a Cross-Sectional Study.

Affiliations

University of Alberta, Edmonton, Alberta, Canada.

University Hospital Düsseldorf, Düsseldorf, Germany.

Publication details

Arthritis Rheumatol. 2024 Mar;76(3):479-484. doi: 10.1002/art.42737. Epub 2024 Jan 18.

DOI: 10.1002/art.42737
PMID: 37902018
Abstract

OBJECTIVE

The objective of the current study was to assess the quality of large language model (LLM) chatbot versus physician-generated responses to patient-generated rheumatology questions.

METHODS

We conducted a single-center cross-sectional survey of rheumatology patients (n = 17) in Edmonton, Alberta, Canada. Patients evaluated LLM chatbot versus physician-generated responses for comprehensiveness and readability, with four rheumatologists also evaluating accuracy by using a Likert scale from 1 to 10 (1 being poor, 10 being excellent).
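
To make the rating scheme concrete, here is a minimal Python sketch of how 1-to-10 Likert scores for a single criterion might be summarized into the "mean ± SD" figures used in the Results. The ratings below are invented for illustration and are not the study's data; the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the abstract does not say how the SDs were computed.

```python
from statistics import mean, stdev

# Hypothetical 1-10 Likert ratings (1 = poor, 10 = excellent) for one criterion.
# These numbers are invented for illustration; they are NOT the study's data.
ratings = {
    "AI": [7, 8, 6, 7, 8],
    "physician": [8, 7, 9, 7, 8],
}

def summarize(scores):
    """Return (mean, sample SD), each rounded to two decimals."""
    # Reject anything outside the 1-10 scale described in the Methods.
    for s in scores:
        if not 1 <= s <= 10:
            raise ValueError(f"Likert score out of range: {s}")
    return round(mean(scores), 2), round(stdev(scores), 2)

for source, scores in ratings.items():
    m, sd = summarize(scores)
    print(f"{source}: {m} ± {sd}")
```

Note that this only describes the ratings; it does not perform the significance tests behind the reported P values, which the abstract does not name.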

RESULTS

Patients rated no significant difference between artificial intelligence (AI) and physician-generated responses in comprehensiveness (mean 7.12 ± SD 0.99 vs 7.52 ± 1.16; P = 0.1962) or readability (7.90 ± 0.90 vs 7.80 ± 0.75; P = 0.5905). Rheumatologists rated AI responses significantly poorer than physician responses on comprehensiveness (AI 5.52 ± 2.13 vs physician 8.76 ± 1.07; P < 0.0001), readability (AI 7.85 ± 0.92 vs physician 8.75 ± 0.57; P = 0.0003), and accuracy (AI 6.48 ± 2.07 vs physician 9.08 ± 0.64; P < 0.0001). The proportion of preference to AI- versus physician-generated responses by patients and physicians was 0.45 ± 0.18 and 0.15 ± 0.08, respectively (P = 0.0106). After learning that one answer for each question was AI generated, patients were able to correctly identify AI-generated answers at a lower proportion compared to physicians (0.49 ± 0.26 vs 0.97 ± 0.04; P = 0.0183). The average word count of AI answers was 69.10 ± 25.35 words, as compared to 98.83 ± 34.58 words for physician-generated responses (P = 0.0008).
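
The size of the gaps reported above can be checked with simple arithmetic on the published means. The sketch below uses only the mean values from the Results (SDs omitted); `relative_shortfall` is a hypothetical helper introduced for illustration, and the code makes no attempt to reproduce the study's statistical tests.

```python
# Mean values copied from the Results paragraph (SDs omitted).
word_count = {"AI": 69.10, "physician": 98.83}
rheumatologist_ratings = {
    # criterion: (AI mean, physician mean), as rated by rheumatologists
    "comprehensiveness": (5.52, 8.76),
    "readability": (7.85, 8.75),
    "accuracy": (6.48, 9.08),
}

def relative_shortfall(ai, physician):
    """Fraction by which the AI mean falls below the physician mean."""
    return (physician - ai) / physician

diff_words = word_count["physician"] - word_count["AI"]
share = relative_shortfall(word_count["AI"], word_count["physician"])
print(f"AI answers were {diff_words:.2f} words shorter ({share:.1%} fewer words)")

for criterion, (ai, doc) in rheumatologist_ratings.items():
    print(f"{criterion}: physician - AI = {doc - ai:.2f} points")
```

By this arithmetic, AI answers averaged about 29.7 words (roughly 30%) fewer than physician answers, and the rheumatologists' largest rating gaps were in comprehensiveness (3.24 points) and accuracy (2.60 points).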

CONCLUSION

Rheumatology patients rated AI-generated responses to patient questions similarly to physician-generated responses in terms of comprehensiveness, readability, and overall preference. However, rheumatologists rated AI responses significantly poorer than physician-generated responses, suggesting that LLM chatbot responses are inferior to physician responses, a difference that patients may not be aware of.

Similar articles

1
Doctor Versus Artificial Intelligence: Patient and Physician Evaluation of Large Language Model Responses to Rheumatology Patient Questions in a Cross-Sectional Study.
Arthritis Rheumatol. 2024 Mar;76(3):479-484. doi: 10.1002/art.42737. Epub 2024 Jan 18.
2
Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum.
JAMA Intern Med. 2023 Jun 1;183(6):589-596. doi: 10.1001/jamainternmed.2023.1838.
3
Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions.
JAMA Netw Open. 2023 Aug 1;6(8):e2330320. doi: 10.1001/jamanetworkopen.2023.30320.
4
Physician and Artificial Intelligence Chatbot Responses to Cancer Questions From Social Media.
JAMA Oncol. 2024 Jul 1;10(7):956-960. doi: 10.1001/jamaoncol.2024.0836.
5
Quality of Large Language Model Responses to Radiation Oncology Patient Care Questions.
JAMA Netw Open. 2024 Apr 1;7(4):e244630. doi: 10.1001/jamanetworkopen.2024.4630.
6
Accuracy and Reliability of Chatbot Responses to Physician Questions.
JAMA Netw Open. 2023 Oct 2;6(10):e2336483. doi: 10.1001/jamanetworkopen.2023.36483.
7
Performance of Artificial Intelligence Chatbots on Glaucoma Questions Adapted From Patient Brochures.
Cureus. 2024 Mar 23;16(3):e56766. doi: 10.7759/cureus.56766. eCollection 2024 Mar.
8
"Doctor ChatGPT, Can You Help Me?" The Patient's Perspective: Cross-Sectional Study.
J Med Internet Res. 2024 Oct 1;26:e58831. doi: 10.2196/58831.
9
Comprehensiveness, Accuracy, and Readability of Exercise Recommendations Provided by an AI-Based Chatbot: Mixed Methods Study.
JMIR Med Educ. 2024 Jan 11;10:e51308. doi: 10.2196/51308.
10
From jargon to clarity: Improving the readability of foot and ankle radiology reports with an artificial intelligence large language model.
Foot Ankle Surg. 2024 Jun;30(4):331-337. doi: 10.1016/j.fas.2024.01.008. Epub 2024 Feb 5.

Cited by

1
Adoption and perception of LLM-based chatbots in health care: an exploratory cross-sectional survey of individuals with rheumatic diseases.
Rheumatol Adv Pract. 2025 Jul 12;9(3):rkaf083. doi: 10.1093/rap/rkaf083. eCollection 2025.
2
Patients prefer artificial intelligence large language model-generated responses to those prepared by the American College of Mohs Surgery: A double-blind comparative study using ChatGPT and Google Gemini.
JAAD Int. 2025 May 29;21:52-54. doi: 10.1016/j.jdin.2025.04.005. eCollection 2025 Aug.
3
Comparative Analysis of Large Language Models for Answering Cancer-Related Questions in Korean.
Yonsei Med J. 2025 Jul;66(7):405-411. doi: 10.3349/ymj.2024.0200.
4
Evaluating the readability, quality, and reliability of responses generated by ChatGPT, Gemini, and Perplexity on the most commonly asked questions about Ankylosing spondylitis.
PLoS One. 2025 Jun 18;20(6):e0326351. doi: 10.1371/journal.pone.0326351. eCollection 2025.
5
RAGing ahead in rheumatology: new language model architectures to tame artificial intelligence.
Ther Adv Musculoskelet Dis. 2025 Apr 21;17:1759720X251331529. doi: 10.1177/1759720X251331529. eCollection 2025.
6
Profiling of Cardiogenic Shock: Incorporating Machine Learning Into Bedside Management.
J Soc Cardiovasc Angiogr Interv. 2024 May 28;4(3Part B):102047. doi: 10.1016/j.jscai.2024.102047. eCollection 2025 Mar.
7
A Future of Self-Directed Patient Internet Research: Large Language Model-Based Tools Versus Standard Search Engines.
Ann Biomed Eng. 2025 May;53(5):1199-1208. doi: 10.1007/s10439-025-03701-6. Epub 2025 Mar 3.
8
Large Language Models for Chatbot Health Advice Studies: A Systematic Review.
JAMA Netw Open. 2025 Feb 3;8(2):e2457879. doi: 10.1001/jamanetworkopen.2024.57879.
9
Large Language Models in Diabetes Management: The Need for Human and Artificial Intelligence Collaboration.
Diabetes Care. 2025 Feb 1;48(2):182-184. doi: 10.2337/dci24-0079.
10
Current applications and challenges in large language models for patient care: a systematic review.
Commun Med (Lond). 2025 Jan 21;5(1):26. doi: 10.1038/s43856-024-00717-2.