Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum.

Affiliations

Qualcomm Institute, University of California San Diego, La Jolla.

Division of Infectious Diseases and Global Public Health, Department of Medicine, University of California San Diego, La Jolla.

Publication Information

JAMA Intern Med. 2023 Jun 1;183(6):589-596. doi: 10.1001/jamainternmed.2023.1838.

Abstract

IMPORTANCE

The rapid expansion of virtual health care has caused a surge in patient messages concomitant with more work and burnout among health care professionals. Artificial intelligence (AI) assistants could potentially aid in creating answers to patient questions by drafting responses that could be reviewed by clinicians.

OBJECTIVE

To evaluate the ability of an AI chatbot assistant (ChatGPT), released in November 2022, to provide quality and empathetic responses to patient questions.

DESIGN, SETTING, AND PARTICIPANTS

In this cross-sectional study, a public and nonidentifiable database of questions from a public social media forum (Reddit's r/AskDocs) was used to randomly draw 195 exchanges from October 2022 in which a verified physician responded to a public question. Chatbot responses were generated by entering the original question into a fresh session (without prior questions having been asked in the session) on December 22 and 23, 2022. The original question along with anonymized and randomly ordered physician and chatbot responses were evaluated in triplicate by a team of licensed health care professionals. Evaluators chose "which response was better" and judged both "the quality of information provided" (very poor, poor, acceptable, good, or very good) and "the empathy or bedside manner provided" (not empathetic, slightly empathetic, moderately empathetic, empathetic, or very empathetic). Mean outcomes were ordered on a 1 to 5 scale and compared between chatbot and physicians.
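
The blinded, randomized presentation described above lends itself to a short sketch. The snippet below is a minimal illustration and not the authors' code: the scale dictionaries and the blind_pair helper are hypothetical names introduced here. It shows one plausible way the two Likert scales could map to 1-to-5 scores and how each physician-chatbot pair could be anonymized and randomly ordered before an evaluator sees it.

```python
import random

# Hypothetical 1-5 encodings of the two Likert scales described above.
QUALITY_SCALE = {"very poor": 1, "poor": 2, "acceptable": 3,
                 "good": 4, "very good": 5}
EMPATHY_SCALE = {"not empathetic": 1, "slightly empathetic": 2,
                 "moderately empathetic": 3, "empathetic": 4,
                 "very empathetic": 5}

def blind_pair(physician_reply: str, chatbot_reply: str, rng=random):
    """Shuffle the two anonymized responses so the evaluator cannot tell
    which source wrote which; the key is kept aside for unblinding
    after the scores are recorded."""
    pair = [("physician", physician_reply), ("chatbot", chatbot_reply)]
    rng.shuffle(pair)
    blinded_texts = [text for _, text in pair]
    unblinding_key = [source for source, _ in pair]
    return blinded_texts, unblinding_key
```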

RESULTS

Of the 195 questions and responses, evaluators preferred chatbot responses to physician responses in 78.6% (95% CI, 75.0%-81.8%) of the 585 evaluations. Mean (IQR) physician responses were significantly shorter than chatbot responses (52 [17-62] words vs 211 [168-245] words; t = 25.4; P < .001). Chatbot responses were rated of significantly higher quality than physician responses (t = 13.3; P < .001). The proportion of responses rated as good or very good quality (≥4), for instance, was higher for chatbot than physicians (chatbot: 78.5%, 95% CI, 72.3%-84.1%; physicians: 22.1%, 95% CI, 16.4%-28.2%). This amounted to 3.6 times higher prevalence of good or very good quality responses for the chatbot. Chatbot responses were also rated significantly more empathetic than physician responses (t = 18.9; P < .001). The proportion of responses rated empathetic or very empathetic (≥4) was higher for chatbot than for physicians (chatbot: 45.1%, 95% CI, 38.5%-51.8%; physicians: 4.6%, 95% CI, 2.1%-7.7%). This amounted to 9.8 times higher prevalence of empathetic or very empathetic responses for the chatbot.
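
For intuition about how the headline figures combine, here is a rough sketch of the arithmetic: a mean-difference t test, the share of ratings of 4 or higher with a Wald normal-approximation 95% CI (a simplification; the paper's intervals need not have been computed this way), and the prevalence ratio between sources (e.g., 78.5% / 22.1% ≈ 3.6). The rating arrays are invented toy data, not the study's ratings.

```python
import numpy as np
from scipy import stats

def prop_ge4(scores):
    """Share of 1-5 ratings that are 4 or higher ('good'/'empathetic'),
    with a Wald normal-approximation 95% CI."""
    scores = np.asarray(scores)
    p = np.mean(scores >= 4)
    se = np.sqrt(p * (1 - p) / len(scores))
    return p, (p - 1.96 * se, p + 1.96 * se)

# Toy example with invented ratings (three evaluations per exchange).
chatbot_quality = np.array([5, 4, 4, 5, 3, 4, 5, 4, 2, 5])
physician_quality = np.array([3, 2, 4, 3, 3, 2, 4, 3, 2, 3])

# Mean-difference test, analogous to the paper's reported t statistics.
t, p_value = stats.ttest_ind(chatbot_quality, physician_quality)

p_bot, ci_bot = prop_ge4(chatbot_quality)
p_doc, ci_doc = prop_ge4(physician_quality)
print(f"t = {t:.1f}, P = {p_value:.3f}")
print(f"prevalence ratio of 'good or very good': {p_bot / p_doc:.1f}x")
```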

CONCLUSIONS

In this cross-sectional study, a chatbot generated quality and empathetic responses to patient questions posed in an online forum. Further exploration of this technology is warranted in clinical settings, such as using a chatbot to draft responses that physicians could then edit. Randomized trials could assess further whether using AI assistants might improve responses, lower clinician burnout, and improve patient outcomes.

Similar Articles

- Physician and Artificial Intelligence Chatbot Responses to Cancer Questions From Social Media. JAMA Oncol. 2024 Jul 1;10(7):956-960. doi: 10.1001/jamaoncol.2024.0836.
- Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions. JAMA Netw Open. 2023 Aug 1;6(8):e2330320. doi: 10.1001/jamanetworkopen.2023.30320.
- Accuracy and Reliability of Chatbot Responses to Physician Questions. JAMA Netw Open. 2023 Oct 2;6(10):e2336483. doi: 10.1001/jamanetworkopen.2023.36483.
- Large Language Model-Based Responses to Patients' In-Basket Messages. JAMA Netw Open. 2024 Jul 1;7(7):e2422399. doi: 10.1001/jamanetworkopen.2024.22399.
- "Doctor ChatGPT, Can You Help Me?" The Patient's Perspective: Cross-Sectional Study. J Med Internet Res. 2024 Oct 1;26:e58831. doi: 10.2196/58831.

Cited By

- Development and evaluation of a lightweight large language model chatbot for medication enquiry. PLOS Digit Health. 2025 Sep 4;4(9):e0000961. doi: 10.1371/journal.pdig.0000961. eCollection 2025 Sep.
- ChatGPT's role in the rapidly evolving hematologic cancer landscape. Future Sci OA. 2025 Dec;11(1):2546259. doi: 10.1080/20565623.2025.2546259. Epub 2025 Sep 3.
- Promoting trust and intention to adopt health information generated by ChatGPT among healthcare customers: An empirical study. Digit Health. 2025 Aug 28;11:20552076251374121. doi: 10.1177/20552076251374121. eCollection 2025 Jan-Dec.
- The impact of prompting on ChatGPT's adherence to status epilepticus treatment guidelines. Sci Rep. 2025 Aug 28;15(1):31712. doi: 10.1038/s41598-025-16902-9.
- Beyond the Growth: A Registry-Based Analysis of Global Imbalances in Artificial Intelligence Clinical Trials. Healthcare (Basel). 2025 Aug 16;13(16):2018. doi: 10.3390/healthcare13162018.
