Suppr 超能文献


"Doctor ChatGPT, Can You Help Me?" The Patient's Perspective: Cross-Sectional Study.

Affiliation

Department of Trauma and Orthopedic Surgery, BG Klinik Ludwigshafen, Ludwigshafen am Rhein, Germany.

Publication

J Med Internet Res. 2024 Oct 1;26:e58831. doi: 10.2196/58831.

DOI: 10.2196/58831
PMID: 39352738
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11480680/
Abstract

BACKGROUND

Artificial intelligence and the language models derived from it, such as ChatGPT, offer immense possibilities, particularly in the field of medicine. It is already evident that ChatGPT can provide adequate and, in some cases, expert-level responses to health-related queries and advice for patients. However, it is currently unknown how patients perceive these capabilities, whether they can derive benefit from them, and whether potential risks, such as harmful suggestions, are detected by patients.

OBJECTIVE

This study aims to clarify whether patients can get useful and safe health care advice from an artificial intelligence chatbot assistant.

METHODS

This cross-sectional study was conducted using 100 publicly available health-related questions from 5 medical specialties (trauma, general surgery, otolaryngology, pediatrics, and internal medicine) from a web-based platform for patients. Responses generated by ChatGPT-4.0 and by an expert panel (EP) of experienced physicians from the aforementioned web-based platform were packed into 10 sets consisting of 10 questions each. The blinded evaluation was carried out by patients regarding empathy and usefulness (assessed through the question: "Would this answer have helped you?") on a scale from 1 to 5. As a control, evaluation was also performed by 3 physicians in each respective medical specialty, who were additionally asked about the potential harm of the response and its correctness.

RESULTS

In total, 200 sets of questions were submitted by 64 patients (mean 45.7, SD 15.9 years; 29/64, 45.3% male), resulting in 2000 evaluated answers of ChatGPT and the EP each. ChatGPT scored higher in terms of empathy (4.18 vs 2.7; P<.001) and usefulness (4.04 vs 2.98; P<.001). Subanalysis revealed a small bias in terms of levels of empathy given by women in comparison with men (4.46 vs 4.14; P=.049). Ratings of ChatGPT were high regardless of the participant's age. The same highly significant results were observed in the evaluation of the respective specialist physicians. ChatGPT outperformed significantly in correctness (4.51 vs 3.55; P<.001). Specialists rated the usefulness (3.93 vs 4.59) and correctness (4.62 vs 3.84) significantly lower in potentially harmful responses from ChatGPT (P<.001). This was not the case among patients.

CONCLUSIONS

The results indicate that ChatGPT is capable of supporting patients in health-related queries better than physicians, at least in terms of written advice through a web-based platform. In this study, ChatGPT's responses had a lower percentage of potentially harmful advice than the web-based EP. However, it is crucial to note that this finding is based on a specific study design and may not generalize to all health care settings. Alarmingly, patients are not able to independently recognize these potential dangers.


Figures (PMC11480680):
Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08a5/11480680/5b8b44cc3cc5/jmir_v26i1e58831_fig1.jpg
Fig 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08a5/11480680/f6582fc1d5cf/jmir_v26i1e58831_fig2.jpg
Fig 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08a5/11480680/157dd5f31abc/jmir_v26i1e58831_fig3.jpg
Fig 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08a5/11480680/cbe4a945a8d6/jmir_v26i1e58831_fig4.jpg
Fig 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08a5/11480680/612167f5398b/jmir_v26i1e58831_fig5.jpg
Fig 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08a5/11480680/c48c735a153e/jmir_v26i1e58831_fig6.jpg
Fig 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08a5/11480680/8b2fc0cfb4f4/jmir_v26i1e58831_fig7.jpg
Fig 8: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08a5/11480680/1424ff1ee24d/jmir_v26i1e58831_fig8.jpg

Similar Articles

1. "Doctor ChatGPT, Can You Help Me?" The Patient's Perspective: Cross-Sectional Study.
J Med Internet Res. 2024 Oct 1;26:e58831. doi: 10.2196/58831.
2. Physician Versus Large Language Model Chatbot Responses to Web-Based Questions From Autistic Patients in Chinese: Cross-Sectional Comparative Analysis.
J Med Internet Res. 2024 Apr 30;26:e54706. doi: 10.2196/54706.
3. Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions.
JAMA Netw Open. 2023 Aug 1;6(8):e2330320. doi: 10.1001/jamanetworkopen.2023.30320.
4. Evaluating ChatGPT to test its robustness as an interactive information database of radiation oncology and to assess its responses to common queries from radiotherapy patients: A single institution investigation.
Cancer Radiother. 2024 Jun;28(3):258-264. doi: 10.1016/j.canrad.2023.11.005. Epub 2024 Jun 12.
5. ChatGPT vs. neurologists: a cross-sectional study investigating preference, satisfaction ratings and perceived empathy in responses among people living with multiple sclerosis.
J Neurol. 2024 Jul;271(7):4057-4066. doi: 10.1007/s00415-024-12328-x. Epub 2024 Apr 3.
6. Evaluating the Accuracy of ChatGPT and Google BARD in Fielding Oculoplastic Patient Queries: A Comparative Study on Artificial versus Human Intelligence.
Ophthalmic Plast Reconstr Surg. 2024;40(3):303-311. doi: 10.1097/IOP.0000000000002567. Epub 2024 Jan 12.
7. Using ChatGPT for Clinical Practice and Medical Education: Cross-Sectional Survey of Medical Students' and Physicians' Perceptions.
JMIR Med Educ. 2023 Dec 22;9:e50658. doi: 10.2196/50658.
8. Comparing ChatGPT and a Single Anesthesiologist's Responses to Common Patient Questions: An Exploratory Cross-Sectional Survey of a Panel of Anesthesiologists.
J Med Syst. 2024 Aug 22;48(1):77. doi: 10.1007/s10916-024-02100-z.
9. Comparing the quality of ChatGPT- and physician-generated responses to patients' dermatology questions in the electronic medical record.
Clin Exp Dermatol. 2024 Jun 25;49(7):715-718. doi: 10.1093/ced/llad456.
10. A Comparative Study of Responses to Retina Questions from Either Experts, Expert-Edited Large Language Models, or Expert-Edited Large Language Models Alone.
Ophthalmol Sci. 2024 Feb 6;4(4):100485. doi: 10.1016/j.xops.2024.100485. eCollection 2024 Jul-Aug.

Cited By

1. Evaluating the perspectives of ChatGPT and Gemini on glenohumeral osteoarthritis management.
JSES Int. 2025 Apr 10;9(4):1365-1370. doi: 10.1016/j.jseint.2025.03.011. eCollection 2025 Jul.
2. [Proposal for Responsible Use of Generative Artificial Intelligence in Medical Practice].
Rev Neurol. 2025 Aug 27;80(7):37503. doi: 10.31083/RN37503.
3. Accuracy, Clarity, and Comprehensiveness of ChatGPT Outputs for Commonly Asked Questions About Living Kidney Donation.
Clin Transplant. 2025 Sep;39(9):e70303. doi: 10.1111/ctr.70303.
4. Comparison of Multiple State-of-the-Art Large Language Models for Patient Education Prior to CT and MRI Examinations.
J Pers Med. 2025 Jun 5;15(6):235. doi: 10.3390/jpm15060235.
5. Large language models in oncology: a review.
BMJ Oncol. 2025 May 15;4(1):e000759. doi: 10.1136/bmjonc-2025-000759. eCollection 2025.
6. Comparing orthodontic pre-treatment information provided by large language models.
BMC Oral Health. 2025 May 28;25(1):838. doi: 10.1186/s12903-025-06246-1.
7. Chinese generative AI models (DeepSeek and Qwen) rival ChatGPT-4 in ophthalmology queries with excellent performance in Arabic and English.
Narra J. 2025 Apr;5(1):e2371. doi: 10.52225/narra.v5i1.2371. Epub 2025 Apr 8.
8. Advancing AI Horizons: Scientific Conversations on Tympanoplasty Postoperative Management.
J Int Adv Otol. 2025 Mar 25;21(2):1-2. doi: 10.5152/iao.2025.2419152.
9. Examining Healthcare Practitioners' Perceptions of Virtual Physicians, mHealth Applications, and Barriers to Adoption: Insights for Improving Patient Care and Digital Health Integration.
Int J Gen Med. 2025 Apr 1;18:1865-1885. doi: 10.2147/IJGM.S515448. eCollection 2025.

References

1. ChatGPT With GPT-4 Outperforms Emergency Department Physicians in Diagnostic Accuracy: Retrospective Analysis.
J Med Internet Res. 2024 Jul 8;26:e56110. doi: 10.2196/56110.
2. Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study.
J Med Internet Res. 2024 Jun 14;26:e53297. doi: 10.2196/53297.
3. Quality and Dependability of ChatGPT and DingXiangYuan Forums for Remote Orthopedic Consultations: Comparative Analysis.
J Med Internet Res. 2024 Mar 14;26:e50882. doi: 10.2196/50882.
4. Security Implications of AI Chatbots in Health Care.
J Med Internet Res. 2023 Nov 28;25:e47551. doi: 10.2196/47551.
5. Breaking barriers: can ChatGPT compete with a shoulder and elbow specialist in diagnosis and management?
JSES Int. 2023 Sep 4;7(6):2534-2541. doi: 10.1016/j.jseint.2023.07.018. eCollection 2023 Nov.
6. The Impact of Multimodal Large Language Models on Health Care's Future.
J Med Internet Res. 2023 Nov 2;25:e52865. doi: 10.2196/52865.
7. Accuracy and Reliability of Chatbot Responses to Physician Questions.
JAMA Netw Open. 2023 Oct 2;6(10):e2336483. doi: 10.1001/jamanetworkopen.2023.36483.
8. Changes in patient perceptions regarding ChatGPT-written explanations on lifestyle modifications for preventing urolithiasis recurrence.
Digit Health. 2023 Sep 28;9:20552076231203940. doi: 10.1177/20552076231203940. eCollection 2023 Jan-Dec.
9. Efficacy of AI Chats to Determine an Emergency: A Comparison Between OpenAI's ChatGPT, Google Bard, and Microsoft Bing AI Chat.
Cureus. 2023 Sep 18;15(9):e45473. doi: 10.7759/cureus.45473. eCollection 2023 Sep.
10. The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study.
J Med Internet Res. 2023 Sep 15;25:e47621. doi: 10.2196/47621.