
Putting ChatGPT's Medical Advice to the (Turing) Test: Survey Study.

Authors

Nov Oded, Singh Nina, Mann Devin

Affiliations

Department of Technology Management, Tandon School of Engineering, New York University, New York, NY, United States.

Department of Population Health, Grossman School of Medicine, New York University, New York, NY, United States.

Publication

JMIR Med Educ. 2023 Jul 10;9:e46939. doi: 10.2196/46939.

DOI: 10.2196/46939
PMID: 37428540
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10366957/
Abstract

BACKGROUND

Chatbots are being piloted to draft responses to patient questions, but patients' ability to distinguish between provider and chatbot responses and patients' trust in chatbots' functions are not well established.

OBJECTIVE

This study aimed to assess the feasibility of using ChatGPT (Chat Generative Pre-trained Transformer) or a similar artificial intelligence-based chatbot for patient-provider communication.

METHODS

A survey study was conducted in January 2023. Ten representative, nonadministrative patient-provider interactions were extracted from the electronic health record. Patients' questions were entered into ChatGPT with a request for the chatbot to respond using approximately the same word count as the human provider's response. In the survey, each patient question was followed by a provider- or ChatGPT-generated response. Participants were informed that 5 responses were provider generated and 5 were chatbot generated. Participants were asked, and financially incentivized, to correctly identify the response source. Participants were also asked to rate their trust in chatbots' functions in patient-provider communication on a Likert scale from 1 to 5.

RESULTS

A US-representative sample of 430 participants aged 18 years and older was recruited on Prolific, a crowdsourcing platform for academic studies. In all, 426 participants completed the full survey. After removing those who spent less than 3 minutes on the survey, 392 respondents remained. Of the respondents analyzed, 53.3% (209/392) were women, and the average age was 47.1 (range 18-91) years. Correct classification of responses ranged from 49% (192/392) to 85.7% (336/392) across questions. On average, chatbot responses were identified correctly in 65.5% (1284/1960) of cases and human provider responses in 65.1% (1276/1960) of cases. Patients' trust in chatbots' functions was weakly positive on average (mean Likert score 3.4 out of 5), with trust decreasing as the health-related complexity of the task increased.
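The accuracy figures above can be sanity-checked against the raw counts (1960 = 392 respondents × 5 items per response type). A minimal Python sketch; the normal-approximation z-test against the 50% chance baseline is our illustration, not the paper's reported analysis:

```python
import math

def prop_z_vs_chance(correct: int, total: int, p0: float = 0.5):
    """Return the proportion correct and a one-sample z-score against chance p0."""
    p_hat = correct / total
    se = math.sqrt(p0 * (1 - p0) / total)  # standard error under H0: p = p0
    return p_hat, (p_hat - p0) / se

# Counts reported in the abstract
chatbot_p, chatbot_z = prop_z_vs_chance(1284, 1960)    # chatbot responses identified correctly
provider_p, provider_z = prop_z_vs_chance(1276, 1960)  # provider responses identified correctly

print(f"chatbot:  {chatbot_p:.1%} correct, z = {chatbot_z:.1f}")
print(f"provider: {provider_p:.1%} correct, z = {provider_z:.1f}")
```

Both proportions reproduce the 65.5% and 65.1% figures and sit well above chance, consistent with the abstract's description of the responses as only "weakly distinguishable" in absolute terms yet reliably above the 50% baseline.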

CONCLUSIONS

ChatGPT responses to patient questions were weakly distinguishable from provider responses. Laypeople appear to trust the use of chatbots to answer lower-risk health questions. It is important to continue studying patient-chatbot interaction as chatbots move from administrative to more clinical roles in health care.

