Oded Nov, Nina Singh, Devin Mann
Department of Technology Management, Tandon School of Engineering, New York University, New York, NY, United States.
Department of Population Health, Grossman School of Medicine, New York University, New York, NY, United States.
JMIR Med Educ. 2023 Jul 10;9:e46939. doi: 10.2196/46939.
Chatbots are being piloted to draft responses to patient questions, but patients' ability to distinguish between provider and chatbot responses and patients' trust in chatbots' functions are not well established.
This study aimed to assess the feasibility of using ChatGPT (Chat Generative Pre-trained Transformer) or a similar artificial intelligence-based chatbot for patient-provider communication.
A survey study was conducted in January 2023. Ten representative, nonadministrative patient-provider interactions were extracted from the electronic health record. Patients' questions were entered into ChatGPT with a request for the chatbot to respond using approximately the same word count as the human provider's response. In the survey, each patient question was followed by either a provider-generated or a ChatGPT-generated response. Participants were informed that 5 responses were provider generated and 5 were chatbot generated. Participants were asked, and financially incentivized, to correctly identify the source of each response. Participants were also asked to rate their trust in chatbots' functions in patient-provider communication on a 5-point Likert scale (1-5).
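The abstract does not specify the interface used to query the model (in January 2023, this was most likely the ChatGPT web interface). As an illustration only, a minimal Python sketch of the same prompt pattern using the OpenAI client library; the model name and prompt wording here are assumptions, not the study's protocol:

    from openai import OpenAI  # official openai Python package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def draft_reply(patient_question: str, provider_reply: str) -> str:
        # Ask the model to answer in roughly the same word count as the
        # human provider's reply, mirroring the study's length constraint.
        target_words = len(provider_reply.split())
        prompt = (
            f'A patient sent this message: "{patient_question}"\n'
            f"Respond to the patient in approximately {target_words} words."
        )
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # assumed stand-in for ChatGPT (Jan 2023)
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content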
A US-representative sample of 430 participants aged 18 years and older was recruited on Prolific, a crowdsourcing platform for academic studies. In all, 426 participants completed the full survey. After removing participants who spent less than 3 minutes on the survey, 392 respondents remained. Overall, 53.3% (209/392) of the respondents analyzed were women, and the average age was 47.1 (range 18-91) years. Correct classification of responses ranged from 49% (192/392) to 85.7% (336/392) across questions. On average, chatbot responses were identified correctly in 65.5% (1284/1960) of cases, and human provider responses in 65.1% (1276/1960) of cases. Patients' trust in chatbots' functions was, on average, weakly positive (mean Likert score 3.4 out of 5), with trust decreasing as the health-related complexity of the task in the question increased.
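For readers checking the denominators, a short sketch of the arithmetic behind the aggregate figures (each of the 392 analyzed respondents classified 5 chatbot-generated and 5 provider-generated responses; the counts are taken directly from the abstract):

    respondents = 392
    items_per_source = 5  # 5 chatbot and 5 provider responses per survey

    total_per_source = respondents * items_per_source  # 1960 classifications
    chatbot_accuracy = 1284 / total_per_source         # ~0.655
    provider_accuracy = 1276 / total_per_source        # ~0.651

    print(f"{chatbot_accuracy:.1%}, {provider_accuracy:.1%}")  # 65.5%, 65.1%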
ChatGPT responses to patient questions were weakly distinguishable from provider responses. Laypeople appear to trust the use of chatbots to answer lower-risk health questions. It is important to continue studying patient-chatbot interaction as chatbots move from administrative to more clinical roles in health care.