NYU Grossman School of Medicine, New York, New York.
NYU Stern School of Business, New York, New York.
JAMA Netw Open. 2024 Jul 1;7(7):e2422399. doi: 10.1001/jamanetworkopen.2024.22399.
IMPORTANCE: Virtual patient-physician communications have increased since 2020 and have negatively affected primary care physician (PCP) well-being. Generative artificial intelligence (GenAI) drafts of replies to patient messages could reduce health care professional (HCP) workload and improve communication quality, but only if the drafts are considered useful.
OBJECTIVE: To assess PCPs' perceptions of GenAI drafts and to examine linguistic characteristics associated with equity and perceived empathy.
DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional quality improvement study tested the hypothesis that PCPs' ratings of GenAI drafts (created using the electronic health record [EHR] standard prompts) would be equivalent to their ratings of HCP-generated responses on 3 dimensions. The study was conducted at NYU Langone Health using private patient-HCP communications at 3 internal medicine practices piloting GenAI.
EXPOSURE: Randomly assigned patient messages, each coupled with either an HCP response or a draft GenAI response.
MAIN OUTCOMES AND MEASURES: Using Likert scales, PCPs rated responses' information content quality (eg, relevance) and communication quality (eg, verbosity), and indicated whether they would use the draft or start anew (usable vs unusable). Branching logic further probed the empathy, personalization, and professionalism of responses. Computational linguistics methods assessed content differences between HCP and GenAI responses, focusing on equity and empathy.
RESULTS: A total of 16 PCPs (8 [50.0%] female) reviewed 344 messages (175 GenAI drafted; 169 HCP drafted). Both GenAI and HCP responses were rated favorably. GenAI responses were rated higher for communication style than HCP responses (mean [SD], 3.70 [1.15] vs 3.38 [1.20]; P = .01; U = 12 568.5) but were similar to HCP responses in information content (mean [SD], 3.53 [1.26] vs 3.41 [1.27]; P = .37; U = 13 981.0) and usable draft proportion (mean [SD], 0.69 [0.48] vs 0.65 [0.47]; P = .49; t = -0.6842). Usable GenAI responses were considered more empathetic than usable HCP responses (32 of 86 [37.2%] vs 13 of 79 [16.5%]; difference, 125.5%), possibly attributable to more subjective (mean [SD], 0.54 [0.16] vs 0.31 [0.23]; P < .001; difference, 74.2%) and more positive (mean [SD] polarity, 0.21 [0.14] vs 0.13 [0.25]; P = .02; difference, 61.5%) language. They were also more linguistically complex (mean [SD] score, 125.2 [47.8] vs 95.4 [58.8]; P = .002; difference, 31.2%) and numerically longer (mean [SD] word count, 90.5 [32.0] vs 65.4 [62.6]; difference, 38.4%), although the length difference was not statistically significant (P = .07).
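The "difference" percentages in the results appear to be relative differences (GenAI value minus HCP value, divided by the HCP value) rather than percentage-point gaps; the abstract does not state this definition, so it is an assumption here. The minimal Python sketch below checks that assumption against the reported summary values. The helper names `pct` and `relative_difference` are hypothetical and are not from the study.

```python
def pct(count: int, total: int) -> float:
    """Share of responses, as a percentage."""
    return 100 * count / total


def relative_difference(genai: float, hcp: float) -> float:
    """Assumed definition: (GenAI - HCP) / HCP, expressed in percent."""
    return 100 * (genai - hcp) / hcp


# Share of usable drafts rated empathetic, from the counts in the abstract.
genai_empathy = pct(32, 86)  # 37.209...%
hcp_empathy = pct(13, 79)    # 16.455...%

# Summary values as reported (GenAI, HCP); empathy shares rounded to 1 decimal.
comparisons = {
    "empathetic (%)": (round(genai_empathy, 1), round(hcp_empathy, 1)),
    "subjectivity": (0.54, 0.31),
    "polarity": (0.21, 0.13),
    "word count": (90.5, 65.4),
    "complexity score": (125.2, 95.4),
}

for name, (genai, hcp) in comparisons.items():
    print(f"{name}: {relative_difference(genai, hcp):.1f}%")

# Using the unrounded empathy shares instead gives a slightly different figure.
print(f"empathetic, unrounded: {relative_difference(genai_empathy, hcp_empathy):.1f}%")
```

Under this definition the script prints 125.5%, 74.2%, 61.5%, 38.4%, and 31.2%, which match the abstract. With the unrounded empathy shares (32/86 and 13/79), the result is about 126.1%. This suggests the reported 125.5% was computed from the rounded percentages.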
CONCLUSIONS AND RELEVANCE: In this cross-sectional study of PCP perceptions of an EHR-integrated GenAI chatbot, GenAI drafts were found to communicate better (higher communication style ratings) and with more empathy than HCP responses, highlighting their potential to enhance patient-HCP communication. However, GenAI drafts were more linguistically complex and therefore less readable than HCP responses, a significant concern for patients with low health literacy or limited English proficiency.