Lee Natalie S, Richards Nathan, Grandominico Jodi, Cronin Robert M, Hendricks Amanda K, Tripathi Ravi S, Jonas Daniel E
Department of Internal Medicine, College of Medicine, The Ohio State University, 2050 Kenny Rd, suite 2400, Columbus, OH, 43221, United States, 1 614-814-1361.
Health System Informatics, The Ohio State University Wexner Medical Center, Columbus, OH, United States.
JMIR Form Res. 2025 Jul 31;9:e71966. doi: 10.2196/71966.
There is growing interest in applying generative artificial intelligence (GenAI) to respond to electronic patient portal messages, particularly in primary care where message volumes are highest. However, evaluations of GenAI as an inbox communication tool are limited. Qualitative analysis of when and how often GenAI responses achieve communication goals can inform estimates of impact and guide continuous improvement.
This study aims to evaluate GenAI responses to primary care messages using a medical communication framework.
This was a descriptive quality improvement study of 201 GenAI replies to a purposively sampled, diverse pool of real primary care patient messages in a large midwestern academic medical center. Two physician reviewers (NSL and NR) used a hybrid deductive-inductive approach to qualitatively identify and define themes, guided by constructs from the "best practice" medical communication framework. After achieving thematic saturation, the reviewers assessed the presence or absence of identified communication themes, both independently and collaboratively. Discrepant observations were reconciled via discussion. Frequencies of identified themes were tallied.
Themes in strengths and limitations emerged across 5 communication domains. In the domain of rapport building, expressing respect and restating key phrases were strengths, while inappropriate or inadequate rapport building statements were limitations. For information gathering, questions that built toward a plan or elicited patient needs were strengths, while questions that were out of place or redundant were limitations. For information delivery, accurate content delivered clearly and professionally was a strength, but delivery of inaccurate content was an observed limitation. GenAI responses could facilitate next steps by outlining choices or providing instruction, but sometimes those next steps were inappropriate or premature. Finally, in responding to emotion, strengths were that emotions were named and validated, while inadequate or absent acknowledgment of emotion was a limitation. Overall, 26.4% (53/201) of all messages displayed communication strengths without limitations, 27.4% (55/201) had limitations without strengths, and the remaining 46.3% (93/201) had both. Strengths outnumbered limitations in rapport building (87/201, 43.3% vs 35/201, 17.4%) and facilitating next steps (73/201, 36.3% vs 39/201, 19.4%). Limitations outnumbered strengths in the remaining domains of information delivery (89/201, 44.3% vs 43/201, 21.4%), information gathering (60/201, 29.9% vs 43/201, 21.4%), and responding to emotion (17/201, 8.5% vs 9/201, 4.5%).
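As an illustrative check of the arithmetic above, the following minimal sketch recomputes the reported percentages from the reported counts for the overall breakdown and for four of the domains. The helper name `pct` and the dictionary layout are assumptions made for this example; they are not part of the study's analysis.

```python
# Recompute reported percentages from reported counts (illustrative only).
TOTAL = 201  # number of GenAI replies reviewed


def pct(count, total=TOTAL):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Overall breakdown of replies, as reported.
overall = {
    "strengths only": 53,
    "limitations only": 55,
    "both": 93,
}

# (strengths, limitations) counts for four of the domains, as reported.
domains = {
    "rapport building": (87, 35),
    "facilitating next steps": (73, 39),
    "information delivery": (43, 89),
    "information gathering": (43, 60),
}

# The three overall categories should account for every reply.
assert sum(overall.values()) == TOTAL

for label, n in overall.items():
    print(f"{label}: {n}/{TOTAL} ({pct(n)}%)")

for name, (strengths, limitations) in domains.items():
    print(
        f"{name}: strengths {strengths}/{TOTAL} ({pct(strengths)}%), "
        f"limitations {limitations}/{TOTAL} ({pct(limitations)}%)"
    )
```

The three overall percentages sum to 100.1% rather than 100% because each is rounded independently.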
GenAI response quality on behalf of primary care physicians and advanced practice providers may vary by communication function. Expressions of respect or descriptions of common next steps may be appropriate, but gathering and delivering appropriate information, or responding to emotion, may be limited. While communication standards were often met, they were also often compromised. Understanding these strengths and limitations can inform decisions about whether, when, and how to apply GenAI as a tool for primary care inbox communication.