
Virtual Patients Using Large Language Models: Scalable, Contextualized Simulation of Clinician-Patient Dialogue With Feedback.

Authors

Cook David A, Overgaard Joshua, Pankratz V Shane, Del Fiol Guilherme, Aakre Chris A

Affiliations

Division of General Internal Medicine, Mayo Clinic College of Medicine and Science, Rochester, MN, United States.

Multidisciplinary Simulation Center, Mayo Clinic College of Medicine and Science, Rochester, MN, United States.

Publication

J Med Internet Res. 2025 Apr 4;27:e68486. doi: 10.2196/68486.


DOI:10.2196/68486
PMID:39854611
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12008702/
Abstract

BACKGROUND: Virtual patients (VPs) are computer screen-based simulations of patient-clinician encounters. VP use is limited by cost and low scalability.

OBJECTIVE: We aimed to show that VPs powered by large language models (LLMs) can generate authentic dialogues, accurately represent patient preferences, and provide personalized feedback on clinical performance. We also explored using LLMs to rate the quality of dialogues and feedback.

METHODS: We conducted an intrinsic evaluation study rating 60 VP-clinician conversations. We used carefully engineered prompts to direct OpenAI's generative pretrained transformer (GPT) to emulate a patient and provide feedback. Using 2 outpatient medicine topics (chronic cough diagnosis and diabetes management), each with permutations representing different patient preferences, we created 60 conversations (dialogues plus feedback): 48 with a human clinician and 12 "self-chat" dialogues with GPT role-playing both the VP and clinician. Primary outcomes were dialogue authenticity and feedback quality, rated using novel instruments for which we conducted a validation study collecting evidence of content, internal structure (reproducibility), relations with other variables, and response process. Each conversation was rated by 3 physicians and by GPT. Secondary outcomes included user experience, bias, patient preferences represented in the dialogues, and conversation features that influenced authenticity.

RESULTS: The average cost per conversation was US $0.51 for GPT-4.0-Turbo and US $0.02 for GPT-3.5-Turbo. Mean (SD) conversation ratings, maximum 6, were overall dialogue authenticity 4.7 (0.7), overall user experience 4.9 (0.7), and average feedback quality 4.7 (0.6). For dialogues created using GPT-4.0-Turbo, physician ratings of patient preferences aligned with intended preferences in 20 to 47 of 48 dialogues (42%-98%). Subgroup comparisons revealed higher ratings for dialogues using GPT-4.0-Turbo versus GPT-3.5-Turbo and for human-generated versus self-chat dialogues. Feedback ratings were similar for human-generated versus GPT-generated ratings, whereas authenticity ratings were lower. We did not perceive bias in any conversation. Dialogue features that detracted from authenticity included that GPT was verbose or used atypical vocabulary (93/180, 51.7% of conversations), was overly agreeable (n=56, 31%), repeated the question as part of the response (n=47, 26%), was easily convinced by clinician suggestions (n=35, 19%), or was not disaffected by poor clinician performance (n=32, 18%). For feedback, detractors included excessively positive feedback (n=42, 23%), failure to mention important weaknesses or strengths (n=41, 23%), or factual inaccuracies (n=39, 22%). Regarding validation of dialogue and feedback scores, items were meticulously developed (content evidence), and we confirmed expected relations with other variables (higher ratings for advanced LLMs and human-generated dialogues). Reproducibility was suboptimal, due largely to variation in LLM performance rather than rater idiosyncrasies.

CONCLUSIONS: LLM-powered VPs can simulate patient-clinician dialogues, demonstrably represent patient preferences, and provide personalized performance feedback. This approach is scalable, globally accessible, and inexpensive. LLM-generated ratings of feedback quality are similar to human ratings.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a693/12008702/e826fb9793b5/jmir_v27i1e68486_fig1.jpg

Similar Articles

[1]
Virtual Patients Using Large Language Models: Scalable, Contextualized Simulation of Clinician-Patient Dialogue With Feedback.

J Med Internet Res. 2025-4-4

[2]
Creating virtual patients using large language models: scalable, global, and low cost.

Med Teach. 2025-1

[3]
Development of a GPT-4-Powered Virtual Simulated Patient and Communication Training Platform for Medical Students to Practice Discussing Abnormal Mammogram Results With Patients: Multiphase Study.

JMIR Form Res. 2025-4-17

[4]
A Language Model-Powered Simulated Patient With Automated Feedback for History Taking: Prospective Study.

JMIR Med Educ. 2024-8-16

[5]
Evaluation of Large Language Models in Tailoring Educational Content for Cancer Survivors and Their Caregivers: Quality Analysis.

JMIR Cancer. 2025-4-7

[6]
A Generative Pretrained Transformer (GPT)-Powered Chatbot as a Simulated Patient to Practice History Taking: Prospective, Mixed Methods Study.

JMIR Med Educ. 2024-1-16

[7]
Patient-Representing Population's Perceptions of GPT-Generated Versus Standard Emergency Department Discharge Instructions: Randomized Blind Survey Assessment.

J Med Internet Res. 2024-8-2

[8]
Comparing Artificial Intelligence-Generated and Clinician-Created Personalized Self-Management Guidance for Patients With Knee Osteoarthritis: Blinded Observational Study.

J Med Internet Res. 2025-5-7

[9]
Large Language Models Versus Expert Clinicians in Crisis Prediction Among Telemental Health Patients: Comparative Study.

JMIR Ment Health. 2024-8-2

[10]
Virtual Patient Simulations Using Social Robotics Combined With Large Language Models for Clinical Reasoning Training in Medical Education: Mixed Methods Study.

J Med Internet Res. 2025-3-3

Cited By

[1]
Development and Validation of a Large Language Model-Based System for Medical History-Taking Training: Prospective Multicase Study on Evaluation Stability, Human-AI Consistency, and Transparency.

JMIR Med Educ. 2025-8-29

[2]
A large language model digital patient system enhances ophthalmology history taking skills.

NPJ Digit Med. 2025-8-4

[3]
Synthetic Patient-Physician Conversations Simulated by Large Language Models: A Multi-Dimensional Evaluation.

Sensors (Basel). 2025-7-10

[4]
The feasibility of using generative artificial intelligence for history taking in virtual patients.

BMC Res Notes. 2025-2-24

References

[1]
Creating virtual patients using large language models: scalable, global, and low cost.

Med Teach. 2025-1

[2]
Exploring the Quality of Feedback in Entrustable Professional Activity Narratives Across 24 Residency Training Programs.

J Grad Med Educ. 2024-2

[3]
Using Natural Language Processing to Evaluate the Quality of Supervisor Narrative Comments in Competency-Based Medical Education.

Acad Med. 2024-5-1

[4]
Demystifying AI: Current State and Future Role in Medical Education Assessment.

Acad Med. 2024-4-1

[5]
A Qualitative Textual Analysis of Feedback Comments in ePortfolios: Quality and Alignment with the CanMEDS Roles.

Perspect Med Educ. 2023

[6]
Automated Patient Note Grading: Examining Scoring Reliability and Feasibility.

Acad Med. 2023-11-1

[7]
The McMaster Narrative Comment Rating Tool: Development and Initial Validity Evidence.

Teach Learn Med. 2025

[8]
Management reasoning and patient-clinician interactions: Insights from shared decision-making and simulated outpatient encounters.

Med Teach. 2023-9

[9]
Five sources of bias in natural language processing.

Lang Linguist Compass. 2021-8

[10]
Management Reasoning: Empirical Determination of Key Features and a Conceptual Model.

Acad Med. 2023-1-1
