A Language Model-Powered Simulated Patient With Automated Feedback for History Taking: Prospective Study.

Affiliations

Tübingen Institute for Medical Education (TIME), Medical Faculty, University of Tübingen, Tübingen, Germany.

Department of Medical Development, Process and Quality Management, University Hospital Tübingen, Tübingen, Germany.

Publication Information

JMIR Med Educ. 2024 Aug 16;10:e59213. doi: 10.2196/59213.

Abstract

BACKGROUND

Although history taking is fundamental for diagnosing medical conditions, teaching and providing feedback on the skill can be challenging due to resource constraints. Virtual simulated patients and web-based chatbots have thus emerged as educational tools, with recent advancements in artificial intelligence (AI) such as large language models (LLMs) enhancing their realism and potential to provide feedback.

OBJECTIVE

In our study, we aimed to evaluate the effectiveness of a Generative Pretrained Transformer (GPT) 4 model in providing structured feedback on medical students' performance in history taking with a simulated patient.

METHODS

We conducted a prospective study involving medical students performing history taking with a GPT-powered chatbot. To that end, we designed a chatbot to simulate patients' responses and provide immediate feedback on the comprehensiveness of the students' history taking. Students' interactions with the chatbot were analyzed, and feedback from the chatbot was compared with feedback from a human rater. We measured interrater reliability and performed a descriptive analysis to assess the quality of feedback.
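As a rough illustration of the kind of setup described above, the sketch below pairs a patient-role system prompt with a separate feedback prompt using the OpenAI chat completions API. This is a minimal sketch under our own assumptions: the prompt wording, model name, and example checklist categories are illustrative placeholders, not the study's actual configuration.

```python
# Minimal sketch of a simulated-patient chatbot with post-hoc structured feedback.
# Assumptions (not from the paper): prompt wording, model name, and the
# example checklist categories are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PATIENT_PROMPT = (
    "You are role-playing a patient presenting with chest pain. "
    "Answer the medical student's questions briefly and realistically. "
    "Do not volunteer information that was not asked for."
)

FEEDBACK_PROMPT = (
    "You are a medical educator. Given the transcript of a history-taking "
    "conversation, state for each checklist category whether it was covered: "
    "chief complaint, onset, quality of pain, medication history, allergies."
)

def patient_reply(history: list[dict]) -> str:
    """Return the simulated patient's answer to the latest student question."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": PATIENT_PROMPT}] + history,
    )
    return response.choices[0].message.content

def structured_feedback(transcript: str) -> str:
    """Ask the model to rate the full transcript against the checklist."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": FEEDBACK_PROMPT},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
```

In a setup like this, the role-play and the feedback step use separate prompts so that the assessment instructions never leak into the patient's in-character answers.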

RESULTS

Most of the study's participants were in their third year of medical school. A total of 1894 question-answer pairs from 106 conversations were included in our analysis. GPT-4's role-play and responses were medically plausible in more than 99% of cases. Interrater reliability between GPT-4 and the human rater showed "almost perfect" agreement (Cohen κ=0.832). Lower agreement (κ<0.6), found for 8 of the 45 feedback categories, highlighted topics on which the model's assessments were overly specific or diverged from human judgment.
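For readers unfamiliar with the agreement statistic reported here, Cohen's κ compares the observed agreement between two raters with the agreement expected by chance. The snippet below computes it with scikit-learn on made-up binary ratings (the data are illustrative and not taken from the study).

```python
# Cohen's kappa between two raters on illustrative (made-up) binary ratings,
# e.g. whether a checklist item was covered (1) or not (0).
from sklearn.metrics import cohen_kappa_score

gpt4_ratings = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
human_ratings = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]

kappa = cohen_kappa_score(gpt4_ratings, human_ratings)
print(f"Cohen's kappa: {kappa:.3f}")  # values above 0.8 are often read as "almost perfect"
```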

CONCLUSIONS

The GPT model was effective in providing structured feedback on history-taking dialogs conducted by medical students. Although we identified some limitations regarding the specificity of feedback for certain feedback categories, the overall high agreement with human raters suggests that LLMs can be a valuable tool for medical education. Our findings thus advocate for the careful integration of AI-driven feedback mechanisms in medical training and highlight important aspects to consider when LLMs are used in that context.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a659/11364946/3a65cef98f3c/mededu_v10i1e59213_fig1.jpg
