Gao Christina, Gheihman Galina, Kaplan Tamara, McCoy Liam G, Collins Luke C, Wenzel Tara, Paul Ashley, Reda Haatem, Stein Laura Katherine, Kimbaris Grace, Sutherland Harry W, Stacpoole Sybil, Milligan Tracey A, Goh Rudy, Bacchi Stephen
Department of Medicine, Adelaide Medical School, Adelaide, Australia.
Department of Neurology, Mass General Brigham, Boston, MA.
Neurol Educ. 2025 Sep 19;4(4):e200250. doi: 10.1212/NE9.0000000000200250. eCollection 2025 Dec.
Case reports are a fundamental part of medical literature and education. Artificial intelligence (AI) is increasingly influencing medical education and can potentially augment the delivery of the educational content in case reports. The aim of this study was to evaluate the feasibility of using AI, namely large language models (LLMs), to convert previously published case reports into an interactive online format to facilitate case-based learning.
Three case reports were converted into free-text "screenplays" using the LLM Claude 3.5 Sonnet. These "screenplays" were then delivered in an interactive format through an online platform using GPT-4o. Two neurology fellows interrogated (prompted) the cases delivered by the online platform in a question-and-answer manner, seeking history, examination findings, and investigation results to arrive at a diagnosis and plan. These neurology fellows were not aware of the case report or screenplay content and asked questions as they would when evaluating a patient. A neurologist then reviewed each question-and-answer exchange for "screenplay" adherence and medical appropriateness. Results were analyzed with descriptive statistics.
The overall number of appropriate responses generated by the LLM was 206 of 210 (98.1%). There were 26 of 210 responses in which additional content was generated, all of which were medically plausible or consistent with the context of the case. The 4 errors that occurred were omissions of investigation results at the "screenplay" stage, which are amenable to manual correction. The omissions were the results of 3 unrevealing blood tests and 1 electroencephalogram result. None of these errors precluded the establishment of the diagnosis and completion of the case.
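The proportions above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' analysis code: the helper name `proportion` and the variable names are hypothetical, and the percentages for additional content and for errors are derived here from the reported counts rather than stated in the abstract.

```python
def proportion(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts as reported in the abstract.
TOTAL_RESPONSES = 210
appropriate = 206
with_additional_content = 26
omitted_blood_tests = 3
omitted_eeg_results = 1

# Errors are the responses not judged appropriate; this should match
# the itemized omissions (3 blood tests + 1 electroencephalogram).
errors = TOTAL_RESPONSES - appropriate
assert errors == omitted_blood_tests + omitted_eeg_results

print(f"Appropriate responses: {proportion(appropriate, TOTAL_RESPONSES)}%")
# Derived values below are not stated in the abstract.
print(f"Additional content:    {proportion(with_additional_content, TOTAL_RESPONSES)}%")
print(f"Errors (omissions):    {proportion(errors, TOTAL_RESPONSES)}%")
```

The first printed value reproduces the reported 98.1%; the other two are simple descriptive proportions computed from the same denominator.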
It is feasible to convert case reports into an interactive question-and-answer format using LLMs. It should be noted that the nondeterministic nature of frontier LLMs and the potential for such LLM versions to change frequently are relevant considerations in making estimates of performance. Further studies investigating the impacts of this educational innovation are required.