Indrakanti Ashraya Kumar, Heierle Julian Elias, Münger Hannah, Koch Alma Teresa, Kaiser Philippe, Bach Michael, Fiehler Jens, Tsogkas Ioannis, Guzman Raphael, Mutke Matthias Anthony, Psychogios Marios
Department of Diagnostic and Interventional Neuroradiology, Clinic of Radiology and Nuclear Medicine, University Hospital Basel, Petersgraben 4, 4031, Basel, Switzerland.
Clinic of Radiology and Nuclear Medicine, University Hospital Basel, Petersgraben 4, 4031, Basel, Switzerland.
Clin Neuroradiol. 2025 Jul 31. doi: 10.1007/s00062-025-01538-z.
In outpatient settings, extensive patient records must frequently be reviewed under time constraints, making efficient extraction and summarization of key clinical information essential. Large language models (LLMs) are potentially useful for this task but require validation for clinical reliability. This study assesses OpenAI's GPT-4o for generating structured summaries to assist in neurovascular consultation preparation, aiming to increase efficiency by automating critical data extraction.
A prospective study was conducted from May to August 2024 at a tertiary care hospital, involving a total of 70 patients. Structured summaries were generated by GPT-4o using a predefined template. Extracted data were categorized into aneurysm-specific details, imaging summaries, and patient-specific clinical factors. Accuracy and completeness were assessed by clinicians, with performance measured using precision, recall, specificity, and accuracy.
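The four reported measures can all be derived from the counts of a confusion matrix. The sketch below is a minimal illustration, not the study's evaluation code: it assumes each extracted item was rated by clinicians as a true positive, false positive, true negative, or false negative, and the names `ConfusionCounts` and `extraction_metrics`, along with the example counts, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Clinician-rated outcome counts for one extraction category (hypothetical structure)."""
    tp: int  # item present in the record and correctly extracted
    fp: int  # item reported in the summary but absent or wrong in the record
    tn: int  # item absent in the record and correctly not reported
    fn: int  # item present in the record but missing from the summary


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def extraction_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute precision, recall, specificity, and accuracy from the counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "precision": _ratio(c.tp, c.tp + c.fp),
        "recall": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "accuracy": _ratio(c.tp + c.tn, total),
    }


# Illustrative counts only; these are NOT results from the study.
example = ConfusionCounts(tp=45, fp=5, tn=40, fn=10)
metrics = extraction_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Computing the measures per category (aneurysm-specific, imaging, patient-specific) rather than pooled would make it visible when a single category lags the others.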
High accuracy (≥ 0.96) was measured across most categories. For aneurysm- and patient-specific data, extraction performance depended on how stable the information was over time. Aneurysm location and other stable details were extracted consistently, whereas changes in aneurysm size and medication lists were extracted less consistently. In rare cases, details were attributed to the wrong aneurysm in patients with more than one aneurysm. Imaging summaries were generally concise and clinically useful, though their effectiveness declined when multiple prior studies had to be summarized.
GPT-4o structured neurovascular patient data effectively, with high accuracy and few errors. Although occasional errors such as misattributed or outdated information were observed, reliable citation of sources made verification easy. These findings support integrating LLM-generated summaries into neurovascular consultations, with further optimization needed for temporal data tracking and on-premises implementation to address privacy concerns.