Scheschenja Michael, Viniol Simon, Bastian Moritz B, Wessendorf Joel, König Alexander M, Mahnken Andreas H
Department of Diagnostic and Interventional Radiology, University Hospital Marburg, Philipps-University of Marburg, Baldingerstrasse 1, 35043 Marburg, Germany.
Cardiovasc Intervent Radiol. 2024 Feb;47(2):245-250. doi: 10.1007/s00270-023-03563-2. Epub 2023 Oct 23.
This study explores the utility of the large language models GPT-3 and GPT-4 for in-depth patient education prior to interventional radiology procedures and assesses differences in answer accuracy between the two models.
A total of 133 questions were compiled on three specific interventional radiology procedures: port implantation, percutaneous transluminal angioplasty (PTA) and transarterial chemoembolization (TACE). The questions covered general information, preparation details, risks and complications, and postprocedural aftercare. Two board-certified radiologists rated the accuracy of the GPT-3 and GPT-4 responses on a 5-point Likert scale, and the performance difference between the two models was analyzed.
Both GPT-3 and GPT-4 gave (5) "completely correct" or (4) "very good" answers to the majority of questions ((5) 30.8% + (4) 48.1% for GPT-3 and (5) 35.3% + (4) 47.4% for GPT-4). GPT-3 and GPT-4 provided (3) "acceptable" responses to 15.8% and 15.0% of questions, respectively. GPT-3 provided (2) "mostly incorrect" responses to 5.3% of questions, compared with 2.3% for GPT-4. No response was identified as potentially harmful. GPT-4 gave significantly more accurate responses than GPT-3 (p = 0.043).
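The reported percentages can be checked against the 133 questions with a little arithmetic. The Python sketch below is illustrative only: it assumes each percentage is a rounded share of the 133 questions per model and that no response received a score of 1 (the four reported shares already sum to 100%). The names `reported_pct`, `approx_counts` and `mean_score` are hypothetical, and the mean Likert score is a rough descriptive summary, not the statistical test behind the reported p-value, which the abstract does not name.

```python
# Rough consistency check of the reported rating distribution.
# Assumption: each percentage is a rounded share of 133 questions per model.
N_QUESTIONS = 133

# Likert score -> reported percentage of responses (from the abstract).
reported_pct = {
    "GPT-3": {5: 30.8, 4: 48.1, 3: 15.8, 2: 5.3},
    "GPT-4": {5: 35.3, 4: 47.4, 3: 15.0, 2: 2.3},
}


def approx_counts(pct_by_score, n=N_QUESTIONS):
    """Convert reported percentages into approximate whole-number counts."""
    return {score: round(pct * n / 100) for score, pct in pct_by_score.items()}


def mean_score(counts):
    """Mean Likert score implied by the counts (descriptive only)."""
    total = sum(counts.values())
    return sum(score * c for score, c in counts.items()) / total


counts = {model: approx_counts(pct) for model, pct in reported_pct.items()}
means = {model: mean_score(c) for model, c in counts.items()}

for model in counts:
    print(model, counts[model], f"mean = {means[model]:.2f}")
```

Under these assumptions the reconstructed counts add up to 133 for each model, which suggests the percentages are internally consistent.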
GPT-3 and GPT-4 emerge as relatively safe and accurate tools for patient education in interventional radiology, with GPT-4 performing slightly better. The feasibility and accuracy of these models suggest a promising role in revolutionizing patient care. Still, users need to be aware of possible limitations.