Department of Diagnostic and Interventional Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Breisacher Str. 64, 79106, Freiburg, Germany.
Department of Stereotactic and Functional Neurosurgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany.
Sci Rep. 2023 Aug 30;13(1):14215. doi: 10.1038/s41598-023-41512-8.
While radiologists can describe a fracture's morphology and complexity with ease, translating that description into classification systems such as the Arbeitsgemeinschaft Osteosynthesefragen (AO) Fracture and Dislocation Classification Compendium is more challenging. We tested the performance of generic chatbots and of chatbots given specific knowledge of the AO classification through a vector index, and compared both to human readers. On 100 radiological reports that we created from random AO codes, chatbots provided AO codes significantly faster than humans (mean 3.2 s per case vs. 50 s per case, p < .001) but did not reach human performance (max. chatbot performance of 86% correct full AO codes vs. 95% in human readers). In general, chatbots based on GPT 4 outperformed those based on GPT 3.5-Turbo. Further, we found that providing specific knowledge substantially enhances the chatbot's performance and consistency: the context-aware chatbot based on GPT 4 provided 71% consistent correct full AO codes, compared to 2% consistent correct full AO codes for the generic ChatGPT 4. This provides evidence that refining ChatGPT and providing it with specific context will be the next essential step in harnessing its power.
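The abstract does not describe how the vector index was built, so the following is only a minimal sketch of the general idea of retrieving relevant classification knowledge and prepending it to the chatbot prompt. Everything in it is an assumption made for illustration: the snippet texts are simplified placeholders rather than compendium text, the bag-of-words vectors stand in for learned embeddings, and the names `embed`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, simplified knowledge snippets (placeholders, not compendium text).
SNIPPETS = [
    "Bone 1 is the humerus.",
    "Bone 3 is the femur.",
    "Bone 4 is the tibia.",
]


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# The "vector index": each snippet stored alongside its vector.
INDEX = [(s, embed(s)) for s in SNIPPETS]


def retrieve(query, index, k=1):
    """Return the k snippets most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(report, index, k=1):
    """Prepend the retrieved context to the report before it is sent to a chatbot."""
    context = "\n".join(retrieve(report, index, k))
    return f"Context:\n{context}\n\nReport:\n{report}\n\nProvide the AO code."


if __name__ == "__main__":
    print(build_prompt("Transverse fracture of the femur", INDEX))
```

A generic chatbot would receive only the report, whereas a context-aware one would receive the output of `build_prompt`. The design choice illustrated here is simply that retrieval narrows the prompt to the knowledge most relevant to the report at hand.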