UZHHOROD NATIONAL UNIVERSITY, UZHHOROD, UKRAINE.
Wiad Lek. 2023;76(11):2345-2350. doi: 10.36740/WLek202311101.
The aim: To evaluate the diagnostic capabilities of ChatGPT in medical diagnosis.
Materials and methods: We utilized 50 clinical cases, employing the large language model ChatGPT-3.5. The experiment had three phases, each conducted in a new chat session. In the initial phase, ChatGPT received detailed clinical case descriptions, guided by a "Persona Pattern" prompt. In the second phase, the cases misdiagnosed in phase one were revisited by providing a list of potential diagnoses for ChatGPT to choose from. The final phase assessed the artificial intelligence's ability to mimic a medical practitioner's diagnostic process, with prompts limiting the initial information to symptoms and history.
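For readers who wish to reproduce the prompting protocol programmatically, the sketch below shows how the phase-one "Persona Pattern" setup could look with the OpenAI Python client. This is a minimal illustration under stated assumptions: the study interacted with ChatGPT-3.5 through the chat interface, so the model name, persona wording, and helper function here are illustrative, not the authors' exact materials.

```python
# Minimal sketch of the phase-one "Persona Pattern" prompt, assuming the
# OpenAI Python client. Persona text and case wording are illustrative;
# the study itself used the ChatGPT-3.5 chat interface, not the API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PERSONA = (
    "Act as an experienced attending physician. You will receive a clinical "
    "case description. Reason through the findings and state the single most "
    "likely diagnosis."
)

def diagnose(case_description: str) -> str:
    """Run one clinical case in a fresh chat, mirroring the per-case new-chat setup."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for the ChatGPT-3.5 model used in the study
        messages=[
            {"role": "system", "content": PERSONA},
            {"role": "user", "content": case_description},
        ],
    )
    return response.choices[0].message.content

# Phase two could reuse the same helper, appending a line such as
# "Choose the most likely diagnosis from the following list: ..." to the case text;
# phase three would instead send only the symptoms and history.
```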
Results: In the initial phase, ChatGPT showed a 66.00% diagnostic accuracy, surpassing physicians by nearly 50%. Notably, in 11 cases requiring image interpretation, ChatGPT initially struggled but still reached the correct diagnosis in four of them without any added image interpretation. In the second phase, ChatGPT demonstrated a remarkable 70.59% diagnostic accuracy, while physicians averaged 41.47%. Furthermore, the overall accuracy of the large language model across the first and second phases combined was 90.00%. In the third phase, which emulated a real physician's decision-making process, ChatGPT achieved a 46.00% success rate.
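The combined figure follows from the per-phase counts (a reconstruction from the reported percentages): 66.00% of the 50 cases is 33 correct in phase one; of the 17 remaining misdiagnoses, 70.59% (12/17) were resolved in phase two; and (33 + 12)/50 = 45/50 = 90.00%.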
Conclusions: Our research underscores ChatGPT's strong potential as a diagnostic tool in clinical medicine, especially in structured scenarios, while emphasizing both the need for supplementary data and the inherent complexity of medical diagnosis. It contributes valuable insights to AI-driven clinical diagnostics and highlights the importance of prompt-engineering techniques in physicians' interactions with ChatGPT.