From Text to Diagnose: ChatGPT's Efficacy in Medical Decision-Making.

Affiliation

Uzhhorod National University, Uzhhorod, Ukraine.

Publication

Wiad Lek. 2023;76(11):2345-2350. doi: 10.36740/WLek202311101.

Abstract

OBJECTIVE

To evaluate the diagnostic capabilities of ChatGPT in the field of medical diagnosis.

PATIENTS AND METHODS

We utilized 50 clinical cases and employed the large language model ChatGPT-3.5. The experiment comprised three phases, each conducted in a fresh chat session. In the first phase, ChatGPT received detailed clinical case descriptions, guided by a "Persona Pattern" prompt. In the second phase, the cases misdiagnosed in the first phase were revisited by presenting ChatGPT with a list of candidate diagnoses to choose from. The final phase assessed the model's ability to mimic a medical practitioner's diagnostic process, with prompts limiting the initial information to symptoms and history.
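The "Persona Pattern" prompting technique mentioned above can be sketched in code. This is a minimal illustration, not the authors' actual prompt: the persona wording, the helper name `build_persona_prompt`, and the sample case text are all assumptions; only the idea of assigning the model a physician role before the diagnostic task comes from the abstract.

```python
def build_persona_prompt(case_description: str) -> list[dict]:
    """Assemble a chat-message list using the Persona Pattern:
    the system message assigns the model a role before the task."""
    system = (
        "You are an experienced attending physician. "
        "Given a clinical case description, state the single most likely diagnosis."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Clinical case:\n{case_description}"},
    ]

# Illustrative case text (hypothetical, not from the study's 50 cases):
messages = build_persona_prompt(
    "A 54-year-old man presents with crushing substernal chest pain "
    "radiating to the left arm, onset 40 minutes ago."
)
```

Starting each phase in a fresh chat session, as the authors did, prevents earlier cases or feedback from leaking into subsequent diagnoses.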

RESULTS

In the first phase, ChatGPT showed a 66.00% diagnostic accuracy, surpassing physicians by nearly 50%. Notably, in 11 cases requiring image interpretation, ChatGPT struggled initially but reached a correct diagnosis in four without added interpretations. In the second phase, ChatGPT demonstrated a remarkable 70.59% diagnostic accuracy, while physicians averaged 41.47%. The overall accuracy of the large language model across the first and second phases combined was 90.00%. In the third phase, which emulated real-world physician decision-making, ChatGPT achieved a 46.00% success rate.

CONCLUSION

Our research underscores ChatGPT's strong potential as a diagnostic tool in clinical medicine, especially in structured scenarios. It also highlights the need for supplementary data and the inherent complexity of medical diagnosis. These findings contribute valuable insights to AI-driven clinical diagnostics and underscore the importance of prompt-engineering techniques in physicians' interaction with ChatGPT.

