Optimizing GPT-4 Turbo Diagnostic Accuracy in Neuroradiology through Prompt Engineering and Confidence Thresholds.

Authors

Wada Akihiko, Akashi Toshiaki, Shih George, Hagiwara Akifumi, Nishizawa Mitsuo, Hayakawa Yayoi, Kikuta Junko, Shimoji Keigo, Sano Katsuhiro, Kamagata Koji, Nakanishi Atsushi, Aoki Shigeki

Affiliations

Department of Radiology, Juntendo University Graduate School of Medicine, Tokyo 113-8421, Japan.

Clinical Radiology, Weill Cornell Medical College, New York, NY 10065, USA.

Publication information

Diagnostics (Basel). 2024 Jul 17;14(14):1541. doi: 10.3390/diagnostics14141541.

Abstract

BACKGROUND AND OBJECTIVES

Integrating large language models (LLMs) such as GPT-4 Turbo into diagnostic imaging faces a significant challenge: current misdiagnosis rates range from 30% to 50%. This study evaluates how prompt engineering and confidence thresholds can improve diagnostic accuracy in neuroradiology.

METHODS

We analyzed 751 neuroradiology cases from the American Journal of Neuroradiology using GPT-4 Turbo with customized prompts designed to improve diagnostic precision.

RESULTS

Initially, GPT-4 Turbo achieved a baseline diagnostic accuracy of 55.1%. After the responses were reformatted to list five diagnostic candidates and a 90% confidence threshold was applied, the precision of the diagnosis rose to a peak of 72.9%, and the candidate list contained the correct diagnosis in 85.9% of cases, reducing the misdiagnosis rate to 14.1%. However, this threshold reduced the number of cases for which the model gave a response.
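The trade-off described above (higher precision, fewer answered cases) can be illustrated with a minimal sketch. This is not the authors' code: the `CaseResult` structure, the single per-case confidence value, the `evaluate` function, and the toy cases are all assumptions made for illustration. The only relationship taken from the abstract is that the misdiagnosis rate is the complement of the candidate-list hit rate (100% - 85.9% = 14.1%).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CaseResult:
    """One evaluated case (hypothetical structure, not from the paper)."""
    truth: str             # reference diagnosis
    candidates: List[str]  # ranked model diagnoses, best first (up to five)
    confidence: float      # assumed single self-reported confidence in [0, 1]


def evaluate(cases: List[CaseResult], threshold: float) -> Dict[str, Optional[float]]:
    """Score only the cases whose confidence meets the threshold."""
    answered = [c for c in cases if c.confidence >= threshold]
    n = len(answered)
    if n == 0:
        # Nothing passed the threshold, so the rates are undefined.
        return {"response_rate": 0.0, "top1_precision": None,
                "list_hit_rate": None, "misdiagnosis_rate": None}
    top1_hits = sum(1 for c in answered if c.candidates and c.candidates[0] == c.truth)
    list_hits = sum(1 for c in answered if c.truth in c.candidates)
    return {
        "response_rate": n / len(cases),          # share of cases answered
        "top1_precision": top1_hits / n,          # first candidate is correct
        "list_hit_rate": list_hits / n,           # truth appears anywhere in list
        "misdiagnosis_rate": 1.0 - list_hits / n  # complement of the hit rate
    }


# Toy data, invented for illustration only.
toy_cases = [
    CaseResult("glioblastoma", ["glioblastoma", "metastasis"], 0.95),
    CaseResult("meningioma", ["schwannoma", "meningioma"], 0.92),
    CaseResult("abscess", ["lymphoma", "glioblastoma"], 0.60),
    CaseResult("multiple sclerosis", ["multiple sclerosis"], 0.97),
]

if __name__ == "__main__":
    for t in (0.0, 0.9):
        print(t, evaluate(toy_cases, t))
```

On the toy data, raising the threshold from 0.0 to 0.9 drops the low-confidence miss, so precision and list hit rate go up while the response rate falls from 4 of 4 cases to 3 of 4. The numbers are artificial and say nothing about the magnitudes reported in the study.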

CONCLUSIONS

Strategic prompt engineering and high confidence thresholds significantly reduce misdiagnoses and improve the precision of LLM-based diagnosis in neuroradiology. More research is needed to optimize these approaches for broader clinical implementation, balancing accuracy and utility.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa23/11276551/859630ce4db1/diagnostics-14-01541-g001.jpg
