Suppr 超能文献

Validation of a Deep Learning Chest X-ray Interpretation Model: Integrating Large-Scale AI and Large Language Models for Comparative Analysis with ChatGPT.

Authors

Lee Kyu Hong, Lee Ro Woon, Kwon Ye Eun

Affiliation

Department of Radiology, College of Medicine, Inha University, Incheon 22212, Republic of Korea.

Publication

Diagnostics (Basel). 2023 Dec 30;14(1):90. doi: 10.3390/diagnostics14010090.

DOI: 10.3390/diagnostics14010090
PMID: 38201398
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10795741/
Abstract

This study evaluates the diagnostic accuracy and clinical utility of two artificial intelligence (AI) techniques: Kakao Brain Artificial Neural Network for Chest X-ray Reading (KARA-CXR), an assistive technology developed using large-scale AI and large language models (LLMs), and ChatGPT, a well-known LLM. The study was conducted to validate the performance of the two technologies in chest X-ray reading and explore their potential applications in the medical imaging diagnosis domain. The study methodology consisted of randomly selecting 2000 chest X-ray images from a single institution's patient database, and two radiologists evaluated the readings provided by KARA-CXR and ChatGPT. The study used five qualitative factors to evaluate the readings generated by each model: accuracy, false findings, location inaccuracies, count inaccuracies, and hallucinations. Statistical analysis showed that KARA-CXR achieved significantly higher diagnostic accuracy compared to ChatGPT. In the 'Acceptable' accuracy category, KARA-CXR was rated at 70.50% and 68.00% by two observers, while ChatGPT achieved 40.50% and 47.00%. Interobserver agreement was moderate for both systems, with KARA at 0.74 and GPT4 at 0.73. For 'False Findings', KARA-CXR scored 68.00% and 68.50%, while ChatGPT scored 37.00% for both observers, with high interobserver agreements of 0.96 for KARA and 0.97 for GPT4. In 'Location Inaccuracy' and 'Hallucinations', KARA-CXR outperformed ChatGPT with significant margins. KARA-CXR demonstrated a non-hallucination rate of 75%, which is significantly higher than ChatGPT's 38%. The interobserver agreement was high for KARA (0.91) and moderate to high for GPT4 (0.85) in the hallucination category. In conclusion, this study demonstrates the potential of AI and large-scale language models in medical imaging and diagnostics. It also shows that in the chest X-ray domain, KARA-CXR has relatively higher accuracy than ChatGPT.
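The interobserver-agreement values quoted above (e.g. 0.74 for KARA-CXR vs. 0.73 for ChatGPT in the accuracy category) are two-rater concordance statistics. The abstract does not name the exact measure, so Cohen's kappa — the standard statistic for agreement between two raters, corrected for chance — is assumed here; a minimal self-contained sketch shows how such values are computed:

```python
# Sketch of Cohen's kappa for two raters (assumed metric; the abstract does
# not specify which agreement statistic the study used).
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of cases where the two raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    # Kappa: observed agreement beyond chance, normalized.
    return (p_o - p_e) / (1 - p_e)

# Hypothetical toy data: two raters grading 10 readings as
# acceptable ("acc") or unacceptable ("unacc").
a = ["acc", "acc", "unacc", "acc", "acc", "unacc", "acc", "unacc", "acc", "acc"]
b = ["acc", "acc", "unacc", "acc", "unacc", "unacc", "acc", "unacc", "acc", "acc"]
print(round(cohens_kappa(a, b), 2))  # prints 0.78
```

With the toy labels above, nine of ten readings match (p_o = 0.9) while the raters' marginal frequencies alone would predict agreement p_e = 0.54, giving kappa ≈ 0.78 — the same order as the study's reported agreement values.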


Figures (g001-g008):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/86fc5a50b5ed/diagnostics-14-00090-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/0f145155b39f/diagnostics-14-00090-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/7147177092f8/diagnostics-14-00090-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/4dc550b80831/diagnostics-14-00090-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/c2f651727a23/diagnostics-14-00090-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/e0d75e0fa37a/diagnostics-14-00090-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/ba0e14cf3b87/diagnostics-14-00090-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/29329ba9dce6/diagnostics-14-00090-g008.jpg

Similar Articles

1. Validation of a Deep Learning Chest X-ray Interpretation Model: Integrating Large-Scale AI and Large Language Models for Comparative Analysis with ChatGPT.
   Diagnostics (Basel). 2023 Dec 30;14(1):90. doi: 10.3390/diagnostics14010090.
2. AI-based computer-aided diagnostic system of chest digital tomography synthesis: Demonstrating comparative advantage with X-ray-based AI systems.
   Comput Methods Programs Biomed. 2023 Oct;240:107643. doi: 10.1016/j.cmpb.2023.107643. Epub 2023 Jun 5.
3. Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study.
   J Med Internet Res. 2023 Aug 22;25:e48659. doi: 10.2196/48659.
4. Comparative Performance of ChatGPT 3.5 and GPT4 on Rhinology Standardized Board Examination Questions.
   OTO Open. 2024 Jun 27;8(2):e164. doi: 10.1002/oto2.164. eCollection 2024 Apr-Jun.
5. ChatGPT Versus Consultants: Blinded Evaluation on Answering Otorhinolaryngology Case-Based Questions.
   JMIR Med Educ. 2023 Dec 5;9:e49183. doi: 10.2196/49183.
6. ChatGPT's performance in German OB/GYN exams - paving the way for AI-enhanced medical education and clinical practice.
   Front Med (Lausanne). 2023 Dec 13;10:1296615. doi: 10.3389/fmed.2023.1296615. eCollection 2023.
7. Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
   JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.
8. From jargon to clarity: Improving the readability of foot and ankle radiology reports with an artificial intelligence large language model.
   Foot Ankle Surg. 2024 Jun;30(4):331-337. doi: 10.1016/j.fas.2024.01.008. Epub 2024 Feb 5.
9. Comparing the Diagnostic Performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and Radiologists in Challenging Neuroradiology Cases.
   Clin Neuroradiol. 2024 Dec;34(4):779-787. doi: 10.1007/s00062-024-01426-y. Epub 2024 May 28.
10. Evaluating the Influence of Role-Playing Prompts on ChatGPT's Misinformation Detection Accuracy: Quantitative Study.
   JMIR Infodemiology. 2024 Sep 26;4:e60678. doi: 10.2196/60678.

Cited By

1. Diagnosis accuracy of ultrasonography for acute colonic diverticulitis: a diagnostic meta-analysis.
   Abdom Radiol (NY). 2025 Sep 10. doi: 10.1007/s00261-025-05185-3.
2. Evaluation of GPT-4 Accuracy in the Interpretation of Medical Imaging: Potential Benefits, Limitations, and the Future.
   Cureus. 2025 Jul 12;17(7):e87761. doi: 10.7759/cureus.87761. eCollection 2025 Jul.
3. Developing artificial intelligence tools for institutional review board pre-review: A pilot study on ChatGPT's accuracy and reproducibility.
   PLOS Digit Health. 2025 Jun 30;4(6):e0000695. doi: 10.1371/journal.pdig.0000695. eCollection 2025 Jun.
4. Comparative Evaluation of Large Language and Multimodal Models in Detecting Spinal Stabilization Systems on X-Ray Images.
   J Clin Med. 2025 May 8;14(10):3282. doi: 10.3390/jcm14103282.
5. Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment.
   Comput Struct Biotechnol J. 2025 Apr 11;28:141-147. doi: 10.1016/j.csbj.2025.04.010. eCollection 2025.
6. Integrating AI and Assistive Technologies in Healthcare: Insights from a Narrative Review of Reviews.
   Healthcare (Basel). 2025 Mar 4;13(5):556. doi: 10.3390/healthcare13050556.
7. ChatGPT4's diagnostic accuracy in inpatient neurology: A retrospective cohort study.
   Heliyon. 2024 Dec 9;10(24):e40964. doi: 10.1016/j.heliyon.2024.e40964. eCollection 2024 Dec 30.
8. Artificial intelligence in fracture detection on radiographs: a literature review.
   Jpn J Radiol. 2025 Apr;43(4):551-585. doi: 10.1007/s11604-024-01702-4. Epub 2024 Nov 14.
9. Revolution or risk?-Assessing the potential and challenges of GPT-4V in radiologic image interpretation.
   Eur Radiol. 2025 Mar;35(3):1111-1121. doi: 10.1007/s00330-024-11115-6. Epub 2024 Oct 18.
10. Advancements in Artificial Intelligence for Medical Computer-Aided Diagnosis.
   Diagnostics (Basel). 2024 Jun 15;14(12):1265. doi: 10.3390/diagnostics14121265.

References

1. The potential and pitfalls of using a large language model such as ChatGPT, GPT-4, or LLaMA as a clinical assistant.
   J Am Med Inform Assoc. 2024 Sep 1;31(9):1884-1891. doi: 10.1093/jamia/ocae184.
2. Evaluating capabilities of large language models: Performance of GPT-4 on surgical knowledge assessments.
   Surgery. 2024 Apr;175(4):936-942. doi: 10.1016/j.surg.2023.12.014. Epub 2024 Jan 20.
3. Large language models propagate race-based medicine.
   NPJ Digit Med. 2023 Oct 20;6(1):195. doi: 10.1038/s41746-023-00939-z.
4. ChatGPT in Radiology: The Advantages and Limitations of Artificial Intelligence for Medical Imaging Diagnosis.
   Cureus. 2023 Jul 6;15(7):e41435. doi: 10.7759/cureus.41435. eCollection 2023 Jul.
5. Assessing the utility of ChatGPT as an artificial intelligence-based large language model for information to answer questions on myopia.
   Ophthalmic Physiol Opt. 2023 Nov;43(6):1562-1570. doi: 10.1111/opo.13207. Epub 2023 Jul 21.
6. Artificial intelligence in healthcare: Complementing, not replacing, doctors and healthcare providers.
   Digit Health. 2023 Jul 2;9:20552076231186520. doi: 10.1177/20552076231186520. eCollection 2023 Jan-Dec.
7. The imperative for regulatory oversight of large language models (or generative AI) in healthcare.
   NPJ Digit Med. 2023 Jul 6;6(1):120. doi: 10.1038/s41746-023-00873-0.
8. Dr AI will see you now.
   Clin Exp Ophthalmol. 2023 Jul;51(5):409-410. doi: 10.1111/ceo.14272.
9. AI-Based CXR First Reading: Current Limitations to Ensure Practical Value.
   Diagnostics (Basel). 2023 Apr 16;13(8):1430. doi: 10.3390/diagnostics13081430.
10. Machine Learning Based Risk Prediction for Major Adverse Cardiovascular Events for ELGA-Authorized Clinics.
   Stud Health Technol Inform. 2023 May 2;301:20-25. doi: 10.3233/SHTI230006.