Lee Kyu Hong, Lee Ro Woon, Kwon Ye Eun
Department of Radiology, College of Medicine, Inha University, Incheon 22212, Republic of Korea.
Diagnostics (Basel). 2023 Dec 30;14(1):90. doi: 10.3390/diagnostics14010090.
This study evaluates the diagnostic accuracy and clinical utility of two artificial intelligence (AI) systems: Kakao Brain Artificial Neural Network for Chest X-ray Reading (KARA-CXR), an assistive technology developed using large-scale AI and large language models (LLMs), and ChatGPT, a well-known LLM. The aim was to validate the performance of both systems in chest X-ray reading and to explore their potential applications in medical imaging diagnosis. A total of 2000 chest X-ray images were randomly selected from a single institution's patient database, and two radiologists evaluated the readings generated by KARA-CXR and ChatGPT. Each reading was assessed on five qualitative factors: accuracy, false findings, location inaccuracies, count inaccuracies, and hallucinations. Statistical analysis showed that KARA-CXR achieved significantly higher diagnostic accuracy than ChatGPT. In the 'Acceptable' accuracy category, the two observers rated KARA-CXR at 70.50% and 68.00%, compared with 40.50% and 47.00% for ChatGPT. Interobserver agreement was moderate for both systems (0.74 for KARA-CXR and 0.73 for ChatGPT). For 'False Findings', KARA-CXR scored 68.00% and 68.50%, while ChatGPT scored 37.00% with both observers; interobserver agreement was high, at 0.96 for KARA-CXR and 0.97 for ChatGPT. In 'Location Inaccuracy' and 'Hallucinations', KARA-CXR outperformed ChatGPT by significant margins. KARA-CXR had a non-hallucination rate of 75%, significantly higher than ChatGPT's 38%. In the hallucination category, interobserver agreement was high for KARA-CXR (0.91) and moderate to high for ChatGPT (0.85). In conclusion, this study demonstrates the potential of AI and large language models in medical imaging and diagnostics. It also shows that, in the chest X-ray domain, KARA-CXR is more accurate than ChatGPT.
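The abstract reports interobserver agreement values but does not name the statistic used. As a minimal sketch, assuming a chance-corrected measure such as Cohen's kappa for two raters, agreement between the two radiologists could be computed as follows. The function name `cohen_kappa` and the example ratings are hypothetical illustrations, not data or code from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items.

    Assumption: the study's agreement statistic is not stated in the
    abstract; kappa is used here only as a common choice.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must label the same non-empty set of items.")

    n = len(ratings_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Expected agreement by chance, from each rater's marginal label frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of ten readings by two observers (not study data).
observer_1 = ["acceptable"] * 6 + ["unacceptable"] * 4
observer_2 = ["acceptable"] * 5 + ["unacceptable"] * 5
kappa = cohen_kappa(observer_1, observer_2)
```

In this made-up example the observers agree on 9 of 10 readings and chance agreement is 0.5, so kappa is (0.9 - 0.5) / (1 - 0.5) = 0.8.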