Suppr 超能文献

Validation of a Deep Learning Chest X-ray Interpretation Model: Integrating Large-Scale AI and Large Language Models for Comparative Analysis with ChatGPT.

Authors

Lee Kyu Hong, Lee Ro Woon, Kwon Ye Eun

Affiliation

Department of Radiology, College of Medicine, Inha University, Incheon 22212, Republic of Korea.

Publication

Diagnostics (Basel). 2023 Dec 30;14(1):90. doi: 10.3390/diagnostics14010090.

DOI: 10.3390/diagnostics14010090
PMID: 38201398
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10795741/
Abstract

This study evaluates the diagnostic accuracy and clinical utility of two artificial intelligence (AI) techniques: Kakao Brain Artificial Neural Network for Chest X-ray Reading (KARA-CXR), an assistive technology developed using large-scale AI and large language models (LLMs), and ChatGPT, a well-known LLM. The study was conducted to validate the performance of the two technologies in chest X-ray reading and explore their potential applications in the medical imaging diagnosis domain. The study methodology consisted of randomly selecting 2000 chest X-ray images from a single institution's patient database, and two radiologists evaluated the readings provided by KARA-CXR and ChatGPT. The study used five qualitative factors to evaluate the readings generated by each model: accuracy, false findings, location inaccuracies, count inaccuracies, and hallucinations. Statistical analysis showed that KARA-CXR achieved significantly higher diagnostic accuracy compared to ChatGPT. In the 'Acceptable' accuracy category, KARA-CXR was rated at 70.50% and 68.00% by two observers, while ChatGPT achieved 40.50% and 47.00%. Interobserver agreement was moderate for both systems, with KARA at 0.74 and GPT4 at 0.73. For 'False Findings', KARA-CXR scored 68.00% and 68.50%, while ChatGPT scored 37.00% for both observers, with high interobserver agreements of 0.96 for KARA and 0.97 for GPT4. In 'Location Inaccuracy' and 'Hallucinations', KARA-CXR outperformed ChatGPT with significant margins. KARA-CXR demonstrated a non-hallucination rate of 75%, which is significantly higher than ChatGPT's 38%. The interobserver agreement was high for KARA (0.91) and moderate to high for GPT4 (0.85) in the hallucination category. In conclusion, this study demonstrates the potential of AI and large-scale language models in medical imaging and diagnostics. It also shows that in the chest X-ray domain, KARA-CXR has relatively higher accuracy than ChatGPT.
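The interobserver-agreement values quoted above (e.g. 0.74 for KARA-CXR vs. 0.73 for ChatGPT in the accuracy category) are two-rater concordance statistics. The abstract does not name the exact measure, so Cohen's kappa — the standard statistic for agreement between two raters, corrected for chance — is assumed here; a minimal self-contained sketch shows how such values are computed:

```python
# Sketch of Cohen's kappa for two raters (assumed metric; the abstract does
# not specify which agreement statistic the study used).
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of cases where the two raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    # Kappa: observed agreement beyond chance, normalized.
    return (p_o - p_e) / (1 - p_e)

# Hypothetical toy data: two raters grading 10 readings as
# acceptable ("acc") or unacceptable ("unacc").
a = ["acc", "acc", "unacc", "acc", "acc", "unacc", "acc", "unacc", "acc", "acc"]
b = ["acc", "acc", "unacc", "acc", "unacc", "unacc", "acc", "unacc", "acc", "acc"]
print(round(cohens_kappa(a, b), 2))  # prints 0.78
```

With the toy labels above, nine of ten readings match (p_o = 0.9) while the raters' marginal frequencies alone would predict agreement p_e = 0.54, giving kappa ≈ 0.78 — the same order as the study's reported agreement values.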


Figures (g001-g008):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/86fc5a50b5ed/diagnostics-14-00090-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/0f145155b39f/diagnostics-14-00090-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/7147177092f8/diagnostics-14-00090-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/4dc550b80831/diagnostics-14-00090-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/c2f651727a23/diagnostics-14-00090-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/e0d75e0fa37a/diagnostics-14-00090-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/ba0e14cf3b87/diagnostics-14-00090-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c745/10795741/29329ba9dce6/diagnostics-14-00090-g008.jpg

Similar Articles

1. Validation of a Deep Learning Chest X-ray Interpretation Model: Integrating Large-Scale AI and Large Language Models for Comparative Analysis with ChatGPT.
   Diagnostics (Basel). 2023 Dec 30;14(1):90. doi: 10.3390/diagnostics14010090.
2. AI-based computer-aided diagnostic system of chest digital tomography synthesis: Demonstrating comparative advantage with X-ray-based AI systems.
   Comput Methods Programs Biomed. 2023 Oct;240:107643. doi: 10.1016/j.cmpb.2023.107643. Epub 2023 Jun 5.
3. Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study.
   J Med Internet Res. 2023 Aug 22;25:e48659. doi: 10.2196/48659.
4. Comparative Performance of ChatGPT 3.5 and GPT4 on Rhinology Standardized Board Examination Questions.
   OTO Open. 2024 Jun 27;8(2):e164. doi: 10.1002/oto2.164. eCollection 2024 Apr-Jun.
5. ChatGPT Versus Consultants: Blinded Evaluation on Answering Otorhinolaryngology Case-Based Questions.
   JMIR Med Educ. 2023 Dec 5;9:e49183. doi: 10.2196/49183.
6. ChatGPT's performance in German OB/GYN exams - paving the way for AI-enhanced medical education and clinical practice.
   Front Med (Lausanne). 2023 Dec 13;10:1296615. doi: 10.3389/fmed.2023.1296615. eCollection 2023.
7. Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
   JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.
8. From jargon to clarity: Improving the readability of foot and ankle radiology reports with an artificial intelligence large language model.
   Foot Ankle Surg. 2024 Jun;30(4):331-337. doi: 10.1016/j.fas.2024.01.008. Epub 2024 Feb 5.
9. Comparing the Diagnostic Performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and Radiologists in Challenging Neuroradiology Cases.
   Clin Neuroradiol. 2024 Dec;34(4):779-787. doi: 10.1007/s00062-024-01426-y. Epub 2024 May 28.
10. Evaluating the Influence of Role-Playing Prompts on ChatGPT's Misinformation Detection Accuracy: Quantitative Study.
   JMIR Infodemiology. 2024 Sep 26;4:e60678. doi: 10.2196/60678.

Cited By

1. Diagnosis accuracy of ultrasonography for acute colonic diverticulitis: a diagnostic meta-analysis.
   Abdom Radiol (NY). 2025 Sep 10. doi: 10.1007/s00261-025-05185-3.
2. Evaluation of GPT-4 Accuracy in the Interpretation of Medical Imaging: Potential Benefits, Limitations, and the Future.
   Cureus. 2025 Jul 12;17(7):e87761. doi: 10.7759/cureus.87761. eCollection 2025 Jul.
3. Developing artificial intelligence tools for institutional review board pre-review: A pilot study on ChatGPT's accuracy and reproducibility.
   PLOS Digit Health. 2025 Jun 30;4(6):e0000695. doi: 10.1371/journal.pdig.0000695. eCollection 2025 Jun.
4. Comparative Evaluation of Large Language and Multimodal Models in Detecting Spinal Stabilization Systems on X-Ray Images.
   J Clin Med. 2025 May 8;14(10):3282. doi: 10.3390/jcm14103282.
5. Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment.
   Comput Struct Biotechnol J. 2025 Apr 11;28:141-147. doi: 10.1016/j.csbj.2025.04.010. eCollection 2025.
6. Integrating AI and Assistive Technologies in Healthcare: Insights from a Narrative Review of Reviews.
   Healthcare (Basel). 2025 Mar 4;13(5):556. doi: 10.3390/healthcare13050556.
7. ChatGPT4's diagnostic accuracy in inpatient neurology: A retrospective cohort study.
   Heliyon. 2024 Dec 9;10(24):e40964. doi: 10.1016/j.heliyon.2024.e40964. eCollection 2024 Dec 30.
8. Artificial intelligence in fracture detection on radiographs: a literature review.
   Jpn J Radiol. 2025 Apr;43(4):551-585. doi: 10.1007/s11604-024-01702-4. Epub 2024 Nov 14.
9. Revolution or risk?-Assessing the potential and challenges of GPT-4V in radiologic image interpretation.
   Eur Radiol. 2025 Mar;35(3):1111-1121. doi: 10.1007/s00330-024-11115-6. Epub 2024 Oct 18.
10. Advancements in Artificial Intelligence for Medical Computer-Aided Diagnosis.
   Diagnostics (Basel). 2024 Jun 15;14(12):1265. doi: 10.3390/diagnostics14121265.

References

1. The potential and pitfalls of using a large language model such as ChatGPT, GPT-4, or LLaMA as a clinical assistant.
   J Am Med Inform Assoc. 2024 Sep 1;31(9):1884-1891. doi: 10.1093/jamia/ocae184.
2. Evaluating capabilities of large language models: Performance of GPT-4 on surgical knowledge assessments.
   Surgery. 2024 Apr;175(4):936-942. doi: 10.1016/j.surg.2023.12.014. Epub 2024 Jan 20.
3. Large language models propagate race-based medicine.
   NPJ Digit Med. 2023 Oct 20;6(1):195. doi: 10.1038/s41746-023-00939-z.
4. ChatGPT in Radiology: The Advantages and Limitations of Artificial Intelligence for Medical Imaging Diagnosis.
   Cureus. 2023 Jul 6;15(7):e41435. doi: 10.7759/cureus.41435. eCollection 2023 Jul.
5. Assessing the utility of ChatGPT as an artificial intelligence-based large language model for information to answer questions on myopia.
   Ophthalmic Physiol Opt. 2023 Nov;43(6):1562-1570. doi: 10.1111/opo.13207. Epub 2023 Jul 21.
6. Artificial intelligence in healthcare: Complementing, not replacing, doctors and healthcare providers.
   Digit Health. 2023 Jul 2;9:20552076231186520. doi: 10.1177/20552076231186520. eCollection 2023 Jan-Dec.
7. The imperative for regulatory oversight of large language models (or generative AI) in healthcare.
   NPJ Digit Med. 2023 Jul 6;6(1):120. doi: 10.1038/s41746-023-00873-0.
8. Dr AI will see you now.
   Clin Exp Ophthalmol. 2023 Jul;51(5):409-410. doi: 10.1111/ceo.14272.
9. AI-Based CXR First Reading: Current Limitations to Ensure Practical Value.
   Diagnostics (Basel). 2023 Apr 16;13(8):1430. doi: 10.3390/diagnostics13081430.
10. Machine Learning Based Risk Prediction for Major Adverse Cardiovascular Events for ELGA-Authorized Clinics.
   Stud Health Technol Inform. 2023 May 2;301:20-25. doi: 10.3233/SHTI230006.