Comparative Evaluation of Large Language and Multimodal Models in Detecting Spinal Stabilization Systems on X-Ray Images.

Author information

Polis Bartosz, Zawadzka-Fabijan Agnieszka, Fabijan Robert, Kosińska Róża, Nowosławska Emilia, Fabijan Artur

Affiliations

Department of Neurosurgery, Polish-Mother's Memorial Hospital Research Institute, 93-338 Lodz, Poland.

Department of Rehabilitation Medicine, Faculty of Health Sciences, Medical University of Lodz, 90-419 Lodz, Poland.

Publication information

J Clin Med. 2025 May 8;14(10):3282. doi: 10.3390/jcm14103282.

DOI:10.3390/jcm14103282
PMID:40429276
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12112668/
Abstract

Open-source AI models are increasingly applied in medical imaging, yet their effectiveness in detecting and classifying spinal stabilization systems remains underexplored. This study compares ChatGPT-4o (a large language model) and BiomedCLIP (a multimodal model) in their analysis of posturographic X-ray images (AP projection). It assesses their accuracy in identifying the presence of a stabilization system, its type (growing vs. non-growing), and the specific system (magnetically controlled growing rods, MCGR, vs. posterior spinal fusion, PSF). A dataset of 270 X-ray images (93 without stabilization, 80 with MCGR, and 97 with PSF) was analyzed manually by neurosurgeons and evaluated using a three-stage AI-based questioning approach. Performance was assessed via classification accuracy, Gwet's Agreement Coefficient (AC1) for inter-rater reliability, and a two-tailed z-test for statistical significance (p < 0.05). The results indicate that GPT-4o demonstrates high accuracy in detecting spinal stabilization systems, achieving near-perfect recognition (97-100%) of the presence or absence of stabilization. However, its consistency is reduced when distinguishing complex growing-rod (MCGR) configurations, with agreement scores dropping significantly (AC1 = 0.32-0.50). In contrast, BiomedCLIP displays greater response consistency (AC1 = 1.00) but struggles with detailed classification, particularly in recognizing PSF (11% accuracy) and MCGR (4.16% accuracy). Sensitivity analysis revealed GPT-4o's superior stability in hierarchical classification tasks, while BiomedCLIP excelled in binary detection but showed performance deterioration as classification complexity increased. These findings highlight GPT-4o's robustness in clinical AI-assisted diagnostics, particularly for detailed differentiation of spinal stabilization systems, whereas BiomedCLIP's precision may require further optimization to enhance its applicability in complex radiographic evaluations.
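The abstract names its two statistics without giving formulas. The Python below is a minimal sketch of how they are commonly computed. It assumes the standard two-rater form of Gwet's AC1: observed agreement is corrected by a chance term built from the raters' averaged category proportions. For the significance test it assumes a pooled two-proportion z-test comparing, for example, two accuracies. The "raters" could be two query rounds of the same model, but that is also an assumption. The abstract does not say exactly how the authors computed either statistic or which quantities they compared. The function names and example inputs are hypothetical.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def gwet_ac1(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable],
             categories: Sequence[Hashable]) -> float:
    """Gwet's AC1 for two raters who each assign every item to one category."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    cats = list(dict.fromkeys(categories))  # de-duplicate, keep order
    if not (set(rater_a) | set(rater_b)) <= set(cats):
        raise ValueError("a rater used a label outside `categories`")
    q = len(cats)
    if q < 2:
        raise ValueError("AC1 needs at least two possible categories")
    n = len(rater_a)
    # Observed agreement: share of items both raters labelled identically.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # pi_k: the two raters' marginal proportions for category k, averaged.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    pi = [(counts_a[k] + counts_b[k]) / (2 * n) for k in cats]
    # Gwet's chance-agreement term: sum of pi_k * (1 - pi_k), divided by (q - 1).
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-tailed p-value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("z is undefined when all outcomes are successes or all are failures")
    z = (x1 / n1 - x2 / n2) / se
    # Two-tailed p-value: P(|Z| >= |z|) for a standard normal Z is erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical labels for four images (not data from the study).
    first_run = ["none", "none", "PSF", "MCGR"]
    second_run = ["none", "none", "PSF", "PSF"]
    print(round(gwet_ac1(first_run, second_run, ["none", "MCGR", "PSF"]), 3))  # 0.644
    # Hypothetical counts: 60/100 correct for one model vs 40/100 for another.
    z, p = two_proportion_z_test(60, 100, 40, 100)
    print(round(z, 3), round(p, 4))  # 2.828 0.0047
```

Gwet proposed AC1 partly because Cohen's kappa can drop sharply when one category dominates, even when raw agreement is high. That property is one plausible reason to report AC1 for repeated model answers, though the abstract does not state the authors' rationale.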

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6bad/12112668/db7eb4480e17/jcm-14-03282-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6bad/12112668/8f706ad55d97/jcm-14-03282-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6bad/12112668/b9b268e95b6c/jcm-14-03282-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6bad/12112668/c0a6d3c75c84/jcm-14-03282-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6bad/12112668/6b15fad69889/jcm-14-03282-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6bad/12112668/b6f54ce209bb/jcm-14-03282-g006.jpg