Could a New Method of Acromiohumeral Distance Measurement Emerge? Artificial Intelligence vs. Physician.

Authors

Dede Burak Tayyip, Çakar İsa, Oğuz Muhammed, Alyanak Bülent, Bağcıer Fatih

Affiliations

Department of Physical Medicine and Rehabilitation, Prof. Dr. Cemil Tascioglu City Hospital, Istanbul, Turkey.

Department of Radiology, Basaksehir Cam and Sakura City Hospital, Istanbul, Turkey.

Publication information

J Imaging Inform Med. 2025 Jul 25. doi: 10.1007/s10278-025-01614-3.

DOI: 10.1007/s10278-025-01614-3
PMID: 40715862
Abstract

This study evaluated the reliability of ChatGPT-4 in measuring acromiohumeral distance (AHD), a common assessment in patients with shoulder pain. In this retrospective study, 71 registered shoulder magnetic resonance imaging (MRI) scans were included. AHD was measured on a coronal oblique T1 sequence with a clear view of the acromion and humerus. An experienced radiologist performed the measurements twice, 3 days apart, and ChatGPT-4 performed them twice, 3 days apart, in different sessions. The first, second, and mean AHD values measured by the physician were 7.6 ± 1.7, 7.5 ± 1.6, and 7.6 ± 1.7, respectively; the corresponding values for ChatGPT-4 were 6.7 ± 0.8, 7.3 ± 1.1, and 7.1 ± 0.8. The physician and ChatGPT-4 differed significantly on the first measurement and on the mean (p < 0.0001 and p = 0.009, respectively), but not on the second measurement (p = 0.220). Intrarater reliability, expressed as the intraclass correlation coefficient (ICC), was excellent for the physician (ICC = 0.99) and poor for ChatGPT-4 (ICC = 0.41); interrater reliability was also poor (ICC = 0.45). In conclusion, ChatGPT-4 was less reliable than an experienced radiologist in AHD measurement. These findings may help inform the future contribution of large language models to medical science.
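
For readers unfamiliar with the ICC values quoted in the abstract, the following is a minimal sketch of how such a coefficient can be computed from repeated measurements. The abstract does not say which ICC model the authors used, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single measurement) is an assumption, and the function name `icc_2_1` and the sample values are hypothetical illustrations, not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per rater/session.

    Assumption: two-way random effects, absolute agreement, single measurement.
    The abstract does not state which ICC form was actually used.
    """
    n = len(ratings)       # number of subjects (e.g. MRI scans)
    k = len(ratings[0])    # number of raters or sessions
    values = [v for row in ratings for v in row]
    grand = sum(values) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Hypothetical AHD readings for five scans in two sessions (not study data)
    session_1 = [7.1, 8.4, 6.2, 9.0, 5.8]
    session_2 = [7.0, 8.6, 6.1, 9.2, 5.9]
    print(round(icc_2_1(list(zip(session_1, session_2))), 3))
```

Passing the two sessions of one rater gives an intrarater estimate; passing one column per rater gives an interrater estimate. Values near 1 indicate that most of the variance comes from real differences between subjects rather than from the rater or from measurement noise.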
