Evaluating ChatGPT's Performance in Classifying Pertrochanteric Fractures Based on Arbeitsgemeinschaft für Osteosynthesefragen/Orthopedic Trauma Association (AO/OTA) Standards.

Authors

Noda Mitsuaki, Takahara Shunsuke, Hayashi Shinya, Inui Atsuyuki, Oe Keisuke, Matsushita Takehiko

Affiliations

Orthopedics, Himeji Central Hospital, Himeji, JPN.

Orthopedics, Hyogo Prefectural Kakogawa Medical Center, Kakogawa, JPN.

Publication information

Cureus. 2025 Jan 27;17(1):e78068. doi: 10.7759/cureus.78068. eCollection 2025 Jan.

DOI: 10.7759/cureus.78068
PMID: 40018458
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11865862/

Abstract

Introduction ChatGPT (Chat Generative Pre-trained Transformer) has become widely recognized for its capability to generate text, synthesize complex information, and perform a variety of tasks without requiring human specialists for data collection. The latest iteration, ChatGPT-4, is a large multimodal model capable of integrating both text and image inputs, rendering it particularly promising for medical applications. However, its efficacy in analyzing radiographic images remains largely unexplored.

Aim This study aims to (i) address the lack of data on the accuracy of ChatGPT in classifying fractures on radiographs as stable or unstable under the revised Arbeitsgemeinschaft für Osteosynthesefragen/Orthopedic Trauma Association (AO/OTA) classification system, a task that is also performed by surgeons, and (ii) compare the agreement achieved by surgeons with that achieved by ChatGPT. The study hypothesizes that ChatGPT would achieve moderate agreement with orthopedic surgeons.

Materials and methods Patients diagnosed with pertrochanteric fractures were retrospectively collected. Patients with both preoperative two-directional plain radiographs and three-dimensional CT (3D-CT) images were eligible for enrollment. Two orthopedic surgeons (observers 1 and 2) and one resident (observer 3) each classified the fractures once as A1 (stable) or A2 (unstable) under the AO/OTA classification, using the two-directional plain radiographs. Prior to the ChatGPT study, all anteroposterior images were trimmed to the fractured side and given figure names that included gender and age; they were then input into OpenAI ChatGPT-4. Radiological evaluation prompts were designed to initiate ChatGPT's classification analysis of the uploaded radiographic images. A single observer (MN) determined the classification of each fracture by examining the 3D-CT images together with the plain radiographs. This judgment of A1 (stable) or A2 (unstable) served as the benchmark against which the plain-radiograph classifications by the observers and ChatGPT were scored.

Results After exclusions, the cohort consisted of 29 males and 90 females, with a mean age of 87 years. The fractures were classified into A1 (stable) and A2 (unstable) groups based on CT imaging. The A1 group included 50 patients (13 males, 37 females; mean age: 86.2 ± 7.8 years), while the A2 group included 69 patients (16 males, 53 females; mean age: 87.0 ± 7.9 years). Kappa values comparing the plain-radiograph classifications by the three observers and ChatGPT with the CT-based gold standard showed fair to moderate agreement: observer 1, 0.494 (95% CI: 0.337-0.650); observer 2, 0.390 (95% CI: 0.227-0.553); observer 3, 0.360 (95% CI: 0.198-0.521); and ChatGPT, 0.420 (95% CI: 0.255-0.585). ChatGPT demonstrated accuracy, sensitivity, specificity, and positive and negative predictive values comparable to those of the human observers, suggesting moderate reliability.

Conclusion This study demonstrates that ChatGPT can classify pertrochanteric fractures into A1 (stable) and A2 (unstable) under the revised AO/OTA classification system. Its moderate agreement with CT-based assessments (κ = 0.420) is comparable to the performance of orthopedic surgeons. Moreover, ChatGPT is straightforward to integrate into clinical workflows, requiring minimal data collection for training.
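
The agreement statistics named in the Results (Cohen's kappa with a 95% CI, accuracy, sensitivity, specificity, and predictive values) can all be derived from a 2×2 table of a rater's calls against the CT-based benchmark. The Python sketch below shows one way to do that. It is an illustration under stated assumptions, not the authors' analysis: the abstract does not report per-rater cell counts or the CI method, so the counts in `example` are hypothetical (only the benchmark totals of 69 A2 and 50 A1 come from the abstract), the standard error is a simple large-sample approximation, and the verbal bands follow the widely used Landis and Koch convention, which the abstract does not explicitly name.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Rater calls versus the CT-based benchmark, treating A2 (unstable) as positive."""
    tp: int  # benchmark A2, rater A2
    fn: int  # benchmark A2, rater A1
    fp: int  # benchmark A1, rater A2
    tn: int  # benchmark A1, rater A1

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.fp + self.tn


def diagnostic_metrics(t: TwoByTwo) -> dict[str, float]:
    """Standard diagnostic accuracy measures from the four cell counts."""
    return {
        "accuracy": (t.tp + t.tn) / t.n,
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
    }


def cohen_kappa(t: TwoByTwo) -> tuple[float, float, float]:
    """Return (kappa, ci_low, ci_high).

    The CI uses a simple large-sample standard error (an assumption;
    the abstract does not say which CI method was used).
    """
    po = (t.tp + t.tn) / t.n  # observed agreement
    # Chance agreement from the benchmark and rater marginal totals.
    pe = ((t.tp + t.fn) * (t.tp + t.fp) + (t.fp + t.tn) * (t.fn + t.tn)) / t.n**2
    kappa = (po - pe) / (1 - pe)
    se = math.sqrt(po * (1 - po) / (t.n * (1 - pe) ** 2))
    return kappa, kappa - 1.96 * se, kappa + 1.96 * se


def landis_koch(kappa: float) -> str:
    """Map kappa to the Landis and Koch verbal bands."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# HYPOTHETICAL cell counts for illustration only. The row totals
# (69 A2, 50 A1) match the abstract; the split within rows is invented.
example = TwoByTwo(tp=52, fn=17, fp=18, tn=32)

if __name__ == "__main__":
    kappa, low, high = cohen_kappa(example)
    print(f"kappa = {kappa:.3f} (95% CI {low:.3f} to {high:.3f}), {landis_koch(kappa)}")
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.3f}")
    # Verbal bands for the kappa values reported in the abstract.
    for rater, k in (("observer 1", 0.494), ("observer 2", 0.390),
                     ("observer 3", 0.360), ("ChatGPT", 0.420)):
        print(rater, landis_koch(k))
```

Under the Landis and Koch bands, the reported values for observers 2 and 3 fall in the "fair" range and those for observer 1 and ChatGPT in the "moderate" range, which is consistent with the abstract's "fair to moderate" wording.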

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e25f/11865862/3dab3c3d74a8/cureus-0017-00000078068-i04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e25f/11865862/97d041a69a48/cureus-0017-00000078068-i01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e25f/11865862/c42497215741/cureus-0017-00000078068-i02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e25f/11865862/562857889497/cureus-0017-00000078068-i03.jpg