Can a Deep-learning Model for the Automated Detection of Vertebral Fractures Approach the Performance Level of Human Subspecialists?

Affiliations

Institute of Data Science and Engineering, National Chiao Tung University, Hsinchu, Taiwan.

Center of Teaching and Learning Development, National Chiao Tung University, Hsinchu, Taiwan.

Publication details

Clin Orthop Relat Res. 2021 Jul 1;479(7):1598-1612. doi: 10.1097/CORR.0000000000001685.

PMID: 33651768
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8208416/

Abstract

BACKGROUND

Vertebral fractures are the most common osteoporotic fractures in older individuals. Recent studies suggest that artificial intelligence performs as well as humans in detecting osteoporotic fractures such as those of the hip, distal radius, and proximal humerus. However, whether artificial intelligence performs as well in detecting vertebral fractures on plain lateral spine radiographs has not yet been reported.

QUESTIONS/PURPOSES

(1) What are the accuracy, sensitivity, specificity, and interobserver reliability (kappa value) of an artificial intelligence model for detecting vertebral fractures on plain lateral spine radiographs, based on Genant fracture grades, compared with values obtained by human observers? (2) Do patients' clinical data, including the anatomic location of the fracture (thoracic or lumbar spine), the T-score on dual-energy x-ray absorptiometry, or the fracture severity grade, affect the performance of the artificial intelligence model? (3) How does the artificial intelligence model perform on external validation?

METHODS

Between 2016 and 2018, 1019 patients older than 60 years were treated for vertebral fractures at our institution. Seventy-eight patients were excluded because of missing CT or MRI scans (24% [19]), poor image quality of plain lateral spine radiographs (54% [42]), multiple myeloma (5% [4]), or prior spinal instrumentation (17% [13]). The plain lateral radiographs of the remaining 941 patients (one radiograph per person; mean age 76 ± 12 years), containing 1101 vertebral fractures between T7 and L5, were retrospectively evaluated and divided into training (n = 565), validation (n = 188), and testing (n = 188) sets for an artificial intelligence deep-learning model. The gold standard (ground truth) for diagnosing a vertebral fracture was the independent interpretation of the CT or MRI reports by a spine surgeon and a radiologist; any disagreements were resolved by jointly re-reviewing the corresponding CT or MRI images to reach a consensus. For the Genant classification, the height of the injured vertebral body was measured in its anterior, middle, and posterior thirds, and fractures were graded by the percentage of height reduction as Grade 1 (< 25%), Grade 2 (26% to 40%), or Grade 3 (> 40%). The deep-learning framework consisted of object detection, preprocessing of the radiographs, and classification to detect vertebral fractures; in clinical use, the full procedure took approximately 90 seconds to return the model's results. Accuracy, sensitivity, specificity, interobserver reliability (kappa value), the receiver operating characteristic curve, and the area under the curve (AUC) were analyzed. Bootstrapping was applied to the testing dataset and the external validation dataset. Accuracy, sensitivity, and specificity were used to investigate whether the anatomic location of the fracture or the T-score on dual-energy x-ray absorptiometry affected the performance of the artificial intelligence model.
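
As a rough illustration of the grading rule above, the sketch below maps a measured height reduction to a Genant grade. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `VertebralHeights`, `height_loss_percent`, and `genant_grade` are hypothetical; the height loss is assumed to be taken from the most compressed third relative to a reference height (for example, an adjacent intact vertebra), which the abstract does not specify; and the 25% to 26% gap between the abstract's Grade 1 and Grade 2 bands is assigned to Grade 2 by assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VertebralHeights:
    """Heights of one vertebral body in a consistent unit (e.g. mm)."""
    anterior: float
    middle: float
    posterior: float


def height_loss_percent(v: VertebralHeights, reference_height: float) -> float:
    """Percentage height reduction of the most compressed third.

    ASSUMPTION (hypothetical helper): `reference_height` is the expected
    intact height, e.g. from an adjacent unfractured vertebra. The abstract
    does not say how the reference was chosen.
    """
    if reference_height <= 0:
        raise ValueError("reference_height must be positive")
    shortest = min(v.anterior, v.middle, v.posterior)
    return max(0.0, (reference_height - shortest) / reference_height * 100.0)


def genant_grade(loss_percent: float) -> int:
    """Map the height loss (%) of a fractured vertebra to a Genant grade.

    Bands follow the abstract: Grade 1 < 25%, Grade 2 26-40%, Grade 3 > 40%.
    ASSUMPTION: the 25-26% gap the abstract leaves open is assigned to Grade 2.
    Only meant for vertebrae already judged to be fractured (no Grade 0 here).
    """
    if loss_percent < 25.0:
        return 1
    if loss_percent <= 40.0:
        return 2
    return 3


# Hypothetical measurements, not study data.
v = VertebralHeights(anterior=18.0, middle=24.0, posterior=27.0)
loss = height_loss_percent(v, reference_height=28.0)
print(f"{loss:.1f}% -> Grade {genant_grade(loss)}")  # 35.7% -> Grade 2
```

For example, a vertebra with anterior, middle, and posterior heights of 18, 24, and 27 mm against a hypothetical 28 mm reference loses about 35.7% of its height, which this sketch places in Grade 2.
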
The receiver operating characteristic curve and AUC were used to investigate the relationship between the performance of the artificial intelligence model and fracture grade. External validation was also performed using plain lateral radiographs of a population of similar age from another medical institute.
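
The ROC/AUC analysis and the bootstrapping described in the Methods can be illustrated with the standard-library sketch below. It is an illustration under assumptions, not the study's code: `auc_mann_whitney` computes AUC as the Mann-Whitney probability that a randomly chosen fractured vertebra receives a higher model score than a randomly chosen intact one, and `bootstrap_ci` is a hypothetical helper that returns a percentile bootstrap interval by resampling cases with replacement. The abstract does not state how the per-grade AUCs were formed; one plausible choice, assumed here, is to treat fractures of a single grade as positives and intact vertebrae as negatives.

```python
import random
from typing import Callable, Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(
    scores: Sequence[float],
    labels: Sequence[int],
    metric: Callable[[Sequence[float], Sequence[int]], float] = auc_mann_whitney,
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile bootstrap interval; cases are resampled with replacement."""
    n = len(scores)
    if not 0 < sum(labels) < n:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    stats: list[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack either class
            stats.append(metric([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[round(alpha / 2 * (n_boot - 1))]
    hi = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


# Toy scores for illustration only (not study data): 1 = fracture, 0 = intact.
scores = [0.95, 0.80, 0.65, 0.40, 0.55, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(auc_mann_whitney(scores, labels))  # 0.9375
print(bootstrap_ci(scores, labels))
```

Resampling vertebrae independently ignores the clustering of several vertebrae within one radiograph; resampling whole patients would be the more conservative choice, and the abstract does not say which unit was resampled.
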

RESULTS

With the ensemble method, the artificial intelligence model showed excellent accuracy (93% [773 of 830 vertebrae]), sensitivity (91% [129 of 141]), and specificity (93% [644 of 689]) for detecting vertebral fractures of the lumbar spine. The interobserver reliability (kappa value) between the artificial intelligence model and human observers was 0.72 (95% CI 0.65 to 0.80; p < 0.001) for thoracic vertebrae and 0.77 (95% CI 0.72 to 0.83; p < 0.001) for lumbar vertebrae. The AUCs for Grades 1, 2, and 3 vertebral fractures were 0.919, 0.989, and 0.990, respectively. The ensemble model was poorer at correctly classifying normal osteoporotic lumbar vertebrae, with a specificity of 91% (260 of 285), than normal nonosteoporotic lumbar vertebrae, with a specificity of 95% (222 of 234). Sensitivity was higher for osteoporotic (dual-energy x-ray absorptiometry T-score ≤ -2.5) lumbar vertebral fractures (97% [60 of 62]) than for nonosteoporotic vertebral fractures (83% [39 of 47]), implying that osteoporotic fractures were easier to detect. On the external dataset, which was acquired with various radiographic techniques, the model also detected lumbar vertebral fractures better than thoracic vertebral fractures. On the external validation dataset, the bootstrapped overall accuracy, sensitivity, and specificity were 89%, 83%, and 95%, respectively.
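
The lumbar counts reported above are internally consistent, which the short sketch below checks. It assumes the usual definitions: sensitivity is detected fractures over all fractures, specificity is correctly cleared intact vertebrae over all intact vertebrae, and accuracy is correct calls over all vertebrae. The `Confusion` class is a hypothetical helper, and the false-negative and false-positive counts are derived by subtraction (141 - 129 = 12 and 689 - 644 = 45).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical 2x2 table: model calls versus reference labels."""
    tp: int  # fractures flagged by the model
    fn: int  # fractures missed by the model
    tn: int  # intact vertebrae correctly cleared
    fp: int  # intact vertebrae wrongly flagged

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.tn + self.fp

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.n

    def cohen_kappa(self) -> float:
        """Chance-corrected agreement between model calls and reference labels."""
        n = self.n
        p_obs = (self.tp + self.tn) / n
        # Expected agreement if model calls and labels were independent.
        p_exp = ((self.tp + self.fp) * (self.tp + self.fn)
                 + (self.tn + self.fn) * (self.tn + self.fp)) / (n * n)
        return (p_obs - p_exp) / (1 - p_exp)


# Lumbar counts from the abstract: 129 of 141 fractures detected,
# 644 of 689 intact vertebrae correctly cleared.
lumbar = Confusion(tp=129, fn=141 - 129, tn=644, fp=689 - 644)
print(f"accuracy    {lumbar.accuracy():.1%}")     # 93.1% (773 of 830)
print(f"sensitivity {lumbar.sensitivity():.1%}")  # 91.5%
print(f"specificity {lumbar.specificity():.1%}")  # 93.5%
```

`cohen_kappa()` shows how a chance-corrected agreement statistic such as Cohen's kappa is derived from a 2x2 table of this kind. The abstract does not state which tables its kappa values (0.72 thoracic, 0.77 lumbar) were computed from, or which kappa variant was used, so the sketch makes no attempt to reproduce them.
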

CONCLUSION

The artificial intelligence model detected vertebral fractures on plain lateral radiographs with high accuracy, sensitivity, and specificity, especially for osteoporotic lumbar vertebral fractures (Genant Grades 2 and 3). Rapid reporting of results by this artificial intelligence model may improve the efficiency of diagnosing vertebral fractures. The model can be tested at http://140.113.114.104/vght_demo/corr/, where one or more plain lateral spine radiographs in Digital Imaging and Communications in Medicine (DICOM) format can be uploaded to see how the model performs.

LEVEL OF EVIDENCE

Level II, diagnostic study.
