Department of Radiology, University of Washington, 1959 NE Pacific Street, Box 357115, Seattle, WA, 98195-7115, USA.
Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA.
Arch Osteoporos. 2024 Sep 10;19(1):87. doi: 10.1007/s11657-024-01433-z.
Automated screening for vertebral fractures could improve outcomes. We achieved an AUC-ROC of 0.968 for predicting moderate to severe fracture using a generalized additive model (GAM) with age and the three highest vertebral body fracture scores from a convolutional neural network. Maximal fracture scores yielded a well-performing model for subject-level fracture prediction. Combining individual deep learning vertebral body fracture scores with demographic covariates for subject-level classification of osteoporotic fracture achieved excellent performance (AUC-ROC of 0.968) on a large dataset of radiographs with basic demographic data.
Osteoporotic vertebral fractures are common and morbid. Automated opportunistic screening for incidental vertebral fractures on radiographs, the highest-volume imaging modality, could improve osteoporosis detection and management. We consider how to form and summarize patient-level fracture predictions to guide management, using our previously developed vertebral fracture classifier on segmented radiographs from a prospective cohort study of US men (MrOS). We compare the performance of logistic regression (LR) and generalized additive models (GAM) with combinations of individual vertebral scores and basic demographic covariates.
Subject-level LR and GAM models were created retrospectively using all fracture predictions or summary variables such as order statistics, adjacent vertebral interactions, and demographic covariates (age, race/ethnicity). The classifier outputs for 8663 vertebrae from 1176 thoracic and lumbar radiographs in 669 subjects were divided by subject to perform stratified fivefold cross-validation. Models were assessed using multiple metrics, including receiver operating characteristic (ROC) and precision-recall (PR) curves.
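The summary-variable and cross-validation steps described above can be illustrated with a minimal sketch. It is not the study's code: the function names, the zero-padding for subjects with fewer than k scored vertebrae, and the round-robin fold assignment are all assumptions made for illustration.

```python
from collections import defaultdict
import random


def top_k_scores(vertebra_scores, k=3):
    """Return the k largest scores in descending order.

    Assumption: subjects with fewer than k scored vertebrae are padded with 0.0.
    """
    ranked = sorted(vertebra_scores, reverse=True)[:k]
    return ranked + [0.0] * (k - len(ranked))


def subject_features(records, k=3):
    """Pool per-vertebra scores by subject and keep the top-k order statistics.

    records: iterable of (subject_id, score) pairs, one per vertebra.
    """
    by_subject = defaultdict(list)
    for subject_id, score in records:
        by_subject[subject_id].append(score)
    return {sid: top_k_scores(scores, k) for sid, scores in by_subject.items()}


def stratified_subject_folds(labels, n_folds=5, seed=0):
    """Assign each subject to one fold, balancing the outcome across folds.

    labels: dict mapping subject_id -> 0/1 outcome.
    Splitting by subject keeps all vertebrae and radiographs of one person
    in the same fold, so no subject appears in both training and test data.
    """
    rng = random.Random(seed)
    folds = {}
    for cls in sorted(set(labels.values())):
        members = sorted(sid for sid, y in labels.items() if y == cls)
        rng.shuffle(members)
        for i, sid in enumerate(members):
            folds[sid] = i % n_folds  # round-robin within each outcome class
    return folds
```

The resulting top-k features, together with covariates such as age, would then be the inputs to a subject-level LR or GAM fitted on four folds and evaluated on the fifth.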
The best model (AUC-ROC = 0.968) was a GAM using the top three maximum vertebral fracture scores and age. Using top-ranked scores only, rather than all vertebral scores, improved performance for both model classes. Adding age, but not race/ethnicity, to the GAMs improved performance slightly.
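For readers unfamiliar with the metric, AUC-ROC equals the probability that a randomly chosen fracture subject receives a higher predicted score than a randomly chosen non-fracture subject, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name is hypothetical and this is an illustration of the definition, not the evaluation code used in the study.

```python
def auc_roc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted probabilities or scores, one per subject.
    labels: 0/1 outcomes aligned with scores.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    # Count pairs where the positive outranks the negative; ties count 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of subjects, which is acceptable for a cohort of a few hundred; a rank-based formulation would scale better.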
Maximal vertebral fracture scores resulted in the highest-performing models. While combining multiple vertebral body predictions risks decreasing specificity, our results demonstrate that subject-level models maintain good predictive performance. Thresholding strategies can be used to control sensitivity and specificity as clinically appropriate.
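One way to read the thresholding remark: given subject-level scores, pick the highest cutoff that still reaches a target sensitivity, which in turn gives the best specificity available at that sensitivity. The sketch below assumes this particular strategy and uses hypothetical function names; the abstract does not specify which thresholding strategies were used.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when predicting fracture for score >= threshold."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if y == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def threshold_for_sensitivity(scores, labels, target):
    """Return (threshold, sensitivity, specificity) for the highest threshold
    whose sensitivity is at least `target`, or None if no threshold qualifies.

    Sensitivity never decreases as the threshold is lowered, so the first
    qualifying threshold in descending order has the highest specificity.
    """
    for t in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, labels, t)
        if sens >= target:
            return t, sens, spec
    return None
```

In practice the threshold would be chosen on training folds and then applied unchanged to held-out subjects, so that the reported sensitivity and specificity are not optimistically biased.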