Yayli Sahika Betul, Kılıç Kutay, Beyaz Salih
Artificial Intelligence and Digital Analytics Solutions, Turkcell Technology, Istanbul, Türkiye.
Orthopedics and Traumatology Department, Adana Turgut Noyan Research and Training Centre, Baskent University, Adana, Türkiye.
Front Artif Intell. 2025 Feb 5;8:1413820. doi: 10.3389/frai.2025.1413820. eCollection 2025.
This study aims to classify Kellgren-Lawrence (KL) osteoarthritis stages using knee anteroposterior X-ray images by comparing two deep learning (DL) methodologies: a traditional single-model approach and a proposed multi-model approach. We addressed three core research questions in this study: (1) How effective are single-model and multi-model deep learning approaches in classifying KL stages? (2) How do seven convolutional neural network (CNN) architectures perform across four distinct deep learning tasks? (3) What is the impact of CLAHE (Contrast Limited Adaptive Histogram Equalization) on classification performance?
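To make the third question concrete, the following is a minimal sketch of the clip-and-redistribute step at the core of CLAHE, applied to a single tile in pure Python. It is not the authors' preprocessing code: the function name `clip_limited_equalize`, the absolute `clip_limit` count, and the uniform redistribution rule are illustrative assumptions. Full CLAHE also splits the image into tiles and interpolates between neighbouring tile mappings, which this sketch omits.

```python
def clip_limited_equalize(tile, clip_limit=4, levels=256):
    """Equalize one tile of integer gray levels using a clipped histogram.

    Hypothetical helper: clip_limit is an absolute per-bin count here,
    which is an assumption made to keep the sketch short.
    """
    pixels = [p for row in tile for p in row]

    # Histogram of gray levels in the tile.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Clip every bin at clip_limit and collect the excess counts.
    excess = 0
    for i in range(levels):
        if hist[i] > clip_limit:
            excess += hist[i] - clip_limit
            hist[i] = clip_limit

    # Spread the excess uniformly over all bins (total count is preserved).
    share, remainder = divmod(excess, levels)
    for i in range(levels):
        hist[i] += share + (1 if i < remainder else 0)

    # Build the lookup table from the cumulative distribution.
    total = len(pixels)
    lut = []
    running = 0
    for count in hist:
        running += count
        lut.append(round((levels - 1) * running / total))

    return [[lut[p] for p in row] for row in tile]


if __name__ == "__main__":
    demo_tile = [[0, 0, 0, 1], [0, 0, 2, 7]]
    print(clip_limited_equalize(demo_tile, clip_limit=2, levels=8))
```

Clipping limits how much a dominant gray level can stretch the mapping, which is why CLAHE amplifies noise less than plain histogram equalization.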
We created a dataset of 14,607 annotated knee AP X-rays from three hospitals. The knee joint region was isolated using a YOLOv5 object detection model. The multi-model approach utilized three DL models: one for osteophyte detection, another for joint space narrowing analysis, and a third to combine these outputs with demographic and image data for KL classification. The single-model approach directly classified KL stages as a benchmark. Seven CNN architectures (NfNet-F0/F1, EfficientNet-B0/B3, Inception-ResNet-v2, VGG16) were trained with and without CLAHE augmentation.
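The data flow of the multi-model approach can be sketched as follows. This is only an illustration of how three models might be chained, not the authors' implementation: `KneeCase`, `classify_kl_multi_model`, the `age` and `sex` fields, and the callable signatures are all hypothetical, and the abstract does not state which demographic variables were used.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Image = Sequence[Sequence[int]]


@dataclass
class KneeCase:
    image: Image   # knee-joint region, assumed already cropped by the detector
    age: float     # hypothetical demographic field
    sex: int       # hypothetical demographic field (encoded as an integer)


def classify_kl_multi_model(
    case: KneeCase,
    osteophyte_model: Callable[[Image], float],
    jsn_model: Callable[[Image], float],
    fusion_model: Callable[[Image, List[float]], int],
) -> int:
    """Chain the three models: two feature models feed a fusion classifier."""
    osteophyte_score = osteophyte_model(case.image)  # model 1: osteophytes
    jsn_score = jsn_model(case.image)                # model 2: joint space narrowing
    features = [osteophyte_score, jsn_score, case.age, float(case.sex)]
    # Model 3 combines the image with the intermediate outputs and demographics.
    return fusion_model(case.image, features)


if __name__ == "__main__":
    # Stub models stand in for trained networks in this sketch.
    case = KneeCase(image=[[0, 1], [2, 3]], age=60.0, sex=1)
    grade = classify_kl_multi_model(
        case,
        osteophyte_model=lambda img: 0.8,
        jsn_model=lambda img: 0.4,
        fusion_model=lambda img, feats: 2 if feats[0] > 0.5 else 0,
    )
    print(grade)
```

The single-model benchmark corresponds to replacing all three callables with one classifier that maps the cropped image directly to a KL grade.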
The single-model approach achieved an F1-score of 0.763 and an accuracy of 0.767, outperforming the multi-model approach, which achieved an F1-score of 0.736 and an accuracy of 0.740. Different architectures performed best on different tasks, underscoring the need for task-specific architecture selection. CLAHE reduced performance for most models; only one model improved, and only marginally (0.3%).
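For reference, the two reported metrics can be computed as in the sketch below. The abstract does not say how the F1-score was averaged across KL grades; macro averaging is assumed here purely for illustration, and the function names and toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy KL grades, not data from the study.
    y_true = [0, 1, 2, 2]
    y_pred = [0, 2, 2, 2]
    print(accuracy(y_true, y_pred))
    print(macro_f1(y_true, y_pred, labels=[0, 1, 2]))
```

With imbalanced KL grades, macro F1 and accuracy can diverge, which is one reason to report both.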
The single-model approach was more effective for KL grading and surpassed the metrics reported in the existing literature. These findings emphasize the importance of choosing architectures and preprocessing steps per task. Future studies should explore ensemble modeling, advanced augmentations, and clinical validation to improve clinical applicability.