Nigru Alemu Sisay, Benini Sergio, Bonetti Matteo, Bragaglio Graziella, Frigerio Michele, Maffezzoni Federico, Leonardi Riccardo
Department of Information Engineering, University of Brescia, via Branze 38, Brescia 25123, Italy.
Department of Clinical and Experimental Sciences, University of Brescia, Viale Europa, 11, Brescia 25123, Italy.
N Am Spine Soc J. 2024 Oct 26;20:100564. doi: 10.1016/j.xnsj.2024.100564. eCollection 2024 Dec.
In recent years, the integration of Artificial Intelligence (AI) models has revolutionized the diagnosis of Low Back Pain (LBP) and associated disc pathologies. Among these models, SpineNetV2 stands out as a state-of-the-art, open-access tool for detecting and grading various intervertebral disc pathologies. However, the reliability and applicability of AI models such as SpineNetV2 must be ensured: rigorous validation is essential to establish their robustness and generalizability across diverse patient cohorts and imaging protocols.
We conducted a retrospective analysis of MR images of 1747 lumbosacral intervertebral discs (IVDs) from 353 patients (mean age, 54 ± 15.4 years; 44.5% female) with various spinal disorders, collected between September 2021 and February 2023 at X-Ray Service s.r.l. SpineNetV2 was used to grade 11 distinct lumbosacral disc pathologies on T2-weighted sagittal MR images: Pfirrmann grade, disc narrowing, central canal stenosis, spondylolisthesis, (upper and lower) endplate defects, (upper and lower) marrow changes, (right and left) foraminal stenosis, and disc herniation. Two expert radiologists annotated the same discs, and SpineNetV2's grades were compared against their assessments. Performance metrics included accuracy, balanced accuracy, precision, F1 score, Matthews Correlation Coefficient (MCC), Brier Score Loss, Lin's concordance correlation coefficient, and Cohen's kappa coefficient.
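The classification metrics listed above can be illustrated with a minimal, dependency-free Python sketch. It is not the study's implementation: the abstract does not state which software or averaging scheme was used, so macro averaging over grades is an assumption here, and the grade lists and probabilities are hypothetical toy values, not study data. The Brier score is shown for a single binary finding (for example, herniation present or absent), because it requires predicted probabilities rather than hard grades.

```python
"""Sketch of per-grade classification metrics (hypothetical toy data, not study data)."""


def class_counts(y_true, y_pred, label):
    """Return (tp, fp, fn) for one grade treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of discs where the model grade equals the reference grade."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean recall over the grades present in the reference labels."""
    recalls = []
    for label in sorted(set(y_true)):
        tp, _, fn = class_counts(y_true, y_pred, label)
        recalls.append(tp / (tp + fn))  # tp + fn > 0 because label occurs in y_true
    return sum(recalls) / len(recalls)


def macro_precision_f1(y_true, y_pred):
    """Assumed macro averaging: unweighted mean precision and F1 over all grades seen."""
    precisions, f1s = [], []
    for label in sorted(set(y_true) | set(y_pred)):
        tp, fp, fn = class_counts(y_true, y_pred, label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        f1s.append(f1)
    return sum(precisions) / len(precisions), sum(f1s) / len(f1s)


def brier_score(y_binary, p_positive):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_binary, p_positive)) / len(y_binary)


# Hypothetical Pfirrmann grades: radiologist reference vs. model output.
reference = [2, 3, 3, 4, 5, 4, 3, 2]
model = [2, 3, 4, 4, 5, 4, 3, 3]

print(accuracy(reference, model))            # 0.75
print(balanced_accuracy(reference, model))   # 19/24, about 0.792
print(macro_precision_f1(reference, model))  # about (0.833, 0.783)

# Hypothetical binary finding with model probabilities for "present".
print(brier_score([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.1]))  # about 0.055
```

Macro averaging weights every grade equally, so a grade that the model handles poorly lowers the score even if it is rare; a frequency-weighted average would instead let the most common grade dominate.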
SpineNetV2 demonstrated strong performance across various metrics, with agreement scores (Cohen's kappa, Lin's concordance correlation coefficient, and Matthews Correlation Coefficient) exceeding 0.7 for most pathologies. Agreement was lower for foraminal stenosis and disc herniation, underscoring the limitations of sagittal MR images for evaluating these conditions.
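The three agreement statistics named above can be sketched in plain Python as follows. This is an illustrative reconstruction, not the authors' code, and it rests on stated assumptions: Cohen's kappa is unweighted (the abstract does not say whether a weighted variant was used), Lin's coefficient uses population (1/n) moments, and MCC is the multiclass generalization computed from per-grade totals. The grades are hypothetical, and the 0.7 cut-off simply mirrors the threshold quoted in the results.

```python
"""Sketch of rater-agreement statistics (hypothetical toy data, not study data)."""
from collections import Counter
from math import sqrt


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa: observed vs. chance agreement on categories."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / n**2
    if expected == 1:
        return float("nan")  # undefined when both raters use a single grade only
    return (observed - expected) / (1 - expected)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient, treating grades as numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((v - mx) ** 2 for v in x) / n
    vy = sum((v - my) ** 2 for v in y) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y)) / n
    denom = vx + vy + (mx - my) ** 2
    return 2 * cov / denom if denom else float("nan")


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from per-grade totals."""
    s = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    t_k, p_k = Counter(y_true), Counter(y_pred)
    cross = sum(t_k[k] * p_k[k] for k in set(t_k) | set(p_k))
    denom = sqrt((s**2 - sum(v**2 for v in p_k.values()))
                 * (s**2 - sum(v**2 for v in t_k.values())))
    return (correct * s - cross) / denom if denom else 0.0


# Hypothetical Pfirrmann grades: radiologist reference vs. model output.
reference = [2, 3, 3, 4, 5, 4, 3, 2]
model = [2, 3, 4, 4, 5, 4, 3, 3]

scores = {
    "Cohen's kappa": cohen_kappa(reference, model),
    "Lin's CCC": lin_ccc(reference, model),
    "MCC": multiclass_mcc(reference, model),
}
for name, value in scores.items():
    print(f"{name}: {value:.3f} ({'above' if value > 0.7 else 'not above'} 0.7)")
```

On these toy grades the script prints 0.652 for kappa, 0.857 for Lin's CCC, and 0.667 for MCC, so only CCC clears 0.7. CCC treats an off-by-one grade as a near miss, whereas the two categorical statistics count it as a plain disagreement, which is why the choice of agreement statistic matters for ordinal scales such as Pfirrmann grading.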
This study highlights the importance of external validation and the need for comprehensive assessment of deep learning models. SpineNetV2 shows promising results in predicting disc pathologies, and the findings can guide further improvements. Because SpineNetV2 is released as open source, researchers can independently validate and extend the model's capabilities. This collaborative approach promotes innovation and accelerates the development of more reliable and comprehensive deep learning tools for assessing spine pathology.