Department of Radiology, The First Affiliated Hospital of Ningbo University, China.
Faculty of Electrical Engineering and Computer Science, Ningbo University, China.
Curr Med Imaging. 2024;20:e15734056278130. doi: 10.2174/0115734056278130231218073650.
Recently developed deep-learning-based automatic evaluation models provide reliable and efficient Cobb angle measurements for scoliosis diagnosis. However, few studies have explored their clinical application, and external validation is lacking. This study therefore aimed to assess the value of automated assessment models in clinical practice by comparing deep-learning models with manual measurement.
A total of 481 spine radiographs from an open-source dataset were divided into training and validation sets, and 119 spine radiographs from a private dataset were used as the test set. The mean of the Cobb angles measured by three physicians in the hospital's PACS served as the reference standard. The results of Seg4Reg, VFLDN, and manual measurement were analyzed statistically. The intra-class correlation coefficient (ICC) and the Pearson correlation coefficient (PCC) were used to compare reliability and correlation, the Bland-Altman method was used to compare agreement, and the Kappa statistic was used to compare the consistency of Cobb angle classification across severity levels.
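The agreement statistics named above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the study's analysis code: the function name `agreement_summary`, the sample angles, and the reading of MAD as the mean absolute difference between paired measurements are all assumptions, and the Bland-Altman limits use the conventional bias ± 1.96 SD form.

```python
import numpy as np

def agreement_summary(model_deg, manual_deg):
    """Summarise agreement between paired Cobb angle measurements (degrees)."""
    model = np.asarray(model_deg, dtype=float)
    manual = np.asarray(manual_deg, dtype=float)
    diff = model - manual

    # Pearson correlation coefficient between the two measurement series.
    pcc = np.corrcoef(model, manual)[0, 1]
    # Assumed definition of MAD: mean absolute difference of the pairs.
    mad = np.mean(np.abs(diff))
    # Bland-Altman bias and 95% limits of agreement (bias +/- 1.96 SD).
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return {
        "pcc": pcc,
        "mad": mad,
        "bias": bias,
        "loa_low": bias - 1.96 * sd,
        "loa_high": bias + 1.96 * sd,
    }

# Made-up angles for illustration only; these are not data from the study.
manual = [12.0, 25.0, 33.0, 41.0, 50.0]
model = [14.0, 24.0, 36.0, 40.0, 53.0]
summary = agreement_summary(model, manual)
print(summary)
```

A high PCC alone does not imply agreement, since a constant offset leaves the correlation unchanged; the Bland-Altman bias and limits are what expose such an offset.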
The mean Cobb angles were 35.89° ± 9.33° with Seg4Reg, 31.54° ± 9.78° with VFLDN, and 32.23° ± 9.28° with manual measurement. The ICCs for the reliability of Seg4Reg and VFLDN were 0.809 and 0.974, respectively. The PCC and MAD between Seg4Reg and manual measurements were 0.731 (p < 0.001) and 6.51°, while those between VFLDN and manual measurements were 0.952 (p < 0.001) and 2.36°. The Kappa statistic indicated that VFLDN (k = 0.686, p < 0.001) was superior to Seg4Reg and manual measurements for Cobb angle severity classification.
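As a rough sketch of how a Kappa value for severity classification could be obtained, the example below bins Cobb angles into grades and computes an unweighted Cohen's kappa between two raters. The cut-offs in `SEVERITY_EDGES` are hypothetical, since the abstract does not state the thresholds used, and whether the study used a weighted or unweighted Kappa is likewise not stated. The helper names and sample angles are illustrative only.

```python
import numpy as np

# Hypothetical severity cut-offs in degrees; the abstract does not give them.
SEVERITY_EDGES = (10.0, 25.0, 45.0)

def severity_grade(angles_deg, edges=SEVERITY_EDGES):
    """Map Cobb angles to integer grades 0..len(edges) using the cut-offs."""
    return np.digitize(np.asarray(angles_deg, dtype=float), edges)

def cohen_kappa(labels_a, labels_b, n_classes):
    """Unweighted Cohen's kappa for two raters' integer class labels."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    confusion = np.zeros((n_classes, n_classes))
    # Count how often rater A gave class i while rater B gave class j.
    np.add.at(confusion, (a, b), 1)
    total = confusion.sum()
    observed = np.trace(confusion) / total
    # Chance agreement from the product of the two raters' marginals.
    expected = (confusion.sum(axis=1) @ confusion.sum(axis=0)) / total**2
    # Note: undefined (division by zero) if expected agreement equals 1.
    return (observed - expected) / (1.0 - expected)

# Made-up angles for illustration only; these are not data from the study.
manual = [8.0, 15.0, 30.0, 50.0, 22.0, 47.0]
model = [9.0, 18.0, 24.0, 52.0, 27.0, 44.0]
n_grades = len(SEVERITY_EDGES) + 1
kappa = cohen_kappa(severity_grade(manual), severity_grade(model), n_grades)
print(round(kappa, 3))
```

Because grading depends on fixed cut-offs, a small angular error near a threshold can change the grade, which is why Kappa can be modest even when the mean absolute difference is small.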
Deep-learning-based automatic assessment of the scoliosis Cobb angle is feasible in clinical practice. In particular, the keypoint-based VFLDN is more valuable in actual clinical work, offering higher accuracy, transparency, and interpretability.