Wang Tianyi, Chen Ruiyuan, Fan Ning, Zang Lei, Yuan Shuo, Du Peng, Wu Qichao, Wang Aobo, Li Jian, Kong Xiaochuan, Zhu Wenyi
Beijing Chaoyang Hospital, Capital Medical University, Beijing, China.
J Med Internet Res. 2024 Dec 23;26:e54676. doi: 10.2196/54676.
Lumbar spinal stenosis (LSS) is a major cause of pain and disability in older individuals worldwide. Although a growing number of studies have applied traditional machine learning (TML) and deep learning (DL) to the diagnosis of LSS and reported promising results, the performance of these models has not been analyzed systematically.
This systematic review and meta-analysis aimed to pool the results and evaluate the heterogeneity of current studies that use TML or DL models to diagnose LSS, thereby providing more comprehensive information for future clinical application.
This review followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines and used articles retrieved from the PubMed, Embase, and Cochrane Library databases. Studies that evaluated the diagnostic value of DL or TML algorithms for LSS were included, while those with duplicated or unavailable data were excluded. The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool was used to estimate the risk of bias in each study. The MIDAS module and the METAPROP module of Stata (StataCorp) were used for data synthesis and statistical analyses.
A total of 12 studies with 15,044 patients reported the diagnostic value of TML or DL models for LSS. The risk of bias assessment identified 4 studies with high risk of bias, 3 with unclear risk of bias, and 5 with entirely low risk of bias. The pooled sensitivity and specificity were 0.84 (95% CI 0.82-0.86; I²=99.06%) and 0.87 (95% CI 0.84-0.90; I²=98.7%), respectively. The diagnostic odds ratio was 36 (95% CI 26-49), the positive likelihood ratio (LR+) was 6.6 (95% CI 5.1-8.4), and the negative likelihood ratio (LR-) was 0.18 (95% CI 0.16-0.21). The area under the summary receiver operating characteristic curve of TML or DL models for diagnosing LSS was 0.92 (95% CI 0.89-0.94), indicating high diagnostic value.
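As a rough consistency check on the figures above, the likelihood ratios and diagnostic odds ratio can be derived from a single sensitivity and specificity pair using the standard definitions. The sketch below is illustrative only: the function name `likelihood_ratios` is hypothetical, and it is not the procedure used in the review, whose pooled estimates came from Stata's MIDAS module.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float, float]:
    """Return (LR+, LR-, DOR) for one sensitivity/specificity pair.

    Standard definitions:
      LR+ = sensitivity / (1 - specificity)
      LR- = (1 - sensitivity) / specificity
      DOR = LR+ / LR-
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    dor = lr_pos / lr_neg
    return lr_pos, lr_neg, dor


# Pooled point estimates reported in the abstract.
lr_pos, lr_neg, dor = likelihood_ratios(0.84, 0.87)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")
# prints: LR+ = 6.46, LR- = 0.18, DOR = 35.1
```

These values are close to, but not identical with, the reported LR+ of 6.6 and diagnostic odds ratio of 36. A small gap of this kind is expected when pooled estimates are produced by a meta-analytic model rather than by direct arithmetic on the pooled sensitivity and specificity.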
This systematic review and meta-analysis emphasizes that, although artificial intelligence systems show generally satisfactory diagnostic performance for LSS at the experimental stage, none of them is yet reliable and practical enough to apply in real clinical practice. Further efforts, including optimization of model balance, widely accepted objective reference standards, multimodal strategies, large datasets for training and testing, external validation, and sufficient and scientific reporting, are needed in future studies to bridge the gap between current TML or DL models and real-life clinical applications.
PROSPERO CRD42024566535; https://tinyurl.com/msx59x8k.