Mizukoshi Ryo, Maruiwa Ryosuke, Ito Keitaro, Isogai Norihiro, Funao Haruki, Fujita Retsu, Yagi Mitsuru
Department of Orthopedic Surgery, School of Medicine, International University of Health and Welfare, Chiba 260-8670, Japan.
Department of Orthopedic Surgery, International University of Health and Welfare (IUHW) Narita Hospital, Chiba 268-8520, Japan.
Bioengineering (Basel). 2025 Jul 9;12(7):749. doi: 10.3390/bioengineering12070749.
Early detection of ossification of the posterior longitudinal ligament (OPLL) is hampered by the late onset of neurological symptoms. We therefore built and validated an interpretable machine learning model to identify OPLL during routine health examinations. We retrospectively analyzed 1442 Japanese adults screened between 2020 and 2023, including 432 imaging-confirmed cases. Preprocessing comprised median imputation, one-hot encoding, Random Forest feature selection that reduced 235 variables to 20, and class-balance correction with SMOTE. Logistic regression, Random Forest, Gradient Boosting, and XGBoost models were tuned using a 5-fold cross-validated grid search, and a re-estimated logistic regression yielded odds ratios for clinical interpretation. The logistic model achieved 65% accuracy and an AUROC of 0.69 (95% CI 0.66-0.76), matching the tree-based models but with fewer false negatives. Advanced age (OR 1.60, 95% CI 1.27-2.00) and elevated CA19-9 (OR 1.24, 95% CI 1.00-1.35) independently increased the odds of OPLL. This concise, explainable tool could facilitate early recognition of OPLL, reduce unnecessary follow-up, and enable timely preventive interventions in high-volume screening programs.
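As an aid to reading the reported odds ratios, the following minimal sketch shows how a logistic regression coefficient maps to an odds ratio with a confidence interval. It assumes a Wald-type interval (coefficient ± 1.96 standard errors on the log-odds scale), which the abstract does not confirm; the function names are hypothetical, and the coefficient and standard error are back-calculated from the published figures for advanced age rather than taken from the study data. The abstract also does not state the unit or threshold that defines "advanced age".

```python
import math


def coefficient_from_or(or_point, ci_low, ci_high, z=1.96):
    """Back-calculate a log-odds coefficient and its standard error
    from a reported odds ratio and CI (assumes a Wald interval)."""
    beta = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and standard error into
    an odds ratio with a Wald confidence interval."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Reported in the abstract for advanced age: OR 1.60, 95% CI 1.27-2.00.
beta_age, se_age = coefficient_from_or(1.60, 1.27, 2.00)
or_age, low_age, high_age = odds_ratio_with_ci(beta_age, se_age)

print(f"beta = {beta_age:.3f}, SE = {se_age:.3f}")
print(f"OR = {or_age:.2f} (95% CI {low_age:.2f}-{high_age:.2f})")
```

Under this assumption the reconstructed interval closely matches the reported one, which suggests the published interval is roughly symmetric on the log-odds scale. The CA19-9 interval (OR 1.24, 95% CI 1.00-1.35) is not symmetric around the point estimate on that scale, so the same back-calculation would not reproduce it exactly.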