Department of Epidemiology and Health Statistics, Dalian Medical University, Dalian, China.
The Health Management Center, The First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China.
Front Public Health. 2024 Apr 25;12:1347219. doi: 10.3389/fpubh.2024.1347219. eCollection 2024.
Osteoporosis is becoming more common worldwide, imposing a substantial burden on individuals and society. The onset of osteoporosis is subtle, early detection is challenging, and population-wide screening is infeasible. Thus, there is a need to develop a method to identify those at high risk for osteoporosis.
This study aimed to develop a machine learning algorithm to effectively identify people with low bone density, using readily available demographic and blood biochemical data.
Using NHANES 2017-2020 data, participants over 50 years old with complete femoral neck BMD data were selected. This cohort was randomly divided into training (70%) and test (30%) sets. Lasso regression was used to select variables for inclusion in six machine learning models built on the training data: logistic regression (LR), support vector machine (SVM), gradient boosting machine (GBM), naive Bayes (NB), artificial neural network (ANN) and random forest (RF). NHANES data from the 2013-2014 cycle served as an external validation set to verify the generalizability of the models. Model discrimination was assessed via AUC, accuracy, sensitivity, specificity, precision and F1 score. Calibration curves were used to evaluate goodness of fit, decision curves to determine clinical utility, and the SHAP framework to analyze variable importance.
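The evaluation measures named above can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names, the 0.5 classification threshold, and the toy labels and scores are all assumptions made for illustration, and the net benefit formula is the standard one used in decision curve analysis (true positives per person minus false positives per person weighted by the odds of the threshold probability).

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_score: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), calling a case positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_score: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, precision and F1 at one threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit at a threshold probability strictly between 0 and 1."""
    tp, fp, _, _ = confusion_counts(y_true, y_score, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


if __name__ == "__main__":
    # Hypothetical labels (1 = low BMD) and predicted probabilities, not study data.
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
    print(classification_metrics(labels, scores))
    print("AUC:", auc(labels, scores))
    print("Net benefit at 0.5:", net_benefit(labels, scores, 0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a range of threshold probabilities and comparing the result with the "treat all" and "treat none" strategies. The pairwise AUC above is quadratic in sample size, which is fine for illustration but slow for cohorts of several thousand participants.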
A total of 3,545 participants were included in the internal validation set of this study, of whom 1,870 had normal bone density and 1,675 had low bone density. Lasso regression selected 19 variables. In the test set, the AUC was 0.785 (LR), 0.780 (SVM), 0.775 (GBM), 0.729 (NB), 0.771 (ANN), and 0.768 (RF). The LR model had the best discrimination, a better calibration curve fit, and the highest clinical net benefit on the decision curve; it also showed good predictive power in the external validation dataset. The top variables in the LR model were age, BMI, gender, creatine phosphokinase, total cholesterol and alkaline phosphatase.
The machine learning model demonstrated effective classification of low BMD using blood biomarkers. This could aid clinical decision making for osteoporosis prevention and management.