Si Zebing, Zhang Di, Wang Huajun, Zheng Xiaofei
Department of Sports Medicine, The First Affiliated Hospital, Guangdong Provincial Key Laboratory of Speed Capability, The Guangzhou Key Laboratory of Precision Orthopedics and Regenerative Medicine, Jinan University, Guangzhou, 510630, China.
Department of Orthopedics, Yuebei People's Hospital, 133 Shaoguan Huimin South Avenue, Shaoguan, 512026, China.
BMC Res Notes. 2025 Mar 11;18(1):108. doi: 10.1186/s13104-025-07089-3.
Osteoporosis, prevalent among the elderly population, is primarily diagnosed through bone mineral density (BMD) testing, which has limitations in early detection. This study aims to develop and validate a machine learning approach for osteoporosis identification by integrating demographic, laboratory, and questionnaire data, offering a more practical and effective screening alternative.
Data from the National Health and Nutrition Examination Survey (NHANES) were analyzed to explore factors linked to osteoporosis. After cleaning, 8766 participants with 223 variables were studied. Minimum Redundancy Maximum Relevance (mRMR) and SelectKBest were employed to select the important features. Four machine learning algorithms (random forest [RF], neural network [NN], LightGBM, and XGBoost) were applied to identify osteoporosis, and their performance was compared. Data balancing was done using the Synthetic Minority Over-sampling Technique (SMOTE), and metrics such as F1 score and AUC were evaluated for each algorithm.
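The data-balancing step can be illustrated with a minimal sketch of the interpolation idea behind SMOTE: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbours. This is not the study's code. The function name `smote`, its parameters, and the toy data are hypothetical, and a real analysis would more likely rely on an established library implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate at a random position along the base-neighbour segment.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (values are made up).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the range already covered by the minority class. Oversampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.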
The LightGBM model outperformed the other models, with an F1 score of 0.914, a Matthews correlation coefficient (MCC) of 0.831, and an AUC of 0.970 on the training set. On the test set, it achieved an F1 score of 0.912, an MCC of 0.826, and an AUC of 0.972. The top predictors of osteoporosis were height, age, and sex.
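For readers less familiar with the reported metrics, the following sketch shows how the F1 score and MCC are computed from the four cells of a binary confusion matrix. The counts used here are made up for illustration and are not taken from the study. The function names are likewise hypothetical.

```python
import math


def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to 1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts (not the study's results).
tp, tn, fp, fn = 90, 85, 15, 10
example_f1 = f1_score(tp, fp, fn)
example_mcc = mcc(tp, tn, fp, fn)
```

Unlike F1, MCC uses all four cells including true negatives, so it drops when either class is handled poorly. That makes it a useful complement to F1 when class sizes differ.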
This study demonstrates the potential of machine learning models in assessing an individual's risk of developing osteoporosis, a condition that significantly impacts quality of life and imposes substantial healthcare costs. The superior performance of the LightGBM model suggests a promising tool for early detection and personalized prevention strategies. Importantly, identifying height, age, and sex as top predictors offers critical insights into the demographic and physiological factors that clinicians should consider when evaluating patients' risk profiles.