Jin Wanlin, Xu Lulu, Yue Chun, Hu Li, Wang Yuzhou, Fu Yaqian, Guo Yuanwei, Bai Fan, Yang Yanyi, Zhao Xianmei, Luo Yingquan, Wu Xiyu, Sheng Zhifeng
Health Management Center, National Clinical Research Center for Metabolic Diseases, Hunan Provincial Clinical Medicine Research Center for Intelligent Management of Chronic Disease, Hunan Provincial Key Laboratory of Metabolic Bone Diseases, Department of Metabolism and Endocrinology, The Second Xiangya Hospital of Central South University, Changsha, Hunan, China; Department of General Medicine, The Second Xiangya Hospital of Central South University, Changsha, Hunan, China.
Health Management Center, National Clinical Research Center for Metabolic Diseases, Hunan Provincial Clinical Medicine Research Center for Intelligent Management of Chronic Disease, Hunan Provincial Key Laboratory of Metabolic Bone Diseases, Department of Metabolism and Endocrinology, The Second Xiangya Hospital of Central South University, Changsha, Hunan, China.
Int J Med Inform. 2025 Jul;199:105889. doi: 10.1016/j.ijmedinf.2025.105889. Epub 2025 Mar 22.
Hip fractures are associated with reduced mobility and with higher morbidity, mortality, and healthcare costs. Approximately 90% of hip fractures in the elderly are associated with osteoporosis, so screening the population for hip osteoporosis and intervening early is particularly important. Because dual-energy X-ray absorptiometry (DXA) has limited accessibility, predictive models for hip osteoporosis that do not rely on bone mineral density (BMD) data are needed. We aimed to develop and validate prediction models for female hip osteoporosis using electronic health records without BMD data.
This retrospective study used anonymized electronic medical records, collected from September 2013 to November 2023, from the Health Management Center of the Second Xiangya Hospital. A total of 8039 women were included in the derivation dataset, which was randomly split into a training dataset (75%) and a testing dataset (25%). Four feature selection algorithms were used to identify predictors of osteoporosis. The identified predictors were then used to train and optimize eight machine learning models. The models were tuned using 5-fold cross-validation, and their performance was then assessed in the testing dataset and in an independent validation dataset from the National Health and Nutrition Examination Survey (NHANES). The SHapley Additive exPlanations (SHAP) method was used to rank feature importance and explain the final model.
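The data partitioning described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the random seed, and the use of unstratified shuffling are assumptions, since the abstract does not say whether the split was stratified by outcome or which software was used. Only the 75%/25% proportions, the 5 folds, and the cohort size of 8039 come from the text.

```python
import random

def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle row indices 0..n-1 and hold out `test_fraction` of them for testing."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])  # (train, test)

def k_fold_indices(train_idx, k=5, seed=0):
    """Yield (fit, validation) index lists for k-fold cross-validation.

    Every training row appears in exactly one validation fold.
    """
    rng = random.Random(seed)
    shuffled = list(train_idx)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, folds[i]

# 8039 women in the derivation dataset (from the abstract).
train_idx, test_idx = train_test_split_indices(8039)
splits = list(k_fold_indices(train_idx, k=5))

for fold_no, (fit, val) in enumerate(splits, start=1):
    # Hyperparameters would be tuned here: fit on `fit`, score on `val`.
    print(f"fold {fold_no}: fit on {len(fit)} rows, validate on {len(val)} rows")
```

The testing rows are never touched during tuning; they are reserved for the final performance estimate, as is the external NHANES dataset.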
A combination of the Boruta, LASSO, varSelRF, and RFE methods identified systolic blood pressure, red blood cell count, glycohemoglobin, alanine aminotransferase, aspartate aminotransferase, uric acid, age, and body mass index as the most important predictors of osteoporosis in women. The XGBoost model outperformed the other models, with an area under the curve (AUC) of 0.805 (95% CI: 0.779-0.831) and a moderate sensitivity of 0.706. In external validation, the XGBoost model had an AUC of 0.811 (95% CI: 0.793-0.828) and a moderate sensitivity of 0.775.
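For readers less familiar with these metrics, the sketch below shows one way an AUC, a sensitivity, and a 95% confidence interval can be computed from predicted risk scores. It is illustrative only: the abstract does not state how the intervals were obtained or which probability threshold was used, so the percentile bootstrap, the 0.5 threshold, and the toy labels and scores here are all assumptions.

```python
import random

def auc(y_true, scores):
    """AUC as the probability that a random positive case outscores a random
    negative case, with ties counted as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity(y_true, scores, threshold):
    """Share of true positive cases whose score reaches the threshold."""
    pos_scores = [s for y, s in zip(y_true, scores) if y == 1]
    return sum(s >= threshold for s in pos_scores) / len(pos_scores)

def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data, not from the study: 1 = osteoporosis, 0 = no osteoporosis.
y_true = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]

print(auc(y_true, scores))               # 0.75
print(sensitivity(y_true, scores, 0.5))  # 0.5
print(bootstrap_auc_ci(y_true, scores, n_boot=200))
```

The pairwise AUC above is quadratic in the number of cases, which is fine for illustration; on cohorts of several thousand records a rank-based implementation from a statistics library would be the practical choice.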
The XGBoost model demonstrates high identification performance even without questionnaire data, outperforming both the traditional logistic regression model and the OSTA model. It can be integrated into routine clinical workflows to identify women at high risk for osteoporosis.