Wu Qing, Dai Jingyuan
Department of Biomedical Informatics (Dr. Qing Wu, Jingyuan Dai), College of Medicine, The Ohio State University, Columbus, OH 43210, United States.
J Bone Miner Res. 2024 May 2;39(4):462-472. doi: 10.1093/jbmr/zjae025.
This study aimed to improve the accuracy of fracture risk prediction for major osteoporotic fractures (MOFs) and hip fractures (HFs) by integrating genetic profiles, machine learning (ML) techniques, and Bayesian optimization. A genetic risk score (GRS), derived from 1,103 risk single nucleotide polymorphisms (SNPs) identified in genome-wide association studies (GWAS), was computed for 25,772 postmenopausal women from the Women's Health Initiative dataset. We developed four ML models, Support Vector Machine (SVM), Random Forest, XGBoost, and Artificial Neural Network (ANN), to predict binary fracture outcomes and 10-year fracture risk. The GRS and the FRAX clinical risk factors (CRFs) were used as predictors. For time-to-fracture data, the ML models accounted for death as a competing risk. The models were then tuned with Bayesian optimization, which clearly outperformed traditional grid search. Model performance was evaluated with accuracy, weighted F1 score, the area under the precision-recall curve (PRAUC), and the area under the receiver operating characteristic curve (AUC) for binary fracture predictions, and with the C-index, Brier score, and mean dynamic AUC over a 10-year follow-up period for fracture risk predictions. The GRS-integrated XGBoost model with Bayesian optimization was the most effective, with an accuracy of 91.2% (95% CI: 90.4-92.0%) and an AUC of 0.739 (95% CI: 0.731-0.746) for binary MOF prediction. For 10-year fracture risk modeling, the XGBoost model achieved a C-index of 0.795 (95% CI: 0.783-0.806) and a mean dynamic AUC of 0.799 (95% CI: 0.788-0.809). Compared with FRAX, the XGBoost model yielded a categorical net reclassification improvement (NRI) of 22.6% (P = .004). A sensitivity analysis that included bone mineral density (BMD) but excluded the GRS reaffirmed these findings. Furthermore, portability tests in diverse non-European groups, including Asian and African American women, underscored the model's robustness and adaptability. This study highlights the potential of combining genetic information with optimized ML to strengthen fracture prediction, pointing toward new preventive strategies for postmenopausal women.
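Two quantities the abstract relies on, the SNP-based GRS and the categorical NRI against FRAX, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the 1,103 SNPs were weighted, so the sketch assumes the common additive form: risk-allele dosage (0, 1, or 2) multiplied by a per-SNP weight such as a GWAS log odds ratio. The function names and the risk cut-points used for the NRI are hypothetical. The NRI function also treats the outcome as binary and ignores censoring and the competing risk of death, both of which the study's 10-year analysis does account for.

```python
from typing import Sequence


def weighted_grs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical additive GRS: sum over SNPs of risk-allele dosage x weight.

    Assumption: weights are per-SNP effect sizes (e.g., GWAS log odds ratios);
    the abstract does not specify the weighting scheme actually used.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def categorize(risk: float, thresholds: Sequence[float]) -> int:
    """Map a predicted risk to a category index using ascending cut-points."""
    return sum(risk >= t for t in thresholds)


def categorical_nri(
    old_risk: Sequence[float],
    new_risk: Sequence[float],
    events: Sequence[int],
    thresholds: Sequence[float],
) -> float:
    """Simple categorical NRI for a binary outcome (no censoring handled).

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)]
    """
    up_e = down_e = up_ne = down_ne = 0
    n_e = n_ne = 0
    for old, new, ev in zip(old_risk, new_risk, events):
        c_old = categorize(old, thresholds)
        c_new = categorize(new, thresholds)
        if ev:
            n_e += 1
            up_e += c_new > c_old    # events should move to higher categories
            down_e += c_new < c_old
        else:
            n_ne += 1
            up_ne += c_new > c_old
            down_ne += c_new < c_old  # non-events should move to lower categories
    if n_e == 0 or n_ne == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne


if __name__ == "__main__":
    # Toy data with hypothetical cut-points at 10% and 20% risk.
    old = [0.05, 0.15, 0.25, 0.12, 0.08]
    new = [0.12, 0.22, 0.25, 0.05, 0.09]
    events = [1, 1, 0, 0, 0]
    print(categorical_nri(old, new, events, [0.10, 0.20]))  # 1 + 1/3
```

In the toy data, both events move up one category (event component 2/2 = 1) and one of three non-events moves down (non-event component 1/3), giving an NRI of about 1.333. A reported NRI of 22.6% corresponds to a value of 0.226 on this scale.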