Cha Yonghan, Seo Sung Hyo, Kim Jung-Taek, Kim Jin-Woo, Lee Sang-Yeob, Yoo Jun-Il
Department of Orthopaedic Surgery, Daejeon Eulji Medical Center, Eulji University School of Medicine, Daejeon, Korea.
Department of Biomedical Research Institute, Gyeongsang National University Hospital, Jinju, Korea.
J Bone Metab. 2023 Aug;30(3):263-273. doi: 10.11005/jbm.2023.30.3.263. Epub 2023 Aug 31.
The purpose of this study was to verify the accuracy and validity of using machine learning (ML) to select risk factors, to identify differences in ML feature selection between men and women, and to develop predictive models for patients with osteoporosis in a large database.
Data on 968 observed features were collected from a total of 3,484 participants in the Korea National Health and Nutrition Examination Survey. To find preliminary features that were closely related to osteoporosis, logistic regression, random forest, gradient boosting, adaptive boosting, and support vector machine models were used.
In osteoporosis feature selection by the 5 ML models in this study, the variables most often selected as risk factors in both men and women were body mass index, monthly alcohol consumption, and dietary survey data. The variables on which feature selection differed between men and women were age, smoking, and blood glucose level. Receiver operating characteristic (ROC) analysis revealed that the area under the ROC curve for each ML model was not significantly different for either gender.
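The workflow described above (fit five model families, record which features each one ranks highest, and compare the area under the ROC curve) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn, synthetic data from `make_classification` in place of the survey data, and hypothetical choices for the ranking rule (absolute coefficients or impurity-based importances), the `top_k` cutoff, and the train/test split. The helper names `build_models`, `feature_scores`, and `select_and_evaluate` are invented for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(seed=0):
    """Return the five model families named in the abstract (default settings assumed)."""
    return {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
        "adaptive_boosting": AdaBoostClassifier(random_state=seed),
        "support_vector_machine": SVC(kernel="linear"),
    }


def feature_scores(model):
    """Per-feature score: tree importances if available, else absolute coefficients."""
    if hasattr(model, "feature_importances_"):
        return np.asarray(model.feature_importances_)
    return np.abs(model.coef_).ravel()


def select_and_evaluate(X, y, top_k=5, seed=0):
    """Count how many models rank each feature in their top_k; report test AUC per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # Standardize so that linear-model coefficients are comparable across features.
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    votes = np.zeros(X.shape[1], dtype=int)
    aucs = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        top = np.argsort(feature_scores(model))[::-1][:top_k]
        votes[top] += 1
        # AUC only needs a ranking score, so use decision_function when it exists.
        if hasattr(model, "decision_function"):
            scores = model.decision_function(X_test)
        else:
            scores = model.predict_proba(X_test)[:, 1]
        aucs[name] = roc_auc_score(y_test, scores)
    return votes, aucs


# Synthetic stand-in data; the real study used survey features, not these.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)
votes, aucs = select_and_evaluate(X, y, top_k=5)
print("selection counts per feature:", votes)
for name, auc in aucs.items():
    print(f"{name}: AUC = {auc:.3f}")
```

To mirror the comparison between men and women, one would call `select_and_evaluate` separately on each subgroup and compare the resulting vote counts. Testing whether AUCs differ significantly would need an additional step, such as a DeLong test or bootstrap, which this sketch does not include.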
ML performed feature selection for osteoporosis that took hidden differences between men and women into account. The present study considers the preprocessing of input data and the feature selection process, as well as the ML technique, to be important factors for the accuracy of the osteoporosis prediction model.