Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, 4505 Maryland Parkway, Las Vegas, NV, 89154-4009, USA.
Department of Epidemiology and Biostatistics, School of Public Health, University of Nevada, Las Vegas, NV, USA.
Sci Rep. 2021 Feb 24;11(1):4482. doi: 10.1038/s41598-021-83828-3.
The study aimed to use machine learning (ML) approaches and genomic data to develop a prediction model for bone mineral density (BMD) and to identify the best modeling approach for BMD prediction. Genomic and phenotypic data from the Osteoporotic Fractures in Men Study (n = 5130) were analyzed. After comprehensive genotype imputation, a genetic risk score (GRS) was calculated for each participant from 1103 associated SNPs. Data were normalized and divided into a training set (80%) and a validation set (20%). Random forest, gradient boosting, neural network, and linear regression models were each used to develop BMD prediction models. Ten-fold cross-validation was used for hyperparameter optimization. Mean squared error and mean absolute error were used to assess model performance. When GRS and phenotypic covariates were used as predictors, all ML models and linear regression performed similarly in BMD prediction. However, when GRS was replaced with the 1103 individual SNPs, the ML models performed significantly better than linear regression (with lasso regularization), and the gradient boosting model performed best. Our study suggests that ML models, especially gradient boosting, can improve BMD prediction from genomic data.
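The evaluation workflow described above (GRS construction, normalization, an 80/20 split, and scoring by mean squared and mean absolute error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The abstract does not say how the GRS was weighted, so the sketch assumes a weighted sum of effect-allele dosages, GRS_i = Σ_j w_j·x_ij, with hypothetical per-SNP weights. It also assumes z-score normalization fitted on the training set. An ordinary least-squares baseline stands in for the study's models (no lasso, random forest, gradient boosting, neural network, or ten-fold cross-validation here). All function names are hypothetical, and the data in the usage block are simulated.

```python
import numpy as np


def genetic_risk_score(dosages, weights):
    """Weighted GRS (assumed form): sum over SNPs of dosage (0-2) times a per-SNP weight.

    dosages: (n_participants, n_snps); weights: (n_snps,), hypothetical effect sizes.
    """
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)


def train_validation_split(n, train_frac=0.8, seed=0):
    """Randomly permute row indices and split them into training/validation sets."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(train_frac * n))
    return idx[:n_train], idx[n_train:]


def standardize(train, other):
    """Z-score both arrays using training-set mean/std only (avoids validation leakage)."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant columns unscaled
    return (train - mean) / std, (other - mean) / std


def mse(y_true, y_pred):
    """Mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))


def mae(y_true, y_pred):
    """Mean absolute error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


def fit_linear(X, y):
    """Ordinary least squares with an intercept (a stand-in baseline, no lasso penalty)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply intercept + slopes returned by fit_linear."""
    return coef[0] + X @ coef[1:]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n, n_snps = 1000, 50                            # toy sizes, not the study's
    dosages = rng.binomial(2, 0.3, size=(n, n_snps))
    weights = rng.normal(0.0, 0.05, size=n_snps)    # hypothetical effect sizes
    age = rng.uniform(65, 90, size=n)               # simulated phenotypic covariate
    grs = genetic_risk_score(dosages, weights)
    bmd = 1.0 + 2.0 * grs - 0.01 * age + rng.normal(0.0, 0.1, size=n)  # simulated BMD

    X = np.column_stack([grs, age])                 # GRS + covariate as predictors
    train_idx, val_idx = train_validation_split(n)
    X_train, X_val = standardize(X[train_idx], X[val_idx])
    coef = fit_linear(X_train, bmd[train_idx])
    pred = predict_linear(coef, X_val)
    print(f"validation MSE={mse(bmd[val_idx], pred):.4f}  MAE={mae(bmd[val_idx], pred):.4f}")
```

Swapping the `[grs, age]` design matrix for the raw `dosages` columns mirrors the study's second comparison (individual SNPs instead of a GRS). The same split and metric functions would apply to any regressor plugged in place of `fit_linear`.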