School of Population Health, University of Auckland, Auckland 1023, New Zealand.
Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02115, USA.
J Clin Endocrinol Metab. 2022 Sep 28;107(10):2737-2747. doi: 10.1210/clinem/dgac432.
Conventional prediction models for vitamin D deficiency have limited accuracy.
Using cross-sectional data, we developed prediction models based on machine learning (ML) and compared their performance with that of a model based on a conventional approach.
Participants were 5106 community-resident adults (50-84 years; 58% male). In the randomly sampled training set (65%), we constructed 5 ML models: lasso regression, elastic net regression, random forest, gradient boosted decision tree, and dense neural network. The reference model was a logistic regression model. Outcomes were deseasonalized serum 25-hydroxyvitamin D (25(OH)D) <50 nmol/L (yes/no) and <25 nmol/L (yes/no). In the test set (the remaining 35%), we evaluated predictive performance of each model, including area under the receiver operating characteristic curve (AUC) and net benefit (decision curves).
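The split-and-evaluate procedure described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `split_train_test` and `auc`, the fixed seed, and the toy labels and scores are assumptions made for illustration. The AUC is computed from its rank interpretation, that is, the probability that a randomly chosen participant with the outcome receives a higher predicted score than a randomly chosen participant without it.

```python
import random


def split_train_test(n, train_frac=0.65, seed=0):
    """Randomly partition indices 0..n-1 into training and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded shuffle so the split is reproducible
    cut = round(n * train_frac)
    return idx[:cut], idx[cut:]


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    case has the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# 65%/35% split of 5106 participants (cohort size taken from the abstract)
train_idx, test_idx = split_train_test(5106)

# Toy labels and predicted scores (hypothetical, not study data)
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

With a rare outcome such as 25(OH)D <25 nmol/L (2% of participants), the test set contains few positive cases, which helps explain the wide confidence intervals that can accompany AUC estimates in this setting.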
Overall, 1270 (25%) and 91 (2%) participants had 25(OH)D <50 and <25 nmol/L, respectively. Compared with the reference model, the ML models predicted 25(OH)D <50 nmol/L with similar accuracy. However, for prediction of 25(OH)D <25 nmol/L, all ML models had higher AUC point estimates than the reference model, by up to 0.14. AUC was highest for elastic net regression (0.93; 95% CI 0.90-0.96), compared with 0.81 (95% CI 0.71-0.91) for the reference model. In the decision curve analysis, ML models mostly achieved a greater net benefit than the reference model across a range of thresholds.
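For readers unfamiliar with decision curve analysis, the net benefit at a threshold probability p_t is conventionally defined as TP/n - (FP/n) x p_t/(1 - p_t), where TP and FP are the counts of true and false positives when everyone with predicted risk at or above p_t is classified as positive. The sketch below assumes this standard definition; the abstract does not state the exact formulation used, and the function name `net_benefit` and the toy data are hypothetical.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of classifying as positive everyone with risk >= threshold.

    Uses the conventional decision-curve definition:
    TP/n - (FP/n) * threshold / (1 - threshold).
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy outcomes and predicted risks (hypothetical, not study data)
y = [1, 0, 0, 1]
p = [0.9, 0.2, 0.6, 0.4]

nb_model = net_benefit(y, p, 0.5)                  # model-based selection
nb_all = net_benefit(y, [1.0] * len(y), 0.5)       # "treat all" comparator
```

A decision curve plots this quantity over a range of thresholds for each model, alongside the "treat all" and "treat none" (net benefit of zero) strategies; a model is preferable at a given threshold when its curve lies above the alternatives.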
Compared with conventional models, ML models predicted 25(OH)D <50 nmol/L with similar accuracy, but they predicted 25(OH)D <25 nmol/L with greater accuracy. The latter finding suggests a role for ML models in participant selection for vitamin D supplement trials.