Cornell University, New York, USA.
Northwestern University Feinberg School of Medicine, Chicago, USA.
J Clin Hypertens (Greenwich). 2023 Dec;25(12):1135-1144. doi: 10.1111/jch.14745. Epub 2023 Nov 16.
Machine learning methods are widely used in medicine to enhance prediction. However, little is known about how reliably and effectively these models predict long-term medical outcomes, such as blood pressure, from lifestyle factors such as diet. The authors assessed whether machine-learning techniques could accurately predict hypertension risk using nutritional information. This was a cross-sectional study using data from the National Health and Nutrition Examination Survey (NHANES) collected between January 2017 and March 2020. XGBoost was chosen as the machine-learning model because of its stronger performance relative to other methods commonly used in medical studies. Model prediction metrics (e.g., AUROC, balanced accuracy) were used to measure overall model efficacy. Covariate gain statistics (the percentage each covariate contributes to the overall prediction) and SHapley Additive exPlanations (SHAP, a method to visualize the contribution of each covariate) were used to explain the machine-learning output and increase the transparency of this otherwise opaque method. Of 9650 eligible patients, the mean age was 41.02 years (SD = 22.16); 4792 (50%) were male and 4858 (50%) were female; 3407 (35%) were White, 2567 (27%) were Black, 2108 (22%) were Hispanic, and 981 (10%) were Asian. Based on the model gain statistics, age was the single strongest predictor of hypertension, with a gain of 53.1%. Demographic factors such as poverty and Black race were also strong predictors, with gains of 4.33% and 4.18%, respectively. Nutritional covariates contributed 37% to the overall prediction, with sodium, caffeine, potassium, and alcohol intake being significantly represented within the model. Machine learning can be used to predict hypertension.
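For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how AUROC, balanced accuracy, and percentage gain can be computed. It is not the authors' code: the function names (`auroc`, `balanced_accuracy`, `gain_percentages`) and all toy values are hypothetical, chosen only for illustration, and the study itself would have relied on a library implementation of XGBoost rather than these hand-written helpers.

```python
from typing import Dict, Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (0/1)."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    return 0.5 * (tp / n_pos + tn / n_neg)


def gain_percentages(raw_gain: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw per-covariate gain so that all covariates sum to 100%."""
    total = sum(raw_gain.values())
    if total <= 0:
        raise ValueError("total gain must be positive")
    return {name: 100.0 * g / total for name, g in raw_gain.items()}


# Toy data, NOT taken from the study: 1 = hypertensive, 0 = not hypertensive.
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
# Classify as hypertensive when the predicted score is at least 0.5
# (the threshold is an assumption for this sketch).
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

# Hypothetical raw gain values, NOT the values reported in the abstract.
toy_raw_gain = {"age": 6.0, "sodium": 3.0, "caffeine": 1.0}
```

Calling `auroc(toy_labels, toy_scores)`, `balanced_accuracy(toy_labels, toy_preds)`, and `gain_percentages(toy_raw_gain)` on the toy inputs shows how each reported quantity is derived. Balanced accuracy is useful alongside AUROC when classes are imbalanced, because it weights sensitivity and specificity equally regardless of prevalence.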