Identifying cardiovascular disease risk in the U.S. population using environmental volatile organic compounds exposure: A machine learning predictive model based on the SHAP methodology.

Author information

Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China.

Department of Neurosurgery, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi 330006, China.

Publication information

Ecotoxicol Environ Saf. 2024 Nov 1;286:117210. doi: 10.1016/j.ecoenv.2024.117210. Epub 2024 Oct 23.

Abstract

BACKGROUND

Cardiovascular disease (CVD) remains a leading cause of mortality globally. Environmental pollutants, specifically volatile organic compounds (VOCs), have been identified as significant risk factors. This study aims to develop a machine learning (ML) model that predicts CVD risk from VOC exposure and demographic data, using SHapley Additive exPlanations (SHAP) for interpretability.
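To illustrate the idea behind SHAP, the following minimal sketch computes exact Shapley values for a two-feature toy model by enumerating feature subsets. The function `toy_risk`, its coefficients, and the inputs are hypothetical and are not taken from the study; features that are "absent" from a subset are set to a single baseline value, which is a simplification of the background-data averaging that SHAP implementations typically use.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline) to each feature."""
    n = len(x)

    def value(subset):
        # Features in the subset take their observed value; the rest take the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def toy_risk(z):
    # Hypothetical score: age raises risk, the metabolite lowers it,
    # and the two interact. Not the study's model.
    age, metabolite = z
    return 3.0 * age - 2.0 * metabolite - 0.5 * age * metabolite


x = [1.0, 1.0]         # hypothetical standardized inputs
baseline = [0.0, 0.0]  # hypothetical reference point
phi = shapley_values(toy_risk, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction at `x` and at the baseline, and the interaction term is split equally between the two features.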

METHODS

We used data from the National Health and Nutrition Examination Survey (NHANES) from 2011 to 2018, comprising 5098 participants. VOC exposure was assessed through 15 urinary metabolite measures. The dataset was split into a training set (70%) and a test set (30%). Six ML models were developed: Random Forest (RF), Light Gradient Boosting Machine (LightGBM), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Multi-Layer Perceptron (MLP), and Support Vector Machine (SVM). Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), accuracy, balanced accuracy, F1 score, J-index, kappa, Matthews correlation coefficient (MCC), positive predictive value (PPV), negative predictive value (NPV), sensitivity (sens), and specificity (spec). SHAP was applied to interpret the best-performing model.
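The evaluation metrics listed above can all be derived from a 2x2 confusion matrix, apart from AUROC, which depends on the ranking of predicted scores. The sketch below shows the standard definitions in plain Python. The counts and scores are hypothetical and are not results from the study, and the helper names `binary_metrics` and `auroc` are introduced here for illustration only.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from confusion-matrix counts.

    Assumes both true classes and both predicted classes are non-empty,
    so that no denominator is zero.
    """
    n = tp + fp + tn + fn
    sens = tp / (tp + fn)  # sensitivity (recall)
    spec = tn / (tn + fp)  # specificity
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    npv = tn / (tn + fn)   # negative predictive value
    acc = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (acc - p_exp) / (1 - p_exp)
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": acc,
        "balanced_accuracy": (sens + spec) / 2,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "j_index": sens + spec - 1,  # Youden's J
        "kappa": kappa,
        "mcc": mcc,
        "ppv": ppv,
        "npv": npv,
        "sens": sens,
        "spec": spec,
    }


def auroc(y_true, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(metrics, auc)
```

The pairwise AUROC computation is quadratic in the sample size; it is used here for clarity rather than efficiency.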

RESULTS

The RF model exhibited the highest predictive performance, with an AUROC of 0.8143. SHAP analysis identified age and 2-aminothiazoline-4-carboxylic acid (ATCA) as the most important predictors, with ATCA showing a protective effect against CVD, particularly in older adults and those with hypertension. The study found a significant interaction between ATCA levels and age: the protective effect of ATCA was more pronounced in older individuals, which the authors attribute to the increased oxidative stress and inflammatory responses associated with aging. E-value analysis suggested that the findings were robust to unmeasured confounding.
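For readers unfamiliar with E-values, the sketch below applies the commonly used formula E = RR + sqrt(RR × (RR − 1)), with estimates below 1 inverted first. It assumes the effect estimate is on the risk-ratio scale. The input values are hypothetical, since the abstract does not report the underlying estimates, and the function name `e_value` is introduced here for illustration.

```python
from math import sqrt


def e_value(rr):
    """E-value for a point estimate on the risk-ratio scale.

    The E-value is the minimum strength of association (as a risk ratio) that an
    unmeasured confounder would need with both exposure and outcome to fully
    explain away the observed estimate.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective estimates are inverted before applying the formula
    return rr + sqrt(rr * (rr - 1))


# Hypothetical estimates, not taken from the study.
protective = e_value(0.5)
harmful = e_value(2.0)
print(protective, harmful)
```

A larger E-value means that a stronger unmeasured confounder would be required to nullify the association; an estimate of exactly 1 yields an E-value of 1.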

CONCLUSIONS

This study is the first to utilize VOC exposure data to construct an ML model for predicting CVD risk. The findings highlight the potential of combining environmental exposure data with demographic information to enhance CVD risk prediction, supporting the development of personalized prevention and intervention strategies.
