Shahbazi Zeinab, Nowaczyk Slawomir
Center for Applied Intelligent Systems Research, Halmstad University, Sweden.
Heliyon. 2024 Dec 20;11(1):e40859. doi: 10.1016/j.heliyon.2024.e40859. eCollection 2025 Jan 15.
The influence of the exposome on major health conditions such as cardiovascular disease (CVD) is widely recognized. However, integrating diverse exposome factors into predictive models for personalized health assessment remains a challenge because environmental exposures and lifestyle factors are complex and variable. This study introduces a machine learning (ML) model for predicting CVD risk that relies on easily accessible exposome factors. The approach is novel in that it prioritizes non-clinical, modifiable exposures, making it applicable to broad public health screening and personalized risk assessment. The model was assessed on both internal and external validation groups from a multi-center cohort comprising 3,237 individuals diagnosed with CVD in South Korea within twelve years of their baseline visit, along with an equal number of participants without these conditions as a control group. A total of 109 exposome variables from participants' baseline visits were examined, spanning physical measures, environmental factors, lifestyle choices, mental health events, and early-life factors. For risk prediction, a Random Forest classifier was employed, and its performance was compared with that of an integrative ML model using clinical and physical variables. Data preprocessing involved normalization and handling of missing values to enhance model accuracy. The model's decision-making process was examined using an advanced explainability method. Results indicated comparable performance between the exposome-based ML model and the integrative model, with AUC values of 0.82 ± 0.01, 0.70 ± 0.01, and 0.73 ± 0.01. The study underscores the potential of leveraging exposome data for early intervention strategies. Additionally, exposome factors significant in identifying CVD risk were pinpointed, including daytime naps, completed full-time education, past tobacco smoking, frequency of tiredness/unenthusiasm, and current work status.
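The workflow summarized in the abstract (missing-value handling, normalization, a Random Forest classifier, and AUC reported as mean ± standard deviation) can be illustrated with a minimal sketch. This is not the authors' code: the synthetic data, the median imputation strategy, the 5-fold cross-validation, the hyperparameters, and the helper name `cross_validated_auc` are all assumptions made for illustration; only the count of 109 variables and the balanced case/control design are taken from the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cross_validated_auc(X, y, n_splits=5, seed=0):
    """Return (mean, std) of ROC AUC across stratified folds.

    Hypothetical helper: imputation strategy, fold count, and forest size
    are illustrative choices, not values reported in the abstract.
    """
    model = make_pipeline(
        SimpleImputer(strategy="median"),  # handle missing values
        StandardScaler(),                  # normalize features
        RandomForestClassifier(n_estimators=100, random_state=seed),
    )
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    return scores.mean(), scores.std()


# Synthetic stand-in for a balanced case/control cohort with 109 variables.
X, y = make_classification(
    n_samples=600, n_features=109, n_informative=10, random_state=0
)

# Knock out roughly 5% of entries to mimic missing survey answers.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

mean_auc, std_auc = cross_validated_auc(X, y)
print(f"AUC = {mean_auc:.2f} ± {std_auc:.2f}")
```

Placing the imputer and scaler inside the pipeline means they are refit on each training fold, so no information from the held-out fold leaks into preprocessing.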