Department of Food and Nutrition, Inha University, Incheon, 22212, Republic of Korea.
BMC Med Genomics. 2024 Sep 4;17(1):224. doi: 10.1186/s12920-024-01998-1.
Metabolic syndrome is a chronic disease associated with multiple comorbidities. Over the last few years, machine learning techniques have been used to predict metabolic syndrome. However, few studies have combined demographic, clinical, laboratory, dietary, and genetic factors to predict the incidence of metabolic syndrome in Koreans. In the present study, we propose using a genome-wide polygenic risk score, together with these other factors, to improve the accuracy of metabolic syndrome prediction.
We developed seven machine learning-based models to predict the incidence of metabolic syndrome at year 14, using data from the Korean Genome and Epidemiology Study (KoGES) Ansan and Ansung cohorts: Cox multivariable regression, deep neural network (DNN), support vector machine (SVM), stochastic gradient descent (SGD), random forest (RAF), Naïve Bayes (NBA) classifier, and AdaBoost (ADB).
Of the 5,440 patients, 2,120 were considered to have new-onset metabolic syndrome. The model that included sex, age, alcohol intake, energy intake, marital status, education status, income status, smoking status, dried laver intake, and the genome-wide polygenic risk score (gPRS) Z-score based on 344,447 SNPs (p-value < 1.0) achieved its highest AUC values with RAF (0.994 [95% CI 0.985, 1.000]) and ADB (0.994 [95% CI 0.986, 1.000]).
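To make the two quantities reported above concrete, the following is a minimal sketch of how a polygenic risk score Z-score and an AUC with a 95% confidence interval could be computed. It is not the study's pipeline: the abstract does not describe how the gPRS was constructed or how the confidence intervals were obtained, so the weighted-sum score, the percentile bootstrap, the function names, and the toy genotypes, weights, and labels below are all illustrative assumptions.

```python
import random
import statistics


def polygenic_scores(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2) for each individual."""
    return [sum(d * w for d, w in zip(row, weights)) for row in dosages]


def z_scores(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def auc(labels, scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Hypothetical toy data: 6 individuals x 3 SNPs, with made-up effect sizes.
dosages = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 0], [2, 2, 2], [1, 0, 1]]
weights = [0.10, -0.05, 0.20]
labels = [1, 0, 0, 0, 1, 1]  # 1 = new-onset metabolic syndrome (made up)

z = z_scores(polygenic_scores(dosages, weights))
point = auc(labels, z)
lo, hi = bootstrap_ci(labels, z)
print(f"AUC = {point:.3f} (95% CI {lo:.3f}, {hi:.3f})")
```

In a real analysis the Z-score would be one predictor among the demographic and dietary variables listed above, and the AUC would be computed on held-out predictions from a fitted classifier rather than on the raw score.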
Incorporating gPRS together with demographic, clinical, laboratory, and seaweed intake data improved metabolic syndrome risk prediction by capturing the distinct etiologies of metabolic syndrome development. For the Korean population, the RAF- and ADB-based models predicted metabolic syndrome more accurately than the NBA-based model.