School of Public Health, Bam University of Medical Sciences, Bam, Iran.
Research Center for Food Hygiene and Safety, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran.
BMC Bioinformatics. 2024 Jan 11;25(1):18. doi: 10.1186/s12859-024-05633-9.
Metabolic syndrome (MetS) is a cluster of metabolic abnormalities, including obesity, insulin resistance, hypertension, and dyslipidemia, that can be used to identify populations at risk of diabetes and cardiovascular diseases, the main causes of morbidity and mortality worldwide. A simple approach for diagnosing MetS without biochemical tests would be valuable. The present study aimed to predict MetS from non-invasive features using a random forest learning algorithm. To deal with the class imbalance that naturally exists in this type of data, the effect of two data balancing approaches, the Synthetic Minority Over-sampling Technique (SMOTE) and Random Splitting data balancing (SplitBal), on model performance was also investigated.
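To illustrate the random-splitting idea behind SplitBal, here is a minimal sketch, assuming the usual description of the technique: the majority class is randomly partitioned into bins of roughly minority-class size, and each bin is paired with the full minority class to form one balanced training subset. The function name `split_balanced_bins`, the `seed` parameter, and the toy records are hypothetical and are not taken from the study; training one classifier per subset and combining their predictions is left out.

```python
import random


def split_balanced_bins(majority, minority, seed=0):
    """Randomly partition the majority class into bins of roughly the
    minority-class size and pair each bin with the full minority class.

    Illustrative sketch only; not the authors' implementation.
    """
    if not minority:
        raise ValueError("minority class must not be empty")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    # Number of bins is about the imbalance ratio (at least one bin).
    n_bins = max(1, round(len(shuffled) / len(minority)))
    # Striding deals the shuffled samples into n_bins near-equal bins.
    bins = [shuffled[i::n_bins] for i in range(n_bins)]
    return [(bin_, list(minority)) for bin_ in bins]


# Toy data: 10 majority (non-MetS) and 3 minority (MetS) record ids.
subsets = split_balanced_bins(range(10), [100, 101, 102])
for majority_bin, minority_all in subsets:
    print(len(majority_bin), len(minority_all))
```

Each resulting subset is approximately balanced, and every majority sample appears in exactly one subset, so no majority data is discarded and no synthetic samples are generated (unlike SMOTE, which creates new minority samples by interpolation).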
The most important determinant for MetS prediction was waist circumference. When a random forest learning algorithm was applied to the imbalanced data, the trained models reached accuracies of 86.9% and 79.4% and sensitivities of 37.1% and 38.2% in men and women, respectively. The best results were obtained with the SplitBal data balancing technique: although the accuracy of the trained models decreased by 7.8% and 11.3%, their sensitivity improved significantly to 82.3% and 73.7% in men and women, respectively.
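The gap between accuracy and sensitivity reported above is typical of imbalanced data. The following sketch shows how the two metrics are computed from predictions and why a model can score high accuracy while missing most positive cases. The labels and predictions are hypothetical toy values chosen for illustration, not data from the study.

```python
def accuracy_and_sensitivity(y_true, y_pred):
    """Return (accuracy, sensitivity) for binary labels, where 1 = MetS."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    accuracy = (tp + tn) / len(pairs)
    # Sensitivity (recall) is the share of true cases that are detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity


# Hypothetical cohort: 15 positives and 85 negatives. The model detects
# only 5 of the 15 positives and labels every negative correctly.
y_true = [1] * 15 + [0] * 85
y_pred = [1] * 5 + [0] * 10 + [0] * 85

acc, sens = accuracy_and_sensitivity(y_true, y_pred)
print(f"accuracy={acc:.3f}, sensitivity={sens:.3f}")
```

In this toy case the accuracy is 0.90 while the sensitivity is only about 0.33, which is why sensitivity is the more informative metric for a screening tool when the positive class is rare.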
The random forest learning method, combined with data balancing techniques, especially SplitBal, can produce MetS prediction models with promising results that could serve as a useful prognostic tool in health screening programs.