Jinan Engineering Polytechnic, Ji-Nan, Shandong, China.
College of Intelligent Equipment, Shandong University of Science & Technology, Tai-an, Shandong, China.
Nutr Diabetes. 2024 Aug 14;14(1):63. doi: 10.1038/s41387-024-00324-z.
Diabetes is a major public health concern, and early detection is essential for effective management and intervention. However, imbalanced datasets make accurate diabetes prediction difficult: models trained on them often perform poorly on the minority class, which degrades overall diagnostic performance.
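The effect of class imbalance on evaluation can be illustrated with a minimal sketch. The counts below (950 negative and 50 positive cases) are hypothetical and are not taken from the study's dataset; the helper names are likewise illustrative. A classifier that always predicts the majority class reaches high accuracy while detecting no positive case.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    # Recall is undefined without positive cases; report 0.0 in that case.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical imbalanced labels: 950 non-diabetic (0), 50 diabetic (1).
y_true = [0] * 950 + [1] * 50
# A trivial model that always predicts the majority class.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, minority recall={rec:.2f}")
```

Under these assumed counts the accuracy is 0.95 although the minority-class recall is 0, which is why accuracy alone can hide poor minority-class performance.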
To address this issue, this study employs a combination of Synthetic Minority Over-sampling Technique (SMOTE) and Random Under-Sampling (RUS) for data balancing and uses Optuna for hyperparameter optimization of machine learning models. This approach aims to fill the gap in current research concerning data balancing and model optimization, thereby improving prediction accuracy and computational efficiency.
First, the study applies SMOTE and RUS to the imbalanced diabetes dataset to balance the class distribution. Then, Optuna is used to optimize the hyperparameters of the LightGBM model and improve its performance. The effectiveness of the proposed methods is evaluated by comparing models trained on the dataset before and after balancing.
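The balancing step can be sketched in plain Python. This is a simplified illustration of the general SMOTE and RUS ideas, not the implementation used in the study: the function names, the 2-D toy data, the target class size of 40, and the order of operations are all assumptions made for the example. SMOTE creates each synthetic minority point by interpolating between a minority sample and one of its k nearest minority neighbours; RUS randomly discards majority samples.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority points, nearest to the base point first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def random_under_sample(majority, n_keep, rng=None):
    """Keep a random subset of the majority class (sampling without replacement)."""
    rng = rng or random.Random(0)
    return rng.sample(majority, min(n_keep, len(majority)))


def smote_then_rus(majority, minority, target, k=5, seed=0):
    """Oversample the minority class and undersample the majority class to `target`."""
    rng = random.Random(seed)
    n_new = max(0, target - len(minority))
    balanced_minority = list(minority) + smote(minority, n_new, k, rng)
    balanced_majority = random_under_sample(majority, target, rng)
    return balanced_majority, balanced_minority


# Hypothetical toy data: 90 majority points and 10 minority points in 2-D.
data_rng = random.Random(42)
majority = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
minority = [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]

maj_bal, min_bal = smote_then_rus(majority, minority, target=40)
print(len(maj_bal), len(min_bal))
```

Meeting in the middle, rather than oversampling the minority class all the way up to the majority size, limits both the number of synthetic points and the amount of discarded real data. In practice a maintained library implementation would normally be preferred over a hand-written one.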
The experimental results show that the enhanced LightGBM-Optuna model raises accuracy from 97.07% to 97.11% (a gain of 0.04 percentage points) and precision from 97.17% to 98.99% (a gain of 1.82 percentage points). A single search takes only 2.5 seconds. These results demonstrate the advantage of the proposed method in handling imbalanced datasets and optimizing model performance.
The study indicates that combining the SMOTE and RUS data balancing algorithms with Optuna-based hyperparameter optimization can effectively improve machine learning models, especially when handling imbalanced datasets for diabetes prediction.