Optimization of diabetes prediction methods based on combinatorial balancing algorithm.

Affiliations

Jinan Engineering Polytechnic, Ji-Nan, Shandong, China.

College of Intelligent Equipment, Shandong University of Science & Technology, Tai-an, Shandong, China.

Publication information

Nutr Diabetes. 2024 Aug 14;14(1):63. doi: 10.1038/s41387-024-00324-z.

DOI:10.1038/s41387-024-00324-z
PMID:39143066
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11324958/
Abstract

BACKGROUND

Diabetes, as a significant disease affecting public health, requires early detection for effective management and intervention. However, imbalanced datasets pose a challenge to accurate diabetes prediction. This imbalance often results in models performing poorly in predicting minority classes, affecting overall diagnostic performance.

OBJECTIVES

To address this issue, this study employs a combination of Synthetic Minority Over-sampling Technique (SMOTE) and Random Under-Sampling (RUS) for data balancing and uses Optuna for hyperparameter optimization of machine learning models. This approach aims to fill the gap in current research concerning data balancing and model optimization, thereby improving prediction accuracy and computational efficiency.

METHODS

First, the study uses SMOTE and RUS methods to process the imbalanced diabetes dataset, balancing the data distribution. Then, Optuna is utilized to optimize the hyperparameters of the LightGBM model to enhance its performance. During the experiment, the effectiveness of the proposed methods is evaluated by comparing the training results of the dataset before and after balancing.
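The balancing step described above can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation and not the imbalanced-learn API: the function names (`smote`, `random_under_sample`, `balance`), the toy two-feature data, the neighbour count `k=3`, and the per-class target of 30 are all assumptions made for illustration. The idea is that SMOTE creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, while RUS simply discards a random subset of the majority class.

```python
import random
from collections import Counter


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def random_under_sample(majority, n_keep, rng=None):
    """Keep a random subset of n_keep majority samples."""
    if rng is None:
        rng = random.Random(0)
    return rng.sample(majority, n_keep)


def balance(X, y, target_per_class, seed=0):
    """Oversample class 1 with SMOTE and undersample class 0 with RUS so
    that both classes end up with target_per_class samples."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    majority = [x for x, label in zip(X, y) if label == 0]
    new_points = smote(minority, target_per_class - len(minority), rng=rng)
    kept = random_under_sample(majority, target_per_class, rng=rng)
    X_bal = kept + minority + new_points
    y_bal = [0] * len(kept) + [1] * (len(minority) + len(new_points))
    return X_bal, y_bal


# Toy imbalanced data (hypothetical): 90 negatives, 10 positives.
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(90)]
X += [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(10)]
y = [0] * 90 + [1] * 10

X_bal, y_bal = balance(X, y, target_per_class=30)
print(dict(Counter(y)), "->", dict(Counter(y_bal)))
```

Resampling of this kind is normally applied to the training split only, so that the evaluation data keeps its original class distribution.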

RESULTS

The experimental results show that the enhanced LightGBM-Optuna model improves the accuracy from 97.07% to 97.11%, and the precision from 97.17% to 98.99%. The time required for a single search is only 2.5 seconds. These results demonstrate the superiority of the proposed method in handling imbalanced datasets and optimizing model performance.
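For readers comparing the two reported metrics, the sketch below shows how accuracy and precision are computed from confusion-matrix counts. The counts are hypothetical and are not taken from the study; they are chosen only to show that, on an imbalanced dataset, precision can change substantially while accuracy stays the same.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true positives that the model finds."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts for two models on 1000 samples with 100 positives.
model_a = {"tp": 80, "fp": 20, "tn": 880, "fn": 20}
model_b = {"tp": 62, "fp": 2, "tn": 898, "fn": 38}

for name, c in (("A", model_a), ("B", model_b)):
    print(
        name,
        f"accuracy={accuracy(**c):.4f}",
        f"precision={precision(c['tp'], c['fp']):.4f}",
        f"recall={recall(c['tp'], c['fn']):.4f}",
    )
```

In this toy case both models have the same accuracy, yet model B has much higher precision and lower recall, which is why several metrics are usually read together when classes are imbalanced.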

CONCLUSIONS

The study indicates that combining SMOTE and RUS data balancing algorithms with Optuna for hyperparameter optimization can effectively enhance machine learning models, especially in dealing with imbalanced datasets for diabetes prediction.
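The hyperparameter-optimization loop can be sketched in the same spirit. The example below deliberately uses plain random search from the standard library as a stand-in, not Optuna (whose default sampler is a tree-structured Parzen estimator), and the objective is a made-up smooth function rather than a trained model. The parameter names `learning_rate` and `num_leaves` are real LightGBM parameters, but the abstract does not say which parameters or ranges the authors tuned, so both are assumptions here.

```python
import math
import random
import time


def objective(params):
    """Hypothetical validation score (higher is better). A real objective
    would train a model with `params` and return its validation metric."""
    lr = params["learning_rate"]
    leaves = params["num_leaves"]
    return -((math.log10(lr) + 1.0) ** 2) - ((leaves - 40) / 40.0) ** 2


def random_search(n_trials, seed=0):
    """Sample n_trials configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(n_trials):
        params = {
            # Log-uniform over [1e-3, 1], an assumed search range.
            "learning_rate": 10 ** rng.uniform(-3, 0),
            # Integer range chosen for illustration only.
            "num_leaves": rng.randint(8, 128),
        }
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


start = time.perf_counter()
best_params, best_score = random_search(n_trials=200)
elapsed = time.perf_counter() - start
print(best_params, round(best_score, 4), f"{elapsed:.4f}s")
```

Timing the whole search, as done here with `time.perf_counter`, is one simple way to compare the cost of different search strategies on the same objective.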

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3bf3/11324958/8a6614f43be5/41387_2024_324_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3bf3/11324958/2938630eae32/41387_2024_324_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3bf3/11324958/38d1e3f022fb/41387_2024_324_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3bf3/11324958/2cefb9e69f17/41387_2024_324_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3bf3/11324958/8301b361d9c2/41387_2024_324_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3bf3/11324958/d09a733d2bce/41387_2024_324_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3bf3/11324958/8a4c03479a1c/41387_2024_324_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3bf3/11324958/fd0ce06ae0a9/41387_2024_324_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3bf3/11324958/ba688e1c5e6b/41387_2024_324_Fig9_HTML.jpg


