Is handling unbalanced datasets for machine learning uplifts system performance?: A case of diabetic prediction.

Affiliation

Department of Computer Engineering, Datta Meghe College of Engineering, Navi Mumbai, Pin Code: 400 708, India.

Publication information

Diabetes Metab Syndr. 2022 Sep;16(9):102609. doi: 10.1016/j.dsx.2022.102609. Epub 2022 Sep 5.

DOI:10.1016/j.dsx.2022.102609
PMID:36099677
Abstract

BACKGROUND AND AIMS

Healthcare is a sensitive sector, and addressing the class imbalance in the healthcare domain is a time-consuming task for machine learning-based systems due to the vast amount of data. This study looks into the impact of socioeconomic disparities on the healthcare data of diabetic patients to make accurate disease predictions.

METHODS

This study proposed a systematic approach of Closest Distance Ranking and Principal Component Analysis to deal with the unbalanced dataset. A typical machine learning technique was used to analyze the proposed approach. The data set of pregnant diabetic women is analysed for accurate detection.
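The abstract does not define Closest Distance Ranking, so the following is only a minimal sketch of one plausible reading, not the authors' method: project the features with Principal Component Analysis, rank majority-class samples by their distance to the minority-class centroid, and keep the closest ones until the classes are balanced. The function names, the centroid-based ranking rule, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def pca_project(X, n_components):
    """Project X onto its first n_components principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def undersample_by_closest_distance(Z, y, majority=0, minority=1):
    """Hypothetical ranking rule: keep every minority sample plus the
    equally many majority samples closest to the minority centroid."""
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y == majority)
    centroid = Z[min_idx].mean(axis=0)
    dist = np.linalg.norm(Z[maj_idx] - centroid, axis=1)
    keep_maj = maj_idx[np.argsort(dist)[: len(min_idx)]]
    keep = np.sort(np.concatenate([min_idx, keep_maj]))
    return Z[keep], y[keep]


# Synthetic, illustrative data only: 80 majority and 20 minority samples.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (80, 5)), rng.normal(2.0, 1.0, (20, 5))])
y = np.array([0] * 80 + [1] * 20)

Z = pca_project(X, n_components=2)
Z_bal, y_bal = undersample_by_closest_distance(Z, y)
```

After this step `Z_bal` holds 40 rows with 20 samples per class, which can then be passed to any standard classifier.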

RESULTS

The results of the case are analysed using sensitivity, which demonstrates that the minority class's lack of information makes it impossible to forecast the results. On the other hand, the unbalanced dataset was treated using the proposed technique and evaluated with the machine learning algorithm which significantly increased the performance of the system.
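Why sensitivity is the relevant metric here can be illustrated with a small sketch. Sensitivity is TP / (TP + FN) for the positive (minority) class, so a model that ignores the minority class can score high accuracy while its sensitivity is zero. The labels below are hypothetical toy values, not data from the study.

```python
def sensitivity(y_true, y_pred, positive=1):
    """Sensitivity (recall) = TP / (TP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive samples")
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced labels: 90 negatives, 10 positives.
y_true = [0] * 90 + [1] * 10
# A degenerate classifier that always predicts the majority class.
y_majority_only = [0] * 100

acc = accuracy(y_true, y_majority_only)     # 0.9
sens = sensitivity(y_true, y_majority_only)  # 0.0
```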

CONCLUSION

The performance of the machine learning-based system was significantly enhanced by the unbalanced dataset which was processed with the proposed technique and evaluated with the machine learning algorithm. For the first time, an unbalanced dataset was treated with a combination of Closest Distance Ranking and Principal Component Analysis.


Similar articles

1
Is handling unbalanced datasets for machine learning uplifts system performance?: A case of diabetic prediction.
Diabetes Metab Syndr. 2022 Sep;16(9):102609. doi: 10.1016/j.dsx.2022.102609. Epub 2022 Sep 5.
2
Gray wolf optimization-extreme learning machine approach for diabetic retinopathy detection.
Front Public Health. 2022 Aug 1;10:925901. doi: 10.3389/fpubh.2022.925901. eCollection 2022.
3
Optimization of diabetes prediction methods based on combinatorial balancing algorithm.
Nutr Diabetes. 2024 Aug 14;14(1):63. doi: 10.1038/s41387-024-00324-z.
4
Soft Clustering for Enhancing the Diagnosis of Chronic Diseases over Machine Learning Algorithms.
J Healthc Eng. 2020 Mar 9;2020:4984967. doi: 10.1155/2020/4984967. eCollection 2020.
5
Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage.
Med Phys. 2019 May;46(5):2497-2511. doi: 10.1002/mp.13497. Epub 2019 Apr 8.
6
Diabetes mellitus risk prediction in the presence of class imbalance using flexible machine learning methods.
BMC Med Inform Decis Mak. 2022 Feb 10;22(1):36. doi: 10.1186/s12911-022-01775-z.
7
A Smart Healthcare Recommendation System for Multidisciplinary Diabetes Patients with Data Fusion Based on Deep Ensemble Learning.
Comput Intell Neurosci. 2021 Sep 17;2021:4243700. doi: 10.1155/2021/4243700. eCollection 2021.
8
Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics.
Molecules. 2019 Aug 1;24(15):2811. doi: 10.3390/molecules24152811.
9
AutoScore-Imbalance: An interpretable machine learning tool for development of clinical scores with rare events data.
J Biomed Inform. 2022 May;129:104072. doi: 10.1016/j.jbi.2022.104072. Epub 2022 Apr 11.
10
A soft computing approach for diabetes disease classification.
Health Informatics J. 2018 Dec;24(4):379-393. doi: 10.1177/1460458216675500. Epub 2016 Nov 14.

Cited by

1
Artificial Intelligence in Gestational Diabetes Care: A Systematic Review.
J Diabetes Sci Technol. 2025 Aug 25:19322968251355967. doi: 10.1177/19322968251355967.
2
Embracing technological revolution: A panorama of machine learning in dentistry.
Med Oral Patol Oral Cir Bucal. 2024 Nov 1;29(6):e742-e749. doi: 10.4317/medoral.26679.
3
Utilizing Large Language Models to Generate Synthetic Data to Increase the Performance of BERT-Based Neural Networks.
AMIA Jt Summits Transl Sci Proc. 2024 May 31;2024:429-438. eCollection 2024.