
Impact of imbalanced features on large datasets.

Author information

Albattah Waleed, Khan Rehan Ullah

Affiliation

Department of Information Technology, College of Computer, Qassim University, Buraydah, Saudi Arabia.

Publication information

Front Big Data. 2025 Mar 13;8:1455442. doi: 10.3389/fdata.2025.1455442. eCollection 2025.

DOI:10.3389/fdata.2025.1455442
PMID:40151465
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11948280/
Abstract

The exponential growth of image and video data motivates the need for practical, real-time, content-based search algorithms. Features play a vital role in identifying objects within images. However, feature-based classification is challenged by an uneven distribution of instances across classes. Ideally, each class should have an equal number of instances and features to ensure optimal classifier performance, but real-world scenarios often exhibit class imbalance. This article therefore explores a classification framework based on image features, analyzing both balanced and imbalanced distributions. Through extensive experimentation, we examine the impact of class imbalance on image classification performance, primarily on large datasets. The comprehensive evaluation shows that all models perform better on a balanced dataset than on an imbalanced one, underscoring the importance of dataset balancing for model accuracy. Distributed Gaussian (D-GA) and Distributed Poisson (D-PO) are found to be the most effective techniques, especially for improving Random Forest (RF) and SVM models. The deep learning experiments show a similar improvement.
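The abstract does not describe how D-GA or D-PO produce a balanced dataset, so the sketch below does not implement either of them. As a minimal illustration of what "balancing" a feature dataset means, assuming a hypothetical toy dataset, it uses plain random oversampling (a generic, swapped-in technique): it measures the imbalance ratio (majority count divided by minority count) and duplicates randomly chosen minority-class feature vectors until every class has as many instances as the largest one. The helper names `class_counts`, `imbalance_ratio`, and `random_oversample` are hypothetical.

```python
import random
from collections import Counter


def class_counts(labels):
    """Return a dict mapping each class label to its number of instances."""
    return dict(Counter(labels))


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (1.0 means balanced)."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


def random_oversample(features, labels, seed=0):
    """Generic random oversampling (NOT the paper's D-GA/D-PO):
    duplicate randomly chosen rows of each smaller class until every
    class has as many instances as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        # Draw (target - len(rows)) extra copies with replacement.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


if __name__ == "__main__":
    # Hypothetical toy data: 2-D feature vectors, 6 "cat" vs 2 "dog" instances.
    X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.5, 0.5],
         [0.4, 0.3], [0.6, 0.2], [0.9, 0.8], [0.8, 0.9]]
    y = ["cat"] * 6 + ["dog"] * 2
    print(class_counts(y), imbalance_ratio(y))
    Xb, yb = random_oversample(X, y)
    print(class_counts(yb), imbalance_ratio(yb))
```

Random oversampling only repeats existing minority rows, so it adds no new information and can encourage overfitting; whichever balancing method is used, it should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.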


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f4a/11948280/4e275ac4228c/fdata-08-1455442-g0010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f4a/11948280/86cd24bb14f5/fdata-08-1455442-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f4a/11948280/cf5299d04744/fdata-08-1455442-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f4a/11948280/b075a9a822d2/fdata-08-1455442-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f4a/11948280/5af7608afbb3/fdata-08-1455442-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f4a/11948280/f1402f64a846/fdata-08-1455442-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f4a/11948280/0a7ad9456b03/fdata-08-1455442-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f4a/11948280/47adf0656707/fdata-08-1455442-g0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f4a/11948280/2899e822b05f/fdata-08-1455442-g0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f4a/11948280/55999da33f2e/fdata-08-1455442-g0009.jpg

Similar articles

1
Impact of imbalanced features on large datasets.
Front Big Data. 2025 Mar 13;8:1455442. doi: 10.3389/fdata.2025.1455442. eCollection 2025.
2
Addressing Imbalanced Classification Problems in Drug Discovery and Development Using Random Forest, Support Vector Machine, AutoGluon-Tabular, and H2O AutoML.
J Chem Inf Model. 2025 Apr 28;65(8):3976-3989. doi: 10.1021/acs.jcim.5c00023. Epub 2025 Apr 15.
3
Data Augmentation and Machine Learning algorithms for multi-class imbalanced morphometrics data of stingless bees.
Heliyon. 2025 Jan 23;11(3):e42214. doi: 10.1016/j.heliyon.2025.e42214. eCollection 2025 Feb 15.
4
A multi-instance tumor subtype classification method for small PET datasets using RA-DL attention module guided deep feature extraction with radiomics features.
Comput Biol Med. 2024 May;174:108461. doi: 10.1016/j.compbiomed.2024.108461. Epub 2024 Apr 9.
5
Addressing Binary Classification over Class Imbalanced Clinical Datasets Using Computationally Intelligent Techniques.
Healthcare (Basel). 2022 Jul 13;10(7):1293. doi: 10.3390/healthcare10071293.
6
Improving Surgical Site Infection Prediction Using Machine Learning: Addressing Challenges of Highly Imbalanced Data.
Diagnostics (Basel). 2025 Feb 19;15(4):501. doi: 10.3390/diagnostics15040501.
7
Structure-activity relationship-based chemical classification of highly imbalanced Tox21 datasets.
J Cheminform. 2020 Oct 27;12(1):66. doi: 10.1186/s13321-020-00468-x.
8
A medical image classification method based on self-regularized adversarial learning.
Med Phys. 2024 Nov;51(11):8232-8246. doi: 10.1002/mp.17320. Epub 2024 Jul 30.
9
Convolutional Rebalancing Network for the Classification of Large Imbalanced Rice Pest and Disease Datasets in the Field.
Front Plant Sci. 2021 Jul 5;12:671134. doi: 10.3389/fpls.2021.671134. eCollection 2021.
10
Interaction effect between data discretization and data resampling for class-imbalanced medical datasets.
Technol Health Care. 2025 Mar;33(2):1000-1013. doi: 10.1177/09287329241295874. Epub 2024 Nov 25.
