Albattah Waleed, Khan Rehan Ullah
Department of Information Technology, College of Computer, Qassim University, Buraydah, Saudi Arabia.
Front Big Data. 2025 Mar 13;8:1455442. doi: 10.3389/fdata.2025.1455442. eCollection 2025.
The exponential growth of image and video data creates a need for practical, real-time, content-based search algorithms. Features play a vital role in identifying objects within images. However, feature-based classification is challenged by the uneven distribution of instances across classes. Ideally, each class should have an equal number of instances and features to ensure optimal classifier performance, but real-world datasets are often imbalanced. This article therefore explores a classification framework based on image features, analyzing both balanced and imbalanced distributions. Through extensive experimentation, we examine the impact of class imbalance on image classification performance, primarily on large datasets. The evaluation shows that all models perform better on balanced data than on the imbalanced dataset, underscoring the importance of dataset balancing for model accuracy. Distributed Gaussian (D-GA) and Distributed Poisson (D-PO) are found to be the most effective techniques, especially for improving Random Forest (RF) and support vector machine (SVM) models. The deep learning experiments show a similar improvement.