Improved feature reduction framework for sign language recognition using autoencoders and adaptive Grey Wolf Optimization.

Author information

Goel Rajeev, Bansal Sandhya, Gupta Kavita

Affiliations

Government College, Naraingarh, India.

CSE Department, Maharishi Markandeshwar (Deemed to be) University, Mullana, India.

Publication information

Sci Rep. 2025 Jan 17;15(1):2300. doi: 10.1038/s41598-024-82785-x.

DOI:10.1038/s41598-024-82785-x

PMID:39824931

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11742416/

Abstract

Automatic sign language recognition (ASLR) systems enable smooth communication between hearing-impaired and hearing individuals and expand educational opportunities for hearing-impaired people. However, they suffer from the "curse of dimensionality": an excessive number of features leads to prolonged training times and heavy computational demands. This paper proposes a technique that integrates machine learning and swarm intelligence to address this issue. The proposed technique first extracts features using the histogram of oriented gradients (HOG) approach, then reduces the dimensionality of the extracted features with an unsupervised autoencoder, and finally refines the feature set with an improved Grey Wolf Optimization (GWO) algorithm. A handcrafted artificial neural network serves as the classifier within this integrated framework, denoted AEGWO-Net. Extensive experiments were conducted on six datasets containing gestures from different sign languages, namely ASL, ASL MNIST, ISL, ArSL, MNIST Digits, and IEEE-ISL, to evaluate the performance of AEGWO-Net. AEGWO-Net improves accuracy and F1 score by 6% and 4%, respectively, compared with the PCA-IGWO and KPCA-IGWO algorithms. Its high accuracy (98.40%), F1 score (96.59%), MCC (97.14%), and AUC (96.21%) indicate that the method remains robust and generalizable even with reduced dimensionality. Furthermore, AEGWO-Net is compared with other existing swarm intelligence techniques to demonstrate its superiority.
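
To make the feature-refinement step more concrete, the following is a minimal sketch of the standard Grey Wolf Optimizer used as a wrapper for feature selection. It is not the improved or adaptive variant proposed in the paper, whose details the abstract does not give. The function names `gwo_select` and `toy_fitness`, the 0.5 selection threshold, and the `INFORMATIVE` index set are all assumptions made for illustration. In a real pipeline the fitness would presumably be a classifier's validation error on the autoencoder-compressed HOG features rather than the toy score used here.

```python
import random

def to_mask(position):
    # Assumption: a feature is kept when its coordinate exceeds 0.5.
    return [x > 0.5 for x in position]

def gwo_select(fitness, n_features, n_wolves=12, n_iter=60, seed=0):
    """Standard GWO over [0, 1]^n_features; returns (best mask, best score)."""
    rng = random.Random(seed)
    wolves = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]
    best_mask, best_score = None, float("inf")
    for t in range(n_iter):
        # Rank the pack; the three best wolves (alpha, beta, delta) guide the rest.
        ranked = sorted(wolves, key=lambda w: fitness(to_mask(w)))
        leaders = [list(w) for w in ranked[:3]]
        score = fitness(to_mask(leaders[0]))
        if score < best_score:
            best_mask, best_score = to_mask(leaders[0]), score
        a = 2.0 * (1 - t / n_iter)  # exploration factor decays linearly from 2 toward 0
        new_wolves = []
        for w in wolves:
            pos = []
            for j in range(n_features):
                est = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[j] - w[j])
                    est += leader[j] - A * D
                # Average the three leader-guided estimates and clip to [0, 1].
                pos.append(min(1.0, max(0.0, est / 3)))
            new_wolves.append(pos)
        wolves = new_wolves
    # Evaluate the final pack once so the last update is not wasted.
    for w in wolves:
        score = fitness(to_mask(w))
        if score < best_score:
            best_mask, best_score = to_mask(w), score
    return best_mask, best_score

INFORMATIVE = {1, 4, 7}  # hypothetical indices of the useful features

def toy_fitness(mask):
    # Toy stand-in for classifier error: penalize missed informative
    # features heavily and every selected feature lightly.
    missed = sum(1 for i in INFORMATIVE if not mask[i])
    return missed + 0.01 * sum(mask)

mask, score = gwo_select(toy_fitness, n_features=10)
print([i for i, keep in enumerate(mask) if keep], round(score, 2))
```

The small per-feature penalty in `toy_fitness` reflects the usual wrapper trade-off between accuracy and the number of retained features. How AEGWO-Net weights that trade-off is not stated in the abstract.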
