Zaher Moamen, Ghoneim Amr S, Abdelhamid Laila, Atia Ayman
Faculty of Computer Science, October University for Modern Sciences and Arts (MSA), Egypt; Human-Computer Interaction (HCI-LAB), Faculty of Computing and Artificial Intelligence, Helwan University, Egypt.
Computer Science Department, Faculty of Computing and Artificial Intelligence, Helwan University, Egypt.
Comput Biol Med. 2025 Jan;184:109399. doi: 10.1016/j.compbiomed.2024.109399. Epub 2024 Nov 27.
Physical rehabilitation plays a critical role in enhancing health outcomes globally. However, the shortage of physiotherapists, particularly in developing countries where the ratio is approximately ten physiotherapists per million people, poses a significant challenge to effective rehabilitation services. The existing literature on rehabilitation often falls short in data representation and the employment of diverse modalities, limiting the potential for advanced therapeutic interventions. To address this gap, this study integrates Computer Vision and Human Activity Recognition (HAR) technologies to support home-based rehabilitation, exploring various modalities and proposing a framework for data representation. We introduce a novel framework that leverages both Continuous Wavelet Transform (CWT) and Mel-Frequency Cepstral Coefficients (MFCC) for skeletal data representation. CWT is particularly valuable for capturing the time-frequency characteristics of the dynamic movements involved in rehabilitation exercises, enabling a comprehensive depiction of both temporal and spectral features. This dual capability is crucial for accurately modelling the complex and variable nature of rehabilitation exercises. In our analysis, we evaluate 20 CNN-based models and one Vision Transformer (ViT) model. Additionally, we propose 12 hybrid architectures that combine CNN-based models with ViT in bi-model and tri-model configurations. These models are rigorously tested on the UI-PRMD and KIMORE benchmark datasets using key evaluation metrics, including accuracy, precision, recall, and F1-score, with 5-fold cross-validation. Our evaluation also considers real-time performance, model size, and efficiency on low-power devices, emphasising practical applicability.
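To make the CWT-based representation concrete, the following is a minimal sketch of how one skeletal joint coordinate could be turned into a time-frequency image (a scalogram). It is not the authors' implementation: the abstract does not name the wavelet, the scales, or the preprocessing, so the complex Morlet wavelet, the centre frequency `w0 = 6`, the scale list, and the synthetic sinusoid standing in for a joint trajectory are all assumptions, and the function names `morlet` and `cwt_scalogram` are illustrative.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet (assumed choice; normalisation constant omitted)."""
    return cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_scalogram(signal, scales, w0=6.0):
    """Return |CWT| as a list of rows: one row per scale, one column per time step.

    Direct O(len(scales) * n^2) evaluation, adequate for short sequences.
    """
    n = len(signal)
    rows = []
    for s in scales:
        row = []
        for b in range(n):  # b is the time shift of the wavelet
            acc = 0j
            for k in range(n):
                acc += signal[k] * morlet((k - b) / s, w0).conjugate()
            row.append(abs(acc) / math.sqrt(s))  # 1/sqrt(s) energy normalisation
        rows.append(row)
    return rows


# Hypothetical input: 128 frames of one joint coordinate that oscillates
# with a period of 16 frames, e.g. a repeated arm raise.
trajectory = [math.sin(2 * math.pi * k / 16) for k in range(128)]
scalogram = cwt_scalogram(trajectory, scales=[4, 8, 15, 30])
```

Each row of `scalogram` responds to a different movement tempo, so stacking the rows gives a 2-D array that an image model such as a CNN or ViT could consume. In practice one scalogram would be computed per joint and per axis; how those are combined is not specified in the abstract.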
The proposed fused tri-model architectures outperform both single-architecture models and bi-model configurations, demonstrating robust performance across both datasets and making the fused models the preferred choice for rehabilitation tasks. Our proposed hybrid model, DenMobVit, consistently surpasses state-of-the-art methods, achieving accuracy improvements of 2.9% and 1.97% on the UI-PRMD and KIMORE datasets, respectively. These findings highlight the effectiveness of our approach in advancing rehabilitation technologies and bridging the gap in physiotherapy services.
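The evaluation protocol named in the abstract (accuracy, precision, recall, and F1-score under 5-fold cross-validation) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the abstract does not say whether the per-class scores are macro- or micro-averaged, nor how the folds are drawn, so macro averaging and contiguous, unshuffled folds are assumed here, and the helper names `k_fold_indices` and `macro_scores` are hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous, near-equal test folds.

    Returns a list of (train_indices, test_indices) pairs. No shuffling or
    stratification is applied (an assumption made for simplicity).
    """
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = [j for j in range(n_samples) if j < start or j >= stop]
        folds.append((train, test))
        start = stop
    return folds


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging)."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n_labels = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n_labels,
        "recall": sum(recalls) / n_labels,
        "f1": sum(f1s) / n_labels,
    }
```

A model would be trained on each `train` split, scored with `macro_scores` on the matching `test` split, and the five score dictionaries averaged to give the reported figures.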