Cruciata Luca, Contino Salvatore, Ciccarelli Marianna, Pirrone Roberto, Mostarda Leonardo, Papetti Alessandra, Piangerelli Marco
Department of Engineering, University of Palermo, 90128 Palermo, Italy.
Department of Industrial Engineering and Mathematical Sciences, Polytechnic University of Marche, Via Brecce Bianche 12, 60131 Ancona, Italy.
Sensors (Basel). 2025 Aug 1;25(15):4750. doi: 10.3390/s25154750.
Work-related musculoskeletal disorders (WMSDs) are a leading concern in industrial ergonomics, often stemming from sustained non-neutral postures and repetitive tasks. This paper presents a vision-based framework for real-time, frame-level ergonomic risk classification using a lightweight Vision Transformer (ViT). The proposed system operates directly on raw RGB images without requiring skeleton reconstruction, joint angle estimation, or image segmentation. A single ViT model simultaneously classifies eight anatomical regions, enabling efficient multi-label posture assessment. Training is supervised using a multimodal dataset acquired from synchronized RGB video and full-body inertial motion capture, with ergonomic risk labels derived from Rapid Upper Limb Assessment (RULA) scores computed on joint kinematics. The system is validated on realistic, simulated industrial tasks that include common challenges such as occlusion and posture variability. Experimental results show that the ViT model achieves state-of-the-art performance, with F1-scores exceeding 0.99 and AUC values above 0.996 across all regions. Compared with a previous CNN-based system, the proposed model improves classification accuracy and generalizability while reducing complexity and enabling real-time inference on edge devices. These findings demonstrate the model's potential for unobtrusive, scalable ergonomic risk monitoring in real-world manufacturing environments.
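The frame-level, per-region evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the model emits one row of class scores per anatomical region for each frame and that risk is binary (0 = acceptable, 1 = at risk). The abstract specifies neither the number of risk classes nor the region names, so `decode_frame`, `per_region_f1`, and the binary labelling are hypothetical choices made only for illustration.

```python
from typing import List, Sequence

# The abstract states eight anatomical regions; their names are not given there.
NUM_REGIONS = 8


def decode_frame(logits: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring risk class for each region of one frame.

    `logits` holds one row of class scores per region (hypothetical layout).
    """
    if len(logits) != NUM_REGIONS:
        raise ValueError(f"expected {NUM_REGIONS} regions, got {len(logits)}")
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def per_region_f1(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    positive: int = 1,
) -> List[float]:
    """Binary F1 per region over all frames.

    Each element of `y_true` / `y_pred` is one frame with NUM_REGIONS labels.
    `positive` is the label treated as "at risk" (an assumption of this sketch).
    """
    scores = []
    for region in range(NUM_REGIONS):
        tp = fp = fn = 0
        for truth, pred in zip(y_true, y_pred):
            if pred[region] == positive and truth[region] == positive:
                tp += 1
            elif pred[region] == positive:
                fp += 1
            elif truth[region] == positive:
                fn += 1
        denom = 2 * tp + fp + fn
        # Report 0.0 when the region has no positive labels or predictions.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores
```

Reporting one score per region, rather than a single pooled score, matches the abstract's statement that the F1 threshold holds "across all regions".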