Nahin Shahriar Kabir, Acharjee Sanjay, Saha Sawradip, Das Aurick, Hossain Shahruk, Haque Mohammad Ariful
Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh.
Heliyon. 2024 Aug 25;10(17):e36823. doi: 10.1016/j.heliyon.2024.e36823. eCollection 2024 Sep 15.
Human pose estimation (HPE) is a crucial step towards understanding people in images and videos. HPE provides geometric and motion information about the human body and has been used in a wide range of applications (e.g., human-computer interaction, motion analysis, augmented reality, virtual reality, and healthcare). One particularly useful HPE task is the 2D pose estimation of bedridden patients from infrared (IR) images. The IR imaging modality is preferred here because of privacy concerns and the need to monitor both uncovered and covered patients at different levels of illumination. The main obstacle in this research problem is the unavailability of covered examples, which are costly to collect and time-consuming to label. In this work, a deep learning-based framework was developed for human sleeping pose estimation on covered images using only uncovered training images. In the training scheme, two image augmentation techniques, a statistical approach and a GAN-based approach, were explored for domain adaptation; the statistical approach performed better. The accuracy of the model trained on the statistically augmented dataset improved by 124% compared with the model trained on non-augmented images. To address the scarcity of infrared training images, a transfer learning strategy was used in which the model was pre-trained on an RGB pose estimation dataset, yielding a further 4% increase in accuracy. Semi-supervised learning techniques, with a novel pose discriminator model in the loop, were adopted to exploit the unannotated training data, yielding a further 3% increase in accuracy. Thus, the proposed training pipeline, powered by heavy augmentation, significantly improves 2D pose estimation from infrared images using a comparatively small amount of annotated data and a large amount of unannotated data.
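The abstract does not say how the pose discriminator is used within the semi-supervised loop. As a rough illustration only, the sketch below shows one common semi-supervised scheme, pseudo-label filtering, in which poses predicted on unannotated images are kept for further training only if a plausibility score passes a threshold. The function name `select_pseudo_labels`, the score range, and the threshold value are all hypothetical and are not taken from the paper.

```python
import numpy as np


def select_pseudo_labels(pred_poses, plausibility, threshold=0.8):
    """Keep predicted poses whose plausibility score passes a threshold.

    pred_poses:   (N, K, 2) array of predicted 2D keypoints for N unannotated images.
    plausibility: (N,) array of scores in [0, 1]; assumed here to come from a
                  pose discriminator (hypothetical interface, not the paper's).
    threshold:    hypothetical cut-off; not a value reported in the abstract.

    Returns the indices of the accepted images and their predicted poses.
    """
    pred_poses = np.asarray(pred_poses, dtype=float)
    plausibility = np.asarray(plausibility, dtype=float)
    if pred_poses.shape[0] != plausibility.shape[0]:
        raise ValueError("pred_poses and plausibility must have the same length")

    # Indices of predictions the discriminator considers plausible enough.
    keep = np.flatnonzero(plausibility >= threshold)
    return keep, pred_poses[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    poses = rng.uniform(0.0, 1.0, size=(5, 14, 2))  # 5 images, 14 keypoints each
    scores = np.array([0.95, 0.40, 0.81, 0.10, 0.79])
    idx, accepted = select_pseudo_labels(poses, scores)
    print(idx, accepted.shape)
```

In a scheme of this kind, the accepted predictions would be added to the annotated set as pseudo-labels for the next training round; whether the authors' pipeline works this way is not stated in the abstract.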