Jaén-Vargas Milagros, Reyes Leiva Karla Miriam, Fernandes Francisco, Barroso Gonçalves Sérgio, Tavares Silva Miguel, Lopes Daniel Simões, Serrano Olmedo José Javier
Bioinstrumentation and Nanomedicine Laboratory, Center for Biomedical Technology, Universidad Politécnica de Madrid, Madrid, Spain.
Engineering Faculty, Universidad Tecnológica Centroamericana, San Pedro Sula, Honduras.
PeerJ Comput Sci. 2022 Aug 8;8:e1052. doi: 10.7717/peerj-cs.1052. eCollection 2022.
Deep learning (DL) models are very useful for human activity recognition (HAR); compared with traditional methods, they offer better accuracy for HAR, among other advantages. DL can learn from unlabeled data and extract features directly from raw data, as in the case of time-series acceleration. The sliding window is a feature extraction technique; when used to preprocess time-series data, it improves accuracy and reduces latency and processing cost. These reductions in preprocessing time and cost can be especially beneficial when the window size is small, but how small can the window be while still keeping good accuracy? The objective of this research was to analyze the performance of four DL models, namely a simple deep neural network (DNN), a convolutional neural network (CNN), a long short-term memory network (LSTM), and a hybrid model (CNN-LSTM), when varying the sliding window size using fixed overlapped windows, in order to identify an optimal window size for HAR. We compare the effects for two acceleration sources: wearable inertial measurement unit (IMU) sensors and motion capture (MOCAP) systems. Short sliding windows of 5, 10, 15, 20, and 25 frames were compared with long ones of 50, 75, 100, and 200 frames. The models were fed raw acceleration data acquired under experimental conditions for three activities: walking, sit-to-stand, and squatting. Results show that the optimal window is 20-25 frames (0.20-0.25 s) for both sources, giving an accuracy of 99.07% and an F1-score of 87.08% for the CNN-LSTM with wearable sensor data, and an accuracy of 98.8% and an F1-score of 82.80% with MOCAP data; the LSTM model achieved similarly accurate results. There is almost no difference in accuracy for larger windows (100 and 200 frames). However, smaller windows show a decrease in F1-score. Regarding inference time, data segmented with a 20-frame sliding window can be preprocessed approximately 4 times (LSTM) and 2 times (CNN-LSTM) faster than data segmented with 100-frame windows.
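To make the fixed-overlap sliding-window segmentation concrete, here is a minimal NumPy sketch. It is not the authors' code. The abstract does not state the overlap, so 50% is a hypothetical choice. The 100 Hz sampling rate is inferred from "20-25 frames (0.20-0.25 s)". Labelling each window by majority vote is also an assumption. The function name `sliding_windows` and the synthetic data are illustrative only.

```python
import numpy as np

# Assumption: 100 Hz sampling, inferred from "20-25 frames (0.20-0.25 s)".
SAMPLING_RATE_HZ = 100


def sliding_windows(signal, labels, window_size, overlap=0.5):
    """Split a (T, C) signal into fixed-size windows with a fixed overlap.

    overlap=0.5 is a hypothetical choice; each window gets the most
    frequent label among its frames (also an assumption).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    n_frames, n_channels = signal.shape
    if n_frames < window_size:
        # Not enough frames for even one window.
        return np.empty((0, window_size, n_channels)), np.empty(0, dtype=labels.dtype)
    # Hop between consecutive window starts; at least one frame.
    step = max(1, int(round(window_size * (1.0 - overlap))))
    starts = range(0, n_frames - window_size + 1, step)
    X = np.stack([signal[s:s + window_size] for s in starts])
    # Majority-vote label per window (labels must be non-negative ints).
    y = np.array([np.bincount(labels[s:s + window_size]).argmax() for s in starts])
    return X, y


# Usage on synthetic data: 10 s of 3-axis acceleration with hypothetical
# labels 0 = walking, 1 = sit-to-stand, 2 = squatting.
rng = np.random.default_rng(0)
acc = rng.standard_normal((1000, 3))
act = np.repeat([0, 1, 2], [400, 300, 300])

for w in (5, 20, 100, 200):
    X, y = sliding_windows(acc, act, w)
    print(f"{w:>3} frames = {w / SAMPLING_RATE_HZ:.2f} s -> {X.shape[0]} windows")
```

Under these assumptions, a 20-frame window splits the 1000-frame recording into 99 short segments, while a 100-frame window gives only 19 long ones. Each window passed to a model is therefore five times smaller. This is the trade-off the study explores between preprocessing cost and F1-score.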