Department of Kinesiology and Physical Activity, McGill University, Montreal, Québec, Canada.
J Biomech. 2024 May;168:112116. doi: 10.1016/j.jbiomech.2024.112116. Epub 2024 Apr 24.
Time-series data are common in biomechanical studies. These data often undergo preprocessing steps such as time normalization or filtering before use in further analyses, including deep-learning classification. It remains unclear how these preprocessing steps affect deep-learning model performance. The aim of this study was therefore to assess the effect of time normalization and filtering on the performance of deep-learning classification models. We also investigated the effect of amplitude scaling. Using a public dataset (the Gutenberg Gait Database, a ground reaction force database of level overground walking at self-selected speed involving 350 healthy individuals), we trained convolutional neural network (CNN) and long short-term memory (LSTM) models to predict binary sex (male, female) from three-dimensional ground reaction forces to which we applied different processing approaches: zero-padding, interpolation to 100% of the signal, filtering, and amplitude scaling (min-max, body mass). Model performance differed across these transformations. The highest performance was obtained using unfiltered data, zero-padding, and min-max amplitude scaling (F1-scores of 91% and 87% for the CNN and LSTM, respectively). Leaving the data unfiltered and using min-max scaling generally improved performance for both model architectures. For interpolation, results were not consistent across model architectures. This study suggests that processing steps must be considered in applications where deep-learning classification performance is relevant.
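To make the compared transformations concrete, the following is a minimal NumPy sketch of zero-padding, interpolation to 100% of the signal, and the two amplitude scalings. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the (n_frames, 3) array layout, the 101-point time base, the use of linear interpolation, and per-trial, per-component min-max scaling are all hypothetical choices that the abstract does not specify. Filtering is left out because the abstract does not state the filter type or cutoff.

```python
import numpy as np


def zero_pad(trial: np.ndarray, target_len: int) -> np.ndarray:
    """Append rows of zeros so a (n_frames, 3) trial has target_len frames."""
    n_frames = trial.shape[0]
    if n_frames > target_len:
        raise ValueError("trial is longer than target_len")
    # Pad only at the end of the time axis; leave the 3 force components alone.
    return np.pad(trial, ((0, target_len - n_frames), (0, 0)))


def interpolate_to_percent(trial: np.ndarray, n_points: int = 101) -> np.ndarray:
    """Linearly resample each component onto 0-100% of the signal.

    n_points=101 is an assumed convention (one sample per percent).
    """
    old_axis = np.linspace(0.0, 100.0, trial.shape[0])
    new_axis = np.linspace(0.0, 100.0, n_points)
    return np.column_stack(
        [np.interp(new_axis, old_axis, trial[:, k]) for k in range(trial.shape[1])]
    )


def min_max_scale(trial: np.ndarray) -> np.ndarray:
    """Scale each component to [0, 1] using that trial's own min and max
    (assumed scope; scaling over the whole dataset is another option)."""
    lo = trial.min(axis=0)
    span = trial.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant components
    return (trial - lo) / span


def body_mass_scale(trial: np.ndarray, body_mass_kg: float) -> np.ndarray:
    """Express forces per kilogram of body mass (assumed normalization)."""
    return trial / body_mass_kg
```

With trials of unequal length, `zero_pad` keeps the original samples and the original duration information, whereas `interpolate_to_percent` maps every trial onto a common time base and discards absolute duration. That difference is one plausible reason the two time-normalization options could behave differently across architectures, although the abstract itself does not offer an explanation.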