St. Pölten University of Applied Sciences, Institute for Creative Media Technologies, St. Pölten, Austria.
Gait Posture. 2020 Feb;76:198-203. doi: 10.1016/j.gaitpost.2019.10.021. Epub 2019 Nov 9.
Quantitative gait analysis produces a vast amount of data, which can be difficult to analyze. Automated gait classification based on machine learning techniques has the potential to support clinicians in comprehending these complex data. Even though these techniques are already frequently used in the scientific community, there is no clear consensus on how the data need to be preprocessed and arranged to ensure optimal classification accuracy.
Is there an optimal data aggregation and preprocessing workflow to optimize classification accuracy outcomes?
Based on our previous work on automated classification of ground reaction force (GRF) data, a sequential setup was followed. First, two aggregation methods (early fusion and late fusion) were compared. Second, using the best aggregation method identified, the expressiveness of different combinations of signal representations was investigated. The dataset included data from 910 subjects, comprising four gait disorder classes and one healthy control group. The machine learning pipeline comprised principal component analysis (PCA), z-standardization and a support vector machine (SVM).
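As a rough illustration of the pipeline described above, the following sketch chains PCA, z-standardization and an SVM using scikit-learn. It is not the study's implementation: the number of components, the RBF kernel, the helper name `build_pipeline` and the synthetic five-class data are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_pipeline(n_components=10):
    """Chain PCA, z-standardization and an SVM (hypothetical settings)."""
    return Pipeline([
        ("pca", PCA(n_components=n_components)),  # dimensionality reduction
        ("zscore", StandardScaler()),             # zero mean, unit variance per component
        ("svm", SVC(kernel="rbf")),               # kernel choice is an assumption
    ])


# Synthetic stand-in for GRF feature vectors: five classes, mirroring
# four gait disorder classes plus one healthy control group.
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=10,
    n_classes=5, random_state=0,
)

model = build_pipeline().fit(X, y)
predictions = model.predict(X)
```

Wrapping the steps in a single `Pipeline` means that, under cross-validation, the PCA and scaling parameters are fitted on the training folds only, which avoids leaking information from held-out data.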
The late fusion aggregation, i.e., utilizing majority voting on the classifier's predictions, performed best. In addition, the use of derived signal representations (relative changes and signal differences) seems to be advantageous as well.
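The late fusion idea can be sketched in a few lines of Python. The example assumes that the classifier yields several predictions per subject (for instance, one per recorded trial) and that these are combined by majority vote. The function names, the class labels and the tie-breaking rule are hypothetical and not taken from the study.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("need at least one prediction to vote on")
    counts = Counter(labels)
    top = max(counts.values())
    for label in labels:  # input order decides ties (an assumption)
        if counts[label] == top:
            return label


def late_fusion(per_subject_predictions):
    """Fuse the individual predictions of each subject into one label."""
    return {
        subject: majority_vote(preds)
        for subject, preds in per_subject_predictions.items()
    }


# Hypothetical predictions: "HC" for healthy control, "A"/"B" for disorder classes.
trial_predictions = {
    "subject_1": ["A", "A", "HC"],
    "subject_2": ["B", "HC", "HC", "HC"],
    "subject_3": ["A", "B"],  # tie, resolved in favour of "A"
}
fused = late_fusion(trial_predictions)
```

Early fusion, by contrast, would combine the inputs before classification (for example, by concatenating feature vectors), so that the classifier produces a single prediction directly.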
Our results indicate that data preprocessing and aggregation methods must be selected with great caution, as these choices can affect classification accuracy. These results are intended to guide future studies in choosing data aggregation and preprocessing techniques.