Rivadulla Adrian R, Chen Xi, Cazzola Dario, Trewartha Grant, Preatoni Ezio
Department for Health, University of Bath, Bath, UK.
Department of Computer Science, University of Bath, Bath, UK.
J Biomech. 2024 Dec;177:112433. doi: 10.1016/j.jbiomech.2024.112433. Epub 2024 Nov 15.
Dimensionality reduction is a critical step for the efficacy and efficiency of clustering analysis. Despite the multiple methods available, biomechanists have often defaulted to Principal Component Analysis (PCA). We evaluated two PCA-based and one autoencoder-based dimensionality reduction methods for their data compression and reconstruction capability, assessed their effect on the output of clustering runners based on their kinematics, and discussed their implications for the biomechanical assessment of running technique. Eighty-four participants completed a 4-minute run at 12 km/h while trunk and lower-limb kinematics were collected. Data reconstruction quality was assessed for Direct PCA (PCA applied directly to the original variables) and Fourier PCA (modelling time series as Fourier series and then applying PCA), each using popular variance-explained criteria, and for a feedforward autoencoder (AE). Agglomerative hierarchical clustering was then applied, and the agreement between the resulting partitions was assessed. Meaningful errors in the reconstructed signals were found when applying popular variance-explained criteria, suggesting that reconstruction error should be assessed to make a more informed decision about how many components to retain for further analysis. Direct PCA, Fourier PCA and AE yielded different clusters, warranting caution when comparing outcomes from studies that use different dimensionality reduction techniques: each method may be sensitive to different data features. Direct PCA retaining 99% of the original variance emerged as the best compromise between data compression, reconstruction quality and cluster separability in our dataset. We encourage biomechanists to experiment with diverse dimensionality reduction methods to optimise clustering outcomes and enhance the real-world applicability of their findings.
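The point about variance-explained criteria versus reconstruction error can be illustrated with a minimal Python/NumPy sketch of Direct PCA. It assumes one row per runner and one column per time-normalised kinematic sample. It picks the number of components from a cumulative variance threshold and then reports the reconstruction error that this choice leaves behind. The synthetic sine-wave data, the helper names (`pca_fit`, `n_components_for`, `reconstruct`) and the 90/95/99% thresholds are illustrative assumptions. They are not the authors' data or code.

```python
import numpy as np


def pca_fit(X):
    """Direct PCA via SVD of the column-centred data matrix X (n_runners, n_features)."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of total variance per component
    return mean, Vt, explained


def n_components_for(explained, threshold):
    """Smallest number of components whose cumulative explained variance >= threshold."""
    k = int(np.searchsorted(np.cumsum(explained), threshold)) + 1
    return min(k, len(explained))  # guard against round-off when threshold is ~1


def reconstruct(X, mean, Vt, k):
    """Project X onto the first k components and map it back to the original space."""
    scores = (X - mean) @ Vt[:k].T
    return scores @ Vt[:k] + mean


if __name__ == "__main__":
    # Synthetic stand-in for kinematics: 84 runners x 101 samples of one angle curve.
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 101)
    amp = rng.normal(1.0, 0.2, size=(84, 1))
    phase = rng.normal(0.0, 0.05, size=(84, 1))
    X = amp * np.sin(2 * np.pi * (t + phase)) + 0.05 * rng.normal(size=(84, t.size))

    mean, Vt, explained = pca_fit(X)
    for threshold in (0.90, 0.95, 0.99):
        k = n_components_for(explained, threshold)
        err = X - reconstruct(X, mean, Vt, k)
        # Report both the average and the worst-case error in signal units.
        print(f"{threshold:.0%} variance: {k} PCs, "
              f"RMS error {np.sqrt(np.mean(err**2)):.4f}, "
              f"max abs error {np.max(np.abs(err)):.4f}")
```

Reporting the maximum absolute error next to the RMS error is a deliberate choice. A threshold that looks adequate on average can still leave large local errors in parts of the gait cycle.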
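Fourier PCA can be sketched along the same lines. Each time series is summarised by the coefficients of a truncated Fourier series, and PCA is then applied to the concatenated coefficients. The abstract does not state how many harmonics were kept or how the coefficients were parametrised. The 10-harmonic cosine/sine form below is therefore an assumption, as are the helper names (`fourier_features`, `fourier_reconstruct`, `pca_scores`) and the synthetic three-angle data. The sketch also assumes each series covers one periodic stride cycle.

```python
import numpy as np


def fourier_features(series, n_harmonics):
    """Truncated Fourier-series coefficients for each row of series (n_rows, n_points).

    Returns [mean, a_1, b_1, ..., a_H, b_H] per row, with
    x[n] ~ mean + sum_k a_k cos(2*pi*k*n/N) + b_k sin(2*pi*k*n/N).
    """
    n_points = series.shape[1]
    if not 0 < n_harmonics < n_points / 2:
        raise ValueError("need 0 < n_harmonics < n_points / 2")
    c = np.fft.rfft(series, axis=1)                      # one-sided DFT per row
    mean = c[:, :1].real / n_points
    a = 2.0 * c[:, 1:n_harmonics + 1].real / n_points    # cosine coefficients
    b = -2.0 * c[:, 1:n_harmonics + 1].imag / n_points   # sine coefficients
    ab = np.stack([a, b], axis=2).reshape(len(series), -1)  # interleave a_k, b_k
    return np.hstack([mean, ab])


def fourier_reconstruct(features, n_points):
    """Evaluate the truncated Fourier series back on n_points equally spaced samples."""
    n_harmonics = (features.shape[1] - 1) // 2
    k = np.arange(1, n_harmonics + 1)
    angle = 2 * np.pi * np.outer(k, np.arange(n_points)) / n_points  # (H, N)
    return features[:, :1] + features[:, 1::2] @ np.cos(angle) + features[:, 2::2] @ np.sin(angle)


def pca_scores(F, threshold):
    """PCA on feature matrix F, keeping the fewest PCs that reach `threshold` variance."""
    Fc = F - F.mean(axis=0)
    _, s, Vt = np.linalg.svd(Fc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = min(int(np.searchsorted(np.cumsum(explained), threshold)) + 1, len(s))
    return Fc @ Vt[:k].T, explained[:k]


if __name__ == "__main__":
    # Synthetic stand-in: 84 runners x 3 joint angles x 100 samples per stride cycle.
    rng = np.random.default_rng(0)
    n = np.arange(100)
    curves = (rng.normal(1.0, 0.2, size=(84, 3, 1))
              * np.sin(2 * np.pi * n / 100 + rng.normal(0.0, 0.3, size=(84, 3, 1)))
              + 0.05 * rng.normal(size=(84, 3, 100)))
    # Concatenate the per-angle coefficients into one feature vector per runner.
    F = np.hstack([fourier_features(curves[:, j, :], n_harmonics=10) for j in range(3)])
    scores, kept = pca_scores(F, threshold=0.99)
    print(F.shape, scores.shape, round(float(kept.sum()), 4))
```

Here `fourier_reconstruct` maps a row of coefficients back to a curve. This allows the same reconstruction-error check to be applied to Fourier PCA as to Direct PCA.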
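The clustering and partition-agreement steps can also be sketched. The abstract names neither the linkage criterion nor the agreement statistic. This example assumes Ward's linkage and the adjusted Rand index (ARI), both common choices. The naive O(n^3) implementation is only meant for small samples such as 84 runners. The function names `ward_clusters` and `adjusted_rand_index`, as well as the synthetic data, are hypothetical.

```python
import numpy as np


def ward_clusters(X, n_clusters):
    """Naive agglomerative clustering with Ward's criterion; returns labels 0..k-1."""
    X = np.asarray(X, dtype=float)
    if not 1 <= n_clusters <= len(X):
        raise ValueError("n_clusters must be between 1 and the number of samples")
    members = [[i] for i in range(len(X))]  # sample indices in each cluster
    C = X.copy()                            # cluster centroids
    size = np.ones(len(X))                  # cluster sizes
    while len(members) > n_clusters:
        d2 = np.sum((C[:, None, :] - C[None, :, :]) ** 2, axis=2)
        # Ward merge cost: increase in within-cluster sum of squares.
        cost = size[:, None] * size[None, :] / (size[:, None] + size[None, :]) * d2
        np.fill_diagonal(cost, np.inf)
        i, j = np.unravel_index(np.argmin(cost), cost.shape)
        a, b = int(min(i, j)), int(max(i, j))
        C[a] = (size[a] * C[a] + size[b] * C[b]) / (size[a] + size[b])
        size[a] += size[b]
        members[a].extend(members[b])
        del members[b]
        C = np.delete(C, b, axis=0)
        size = np.delete(size, b)
    labels = np.empty(len(X), dtype=int)
    for label, idx in enumerate(members):
        labels[idx] = label
    return labels


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same samples."""
    _, a = np.unique(np.ravel(labels_a), return_inverse=True)
    _, b = np.unique(np.ravel(labels_b), return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)             # contingency table

    def pairs(x):
        return x * (x - 1) / 2.0

    sum_cells = pairs(table).sum()
    sum_rows = pairs(table.sum(axis=1)).sum()
    sum_cols = pairs(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / pairs(len(a))
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:  # degenerate partitions (e.g. one cluster each)
        return 1.0
    return float((sum_cells - expected) / (max_index - expected))


if __name__ == "__main__":
    # Synthetic stand-in: 84 runners in 3 loose groups, 20 features each.
    rng = np.random.default_rng(0)
    centres = rng.normal(0.0, 3.0, size=(3, 20))
    X = centres[rng.integers(0, 3, size=84)] + rng.normal(size=(84, 20))
    # Two hypothetical reduced representations: all features vs. the first 2 PCs.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    labels_full = ward_clusters(Xc, 3)
    labels_2pc = ward_clusters(Xc @ Vt[:2].T, 3)
    print("ARI between partitions:", round(adjusted_rand_index(labels_full, labels_2pc), 3))
```

An ARI of 1 means two representations produced identical partitions up to relabelling. Values near 0 indicate agreement no better than chance.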