Woods Luke T, Rana Zeeshan A
Digital Aviation Research and Technology Centre (DARTeC), Cranfield University, Cranfield, Bedfordshire MK43 0AL, UK.
Leidos Industrial Engineers Limited, Unit 3, Bedford Link Logistics Park, Bell Farm Way, Kempston, Bedfordshire MK43 9SS, UK.
J Imaging. 2023 Nov 2;9(11):238. doi: 10.3390/jimaging9110238.
Supervised deep learning models can be optimised by applying regularisation techniques to reduce overfitting, although fine-tuning the associated hyperparameters can prove difficult. Not all hyperparameters are equal, and understanding the effect each hyperparameter and regularisation technique has on the performance of a given model is of paramount importance in research. We present the first comprehensive, large-scale ablation study for an encoder-only transformer modelling sign language, using the improved Word-level American Sign Language dataset (WLASL-alt) and human pose estimation keypoint data, with a view to placing constraints on the potential to optimise performance on this task. We measure the impact that a range of model parameter regularisation and data augmentation techniques have on sign classification accuracy. We demonstrate that, within the quoted uncertainties, none of the regularisation techniques we employ other than ℓ2 parameter regularisation has an appreciable positive impact on performance, a finding that contradicts results reported by other similar, albeit smaller-scale, studies. We also demonstrate that, for this task, the model architecture is bounded by the small dataset size rather than by the need to find an appropriate set of model parameter regularisation and common or basic data augmentation techniques. Furthermore, using the base model configuration, we report a new maximum top-1 classification accuracy of 84% on 100 signs, improving on the previous benchmark result for this model architecture and dataset.
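ℓ2 parameter regularisation is the only technique the abstract reports as having an appreciable positive effect. As a rough illustration of what the term usually means, the sketch below adds the penalty (λ/2)‖w‖² to a training loss, so every gradient step gains an extra λw term that pulls the weights towards zero. It is a minimal sketch under stated assumptions: it uses a toy linear least-squares model on synthetic data instead of the study's encoder-only transformer and WLASL-alt keypoints, and the function names (`l2_penalty`, `fit_l2_regularised`, `regularised_loss`), the strength `lam`, the learning rate and the step count are all hypothetical choices, not values from the paper. The abstract also does not say whether the penalty was applied as an explicit loss term or as optimiser weight decay.

```python
import numpy as np


def l2_penalty(params, lam):
    """Return the l2 regularisation term (lam / 2) * sum of squared parameters."""
    return 0.5 * lam * sum(float(np.sum(p ** 2)) for p in params)


def regularised_loss(X, y, w, lam):
    """Total objective: mean squared error / 2 plus the l2 penalty on w."""
    data_loss = 0.5 * float(np.mean((X @ w - y) ** 2))
    return data_loss + l2_penalty([w], lam)


def fit_l2_regularised(X, y, lam, lr=0.1, steps=2000):
    """Fit a toy linear model by gradient descent on regularised_loss.

    Hypothetical illustration only: the study itself trains a transformer.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(steps):
        residual = X @ w - y
        # Gradient of the data term (1 / 2n) * ||Xw - y||^2 ...
        grad = X.T @ residual / n_samples
        # ... plus the gradient of the l2 term, lam * w, which shrinks w towards 0.
        grad += lam * w
        w -= lr * grad
    return w


if __name__ == "__main__":
    # Synthetic data (an assumption for illustration, not WLASL-alt).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

    for lam in (0.0, 1.0):
        w = fit_l2_regularised(X, y, lam)
        print(f"lam={lam}: weights={np.round(w, 3)}, norm={np.linalg.norm(w):.3f}")
```

Running the script prints the fitted weights for λ = 0 and λ = 1; the λ = 1 fit always has the smaller norm, which is the shrinkage effect the penalty is intended to provide against overfitting.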