Jeon Eun Som, Som Anirudh, Shukla Ankita, Hasanaj Kristina, Buman Matthew P, Turaga Pavan
School of Arts, Media and Engineering and School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281 USA.
Center for Vision Technologies Group at SRI International, Princeton, NJ 08540 USA.
IEEE Internet Things J. 2022 Jul 15;9(14):12848-12860. doi: 10.1109/jiot.2021.3139038. Epub 2021 Dec 29.
Deep neural networks are parameterized by thousands to millions of parameters and have shown tremendous success in many classification problems. However, the large number of parameters makes it difficult to deploy these models on edge devices such as smartphones and wearables. To address this problem, knowledge distillation (KD) has been widely employed: a pretrained high-capacity network is used to train a much smaller network suitable for edge devices. In this paper, for the first time, we study the applicability and challenges of using KD for time-series data from wearable devices. Successful application of KD requires specific choices of data augmentation methods during training, yet it is not known whether a coherent strategy exists for choosing an augmentation approach during KD. We report the results of a detailed study that compares and contrasts various common choices and several hybrid data augmentation strategies in KD-based human activity analysis. Research in this area is often limited because few comprehensive wearable-device databases are publicly available. Our study considers databases ranging from small-scale publicly available ones to a database derived from a large-scale interventional study of human activity and sedentary behavior. We find that the choice of data augmentation techniques during KD has a variable impact on end performance, and that the optimal network choice as well as the data augmentation strategies are specific to the dataset at hand. We conclude, however, with a general set of recommendations that provide a strong baseline performance across databases.
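The abstract refers to the standard teacher-student KD objective and to time-series data augmentation. As a minimal illustrative sketch (not the paper's implementation; all function names, the temperature, the mixing weight, and the jitter scale are assumptions), KD combines a hard-label cross-entropy term with a temperature-softened KL term between teacher and student outputs, while jittering is one common augmentation for wearable sensor signals:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style KD loss: alpha * CE(hard labels) + (1-alpha) * T^2 * KL(soft targets)."""
    n = len(labels)
    p_student = softmax(student_logits)
    ce = -np.mean(np.log(p_student[np.arange(n), labels] + 1e-12))
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1))
    return alpha * ce + (1 - alpha) * (T ** 2) * kl

def jitter(x, sigma=0.05, rng=None):
    """Additive Gaussian noise, a common augmentation for sensor time series."""
    rng = rng if rng is not None else np.random.default_rng(0)
    return x + rng.normal(0.0, sigma, size=x.shape)
```

In practice the student is trained by minimizing this loss over augmented mini-batches; the `T**2` factor keeps the soft-target gradient magnitude comparable across temperatures.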