Computer Science Research Institute, Ulster University, Newtownabbey BT37 0QB, UK.
Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, 97187 Luleå, Sweden.
Sensors (Basel). 2018 Jul 9;18(7):2203. doi: 10.3390/s18072203.
Data annotation is a time-consuming process that poses a major limitation to the development of Human Activity Recognition (HAR) systems. Supervised Machine Learning (ML) approaches require a large amount of labeled data, especially online and personalized approaches, which need user-specific labeled datasets. The availability of such datasets has the potential to help address common problems of smartphone-based HAR, such as inter-person variability. In this work, we (i) present an automatic labeling method that facilitates the collection of labeled datasets in free-living conditions using a smartphone, and (ii) investigate the robustness of common supervised classification approaches to noisy data. We evaluated the results with a dataset consisting of 38 days of manually labeled data collected in free-living conditions. The comparison between the manually and the automatically labeled ground truth demonstrated that labels could be obtained automatically with an average precision of 80-85%. The results also show that a supervised approach trained on automatically generated labels achieved an F-score of 84% (using Neural Networks and Random Forests); however, they also show that label noise could lower the F-score to 64-74%, depending on the classification approach (Nearest Centroid and Multi-Class Support Vector Machine).
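To make the reported metrics concrete, the following is a minimal sketch of how per-class precision, recall, and F-score can be computed when automatically generated labels are compared against a manually labeled ground truth. The function names (`per_class_scores`, `macro_average`), the activity names, and the toy label sequences are all hypothetical and are not taken from the paper; the macro averaging shown here is one common choice and may differ from the averaging the authors used.

```python
from collections import Counter


def per_class_scores(reference, predicted):
    """Return {label: (precision, recall, f1)} for two equal-length label sequences."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for ref, pred in zip(reference, predicted):
        if ref == pred:
            tp[ref] += 1   # correct label for this class
        else:
            fp[pred] += 1  # predicted class was wrongly assigned
            fn[ref] += 1   # reference class was missed
    scores = {}
    for label in sorted(set(reference) | set(predicted)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores, index):
    """Unweighted mean over classes of one metric (0=precision, 1=recall, 2=F1)."""
    return sum(s[index] for s in scores.values()) / len(scores)


# Hypothetical toy data: manual ground truth vs. automatically generated labels.
manual = ["walk", "walk", "sit", "sit", "run", "run"]
automatic = ["walk", "sit", "sit", "sit", "run", "walk"]

scores = per_class_scores(manual, automatic)
for label, (p, r, f) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"macro precision={macro_average(scores, 0):.2f}")
print(f"macro f1={macro_average(scores, 2):.2f}")
```

The same functions can be reused to score a classifier trained on noisy labels by passing its predictions as `predicted` and the manual labels as `reference`.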