Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad, Pakistan.
Department of Computer Science, National University of Computer and Emerging Sciences, Lahore, Pakistan.
PLoS One. 2023 Jun 7;18(6):e0286919. doi: 10.1371/journal.pone.0286919. eCollection 2023.
With the advancement of ubiquitous computing, smartphone sensors continuously generate vast streams of unlabeled data. This sensor data can help recognize a variety of behavioral contexts in the natural environment. Accurate behavioral context recognition has applications in many domains, such as disease prevention and independent living. However, despite the enormous amount of sensor data available, label acquisition remains challenging because it depends on users. In this work, we propose a novel context recognition approach, the Dissimilarity-Based Query Strategy (DBQS). DBQS leverages Active Learning based selective sampling to find informative and diverse samples in the sensor data with which to train the model. It overcomes the stagnation problem by considering only new and distinct samples from the pool that have not previously been explored. In addition, the model exploits temporal information in the data to further maintain diversity in the dataset. The key intuition behind the proposed approach is that variation during the learning phase trains the model in diverse settings, so that it performs better when assigned a context recognition task in a natural setting. Experiments on a publicly available natural environment dataset demonstrate that the proposed approach improves overall average Balanced Accuracy (BA) by 6% while requiring 13% less training data overall.
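The abstract does not spell out how DBQS scores candidates, so the following is only a minimal sketch of one generic way to combine informativeness and dissimilarity in pool-based Active Learning, not the authors' algorithm. It assumes a binary classifier that outputs a probability per pool sample, Euclidean distance as the dissimilarity measure, and a product of uncertainty and distance as the selection score; the names `select_queries` and `balanced_accuracy` are hypothetical, and the temporal component mentioned above is not modeled. The second function computes Balanced Accuracy in its standard form, the mean of per-class recall.

```python
import numpy as np


def select_queries(probs, features, labeled_features, k):
    """Greedily pick k pool indices that are uncertain AND far from seen samples.

    Assumptions (not from the paper): binary classifier probabilities,
    Euclidean dissimilarity, score = uncertainty * distance to nearest seen sample.
    """
    probs = np.asarray(probs, dtype=float)
    features = np.asarray(features, dtype=float)
    seen = np.asarray(labeled_features, dtype=float)

    # Uncertainty is 1 at p = 0.5 and falls to 0 at p = 0 or p = 1.
    uncertainty = 1.0 - 2.0 * np.abs(probs - 0.5)

    # Distance from every pool sample to its nearest already-labeled sample.
    diffs = features[:, None, :] - seen[None, :, :]
    min_dist = np.linalg.norm(diffs, axis=2).min(axis=1)

    available = np.ones(len(features), dtype=bool)
    chosen = []
    for _ in range(min(k, len(features))):
        # Samples already picked are masked out so they are never re-queried.
        score = np.where(available, uncertainty * min_dist, -np.inf)
        best = int(np.argmax(score))
        chosen.append(best)
        available[best] = False
        # The new pick now counts as "seen": shrink distances accordingly,
        # which pushes later picks away from it and keeps the batch diverse.
        dist_to_best = np.linalg.norm(features - features[best], axis=1)
        min_dist = np.minimum(min_dist, dist_to_best)
    return chosen


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


if __name__ == "__main__":
    pool = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [10.0, 10.0]]
    pool_probs = [0.5, 0.5, 0.5, 0.9]
    already_labeled = [[0.0, 0.0]]
    print(select_queries(pool_probs, pool, already_labeled, k=2))
    print(balanced_accuracy([1, 1, 1, 0], [1, 1, 0, 0]))
```

In this toy pool, the two samples lying on or next to the already-labeled point are skipped despite being maximally uncertain, because their dissimilarity to what the model has seen is close to zero. Multiplying the two terms is only one possible design choice; a weighted sum or a hard distance threshold would serve the same illustrative purpose.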