Faculty of Information Technology and Electrical Engineering, University of Oulu, Oulu, Finland.
JMIR Mhealth Uhealth. 2021 Jan 28;9(1):e21926. doi: 10.2196/21926.
Multimodal wearable technologies have opened up wide possibilities in human activity recognition and, more specifically, in the personalized monitoring of eating habits. The emerging challenge is now the selection of the most discriminative information from the high-dimensional data collected from multiple sources. The available fusion algorithms, with their complex structures, are poorly adapted to computationally constrained environments that require integrating information directly at the source. As a result, simpler low-level fusion methods are needed.
In the absence of a data-combining process, directly feeding high-dimensional raw data to a deep classifier would be computationally expensive in terms of response time, energy consumption, and memory requirements. Taking this into account, we aimed to develop a computationally efficient data fusion technique that yields a more comprehensive insight into human activity dynamics in a lower dimension. The major objective was to account for the statistical dependency of multisensory data and to explore intermodality correlation patterns across different activities.
In this technique, the information in time (regardless of the number of sources) is transformed into a 2D space that facilitates the discrimination of eating episodes from other activities. This is based on the hypothesis that the data captured by the various sensors are statistically associated with one another and that the covariance matrix of all these signals has a distinct distribution for each activity, which can be encoded as a contour representation. These representations are then used as input to a deep model that learns the patterns associated with each activity.
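As a rough illustration of this idea (a minimal sketch, not the authors' released implementation), the Python snippet below takes one temporal window of synchronized multi-sensor signals, computes the covariance matrix across channels, and renders it as a single 2D contour image suitable as input to a deep classifier. The window length, number of channels, image size, and interpolation scheme are illustrative assumptions.

# Minimal sketch: encode the intermodality covariance of one window as a 2D contour image.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt


def window_to_contour(window: np.ndarray, out_path: str, grid: int = 64) -> None:
    """window: (n_samples, n_channels) array of synchronized sensor signals."""
    # Covariance across channels captures the intermodality correlation in the window.
    cov = np.cov(window, rowvar=False)          # shape (n_channels, n_channels)

    # Upsample the small covariance matrix to a smooth grid before contouring
    # (simple separable linear interpolation; any 2D interpolation would do).
    x = np.linspace(0, 1, cov.shape[0])
    xi = np.linspace(0, 1, grid)
    rows = np.array([np.interp(xi, x, row) for row in cov])
    dense = np.array([np.interp(xi, x, col) for col in rows.T]).T

    # Encode the covariance "landscape" as filled contours and save the image
    # that serves as the single 2D representation fed to the deep model.
    fig, ax = plt.subplots(figsize=(2, 2), dpi=64)
    ax.contourf(xi, xi, dense, levels=20, cmap="viridis")
    ax.axis("off")
    fig.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)


# Example: a 4-second window of 6 synchronized channels at 32 Hz (hypothetical numbers).
rng = np.random.default_rng(0)
window_to_contour(rng.standard_normal((128, 6)), "window_contour.png")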
To demonstrate the generalizability of the proposed fusion algorithm, 2 different scenarios were taken into account. These scenarios differed in terms of temporal segment size, type of activity, wearable device, subjects, and deep learning architecture. The first scenario used a data set in which a single participant performed a limited number of activities while wearing the Empatica E4 wristband. The second scenario used a data set of activities of daily living in which 10 different participants wore inertial measurement units while performing a more complex set of activities. The precision obtained from leave-one-subject-out cross-validation in the second scenario reached 0.803. The impact of missing data on performance degradation was also evaluated.
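For readers unfamiliar with the evaluation protocol, the following sketch shows how leave-one-subject-out cross-validation can be set up with scikit-learn's LeaveOneGroupOut; it is not the paper's actual pipeline, and the feature matrix, labels, subject identifiers, and classifier are placeholder assumptions standing in for the 2D representations and deep architecture.

# Minimal sketch of leave-one-subject-out (LOSO) cross-validation with placeholder data.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score

rng = np.random.default_rng(1)
n_windows, n_features, n_subjects = 500, 32, 10
X = rng.standard_normal((n_windows, n_features))        # e.g., flattened 2D representations (placeholder)
y = rng.integers(0, 2, size=n_windows)                   # eating vs. other activity (illustrative labels)
subjects = rng.integers(0, n_subjects, size=n_windows)   # which participant each window came from

precisions = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    # Train on all subjects except one, test on the held-out subject.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])
    precisions.append(precision_score(y[test_idx], y_pred, zero_division=0))

print(f"Mean LOSO precision: {np.mean(precisions):.3f}")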
To conclude, the proposed fusion technique makes it possible to embed the joint variability information of different modalities in a single 2D representation, which yields a more global view of the different aspects of daily human activities while preserving the desired level of activity recognition performance.