Netherlands eScience Center, Amsterdam, The Netherlands.
Centre for Longitudinal Studies, UCL Institute of Education, London, United Kingdom.
PLoS One. 2019 Jan 9;14(1):e0208692. doi: 10.1371/journal.pone.0208692. eCollection 2019.
Accelerometers are increasingly used to obtain valuable descriptors of physical activity for health research. The cut-points approach to segmenting accelerometer data is widely used in physical activity research. However, it requires resource-intensive calibration studies and makes it difficult to explore the information that can be gained from a variety of raw data metrics. To address these limitations, we present a data-driven approach for segmenting and clustering accelerometer data using unsupervised machine learning.
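The cut-points approach that this work compares against can be sketched as follows. Raw samples are reduced to an acceleration metric and averaged per epoch, and each epoch is then assigned to a fixed intensity band. This is a minimal illustration, not the study's pipeline. Several details are assumptions: the metric (ENMO, the Euclidean norm minus one g), the 10 Hz sampling rate, and the names `SAMPLE_RATE_HZ` and `HYPOTHETICAL_CUT_POINTS_G`. The thresholds are placeholders, not calibrated values.

```python
import math

# Assumptions for illustration only: the study used five-second epochs, but the
# sampling rate and the thresholds below are hypothetical placeholders.
SAMPLE_RATE_HZ = 10
EPOCH_SECONDS = 5
HYPOTHETICAL_CUT_POINTS_G = [(0.04, "sedentary"), (0.10, "light"), (math.inf, "MVPA")]


def enmo(x, y, z):
    """Euclidean norm of the acceleration vector minus 1 g, truncated at zero."""
    return max(0.0, math.sqrt(x * x + y * y + z * z) - 1.0)


def epoch_means(values, samples_per_epoch):
    """Average consecutive samples into non-overlapping epochs, dropping a partial tail."""
    n_epochs = len(values) // samples_per_epoch
    return [
        sum(values[k * samples_per_epoch:(k + 1) * samples_per_epoch]) / samples_per_epoch
        for k in range(n_epochs)
    ]


def classify(epoch_value, cut_points=HYPOTHETICAL_CUT_POINTS_G):
    """Return the label of the first intensity band whose upper bound exceeds the value."""
    for upper, label in cut_points:
        if epoch_value < upper:
            return label
    raise ValueError(f"no band for value {epoch_value}")


if __name__ == "__main__":
    samples_per_epoch = SAMPLE_RATE_HZ * EPOCH_SECONDS
    # One epoch at rest (gravity only) followed by one epoch of vigorous movement.
    raw = [(0.0, 0.0, 1.0)] * samples_per_epoch + [(0.0, 0.6, 1.2)] * samples_per_epoch
    epochs = epoch_means([enmo(*s) for s in raw], samples_per_epoch)
    print([classify(v) for v in epochs])  # ['sedentary', 'MVPA']
```

Every threshold here has to be fixed in advance, which is why calibration studies are needed. The number of categories is also fixed by the number of bands.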
The data came from 500 fourteen-year-old participants in the Millennium Cohort Study who wore an accelerometer (GENEActiv) on their wrist on one weekday and one weekend day. The data were segmented and clustered with a Hidden Semi-Markov Model (HSMM). The model was configured to identify at most ten behavioral states from acceleration averaged over five-second epochs, with and without the x-, y-, and z-angles as additional inputs. A cut-points approach was used for comparison.
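The segmentation step of an HSMM can be sketched as a segment-level Viterbi recursion. Unlike a plain hidden Markov model, each state visit has an explicit duration distribution. This is a simplified sketch, not the study's implementation, and it rests on several assumptions. It uses a single input per epoch (for example mean acceleration) with Gaussian emissions. Durations are capped at a maximum length, and all parameters are assumed to be known already. In the study, the model was instead learned from the data, and the angles could be added as further inputs. The function and parameter names (`hsmm_viterbi`, `log_dur`, and so on) are hypothetical.

```python
import math

NEG_INF = float("-inf")


def gaussian_logpdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def hsmm_viterbi(obs, log_pi, log_trans, log_dur, means, sds):
    """Most likely state per observation for an explicit-duration HSMM.

    log_pi[j]        log probability that the first segment is in state j
    log_trans[i][j]  log probability of leaving state i for state j (diagonal unused)
    log_dur[j][d-1]  log probability that a visit to state j lasts d observations
    means[j], sds[j] Gaussian emission parameters of state j
    Assumes at least one path has finite probability.
    """
    n_obs, n_states = len(obs), len(means)
    max_dur = len(log_dur[0])

    # Prefix sums of emission log-likelihoods, so each segment score costs O(1).
    cum = []
    for j in range(n_states):
        acc = [0.0]
        for x in obs:
            acc.append(acc[-1] + gaussian_logpdf(x, means[j], sds[j]))
        cum.append(acc)

    # best[t][j]: best log score of obs[:t] given that a state-j segment ends at t.
    best = [[NEG_INF] * n_states for _ in range(n_obs + 1)]
    # back[t][j]: (previous state or None, duration of the segment ending at t).
    back = [[None] * n_states for _ in range(n_obs + 1)]

    for t in range(1, n_obs + 1):
        for j in range(n_states):
            for d in range(1, min(max_dur, t) + 1):
                start = t - d
                segment = log_dur[j][d - 1] + cum[j][t] - cum[j][start]
                if start == 0:
                    candidates = [(log_pi[j] + segment, None)]
                else:
                    # Segments alternate: a state never transitions to itself.
                    candidates = [
                        (best[start][i] + log_trans[i][j] + segment, i)
                        for i in range(n_states)
                        if i != j and best[start][i] > NEG_INF
                    ]
                for score, prev in candidates:
                    if score > best[t][j]:
                        best[t][j] = score
                        back[t][j] = (prev, d)

    # Backtrack from the best final state, filling in one whole segment at a time.
    state = max(range(n_states), key=lambda j: best[n_obs][j])
    path = [None] * n_obs
    t = n_obs
    while t > 0:
        prev, d = back[t][state]
        path[t - d:t] = [state] * d
        t -= d
        state = prev
    return path


if __name__ == "__main__":
    lp = math.log(0.5)
    uniform_dur = [math.log(1.0 / 6.0)] * 6
    print(hsmm_viterbi(
        obs=[0.0, 0.1, -0.1, 5.0, 5.2, 4.9],
        log_pi=[lp, lp],
        log_trans=[[NEG_INF, 0.0], [0.0, NEG_INF]],
        log_dur=[uniform_dur, uniform_dur],
        means=[0.0, 5.0],
        sds=[1.0, 1.0],
    ))  # [0, 0, 0, 1, 1, 1]
```

Because durations are modeled explicitly, a decoded state carries information about how long each behavior typically lasts. The number of states (up to ten in the study) is a model setting rather than a calibrated threshold.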
Eight principal components were needed to reach 95% explained variance in time spent in the behavioral states when angle metrics were included, and five when they were not. In comparison, four components were identified with the cut-points approach. In the HSMM with acceleration and angles as input, the acceleration distributions within states showed groupings similar to the cut-points categories, while the angle distributions showed more variety.
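The component counts above follow a common rule: keep the smallest number of principal components whose cumulative share of variance reaches 95%. The sketch below makes two assumptions. First, each participant is described by the time spent in each state. Second, the PCA eigenvalues have already been computed, for example with `numpy.linalg.eigvalsh` on the covariance matrix of those features. The helper names are hypothetical.

```python
def time_in_states(path, n_states, epoch_seconds=5):
    """Minutes spent in each state, given one decoded state label per epoch."""
    counts = [0] * n_states
    for state in path:
        counts[state] += 1
    return [c * epoch_seconds / 60.0 for c in counts]


def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of principal components whose cumulative explained variance reaches target."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    cumulative = 0.0
    for k, value in enumerate(values, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(values)  # guards against rounding leaving the ratio just below target


if __name__ == "__main__":
    print(time_in_states([0, 0, 0, 1, 1, 1], 2))           # [0.25, 0.25]
    print(components_for_variance([6.0, 3.0, 0.8, 0.2]))  # 3
```

A higher count means that time use spreads over more independent directions. This is the sense in which the HSMM states offer a higher-dimensional description than the cut-points categories.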
Our unsupervised classification approach learns a construct of human behavior from the data it observes, without the need for resource-intensive calibration studies. It can combine multiple data metrics and offers a higher-dimensional description of physical behavior. States can be interpreted from the distributions of their observations and from their durations.
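Interpreting states from their observations and durations can be illustrated with a small summary over a decoded sequence. For each state, it reports the mean observed value and the mean bout length. This is a minimal sketch that assumes one state label per five-second epoch. The function name and output layout are hypothetical.

```python
def summarize_states(path, obs, epoch_seconds=5):
    """Per-state mean observation and mean bout duration (seconds) for a decoded sequence."""
    values = {}  # state -> observations assigned to it
    bouts = {}   # state -> lengths (in epochs) of its uninterrupted runs
    run_state, run_len = None, 0
    for state, x in zip(path, obs):
        values.setdefault(state, []).append(x)
        if state == run_state:
            run_len += 1
        else:
            if run_state is not None:
                bouts.setdefault(run_state, []).append(run_len)
            run_state, run_len = state, 1
    if run_state is not None:
        bouts.setdefault(run_state, []).append(run_len)
    return {
        s: {
            "mean_obs": sum(values[s]) / len(values[s]),
            "mean_bout_s": epoch_seconds * sum(bouts[s]) / len(bouts[s]),
        }
        for s in values
    }


if __name__ == "__main__":
    print(summarize_states([0, 0, 1, 0, 0, 0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]))
```

Two states with similar mean acceleration but different typical bout lengths, or different angle distributions, would be separated by this kind of summary. A cut-points category cannot make that distinction.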