Koffman Lily, Crainiceanu Ciprian, Leroux Andrew
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.
Department of Biostatistics & Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA.
J R Stat Soc Ser C Appl Stat. 2024 Jul 29;73(5):1221-1241. doi: 10.1093/jrsssc/qlae033. eCollection 2024 Nov.
We consider the problem of predicting an individual's identity from accelerometry data collected during walking. In a previous paper, we transformed the accelerometry time series into an image by constructing the joint distribution of the acceleration and lagged acceleration for a vector of lags. Predictors derived by partitioning this image into grid cells were used in logistic regression to predict individuals. Here, we (a) implement machine learning methods for prediction using the grid cell-derived predictors; (b) derive inferential methods to screen for the most predictive grid cells while adjusting for correlation and multiple comparisons; and (c) develop a novel multivariate functional regression model that avoids partitioning the predictor space. Prediction methods are compared on two open-source accelerometry data sets: (a) 32 individuals walking on a 1 km path; and (b) 153 study participants completing six repetitions of walking on a 20 m path on two occasions at least 1 week apart. In the 32-individual study, all methods achieve at least 95% rank-1 accuracy, while in the 153-individual study, accuracy varies from 41% to 98%, depending on the method and prediction task. The methods also provide insight into why some individuals are easier to predict than others.
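The grid-cell predictors described above can be illustrated with a minimal sketch. It assumes a one-dimensional acceleration signal and counts how many (acceleration, lagged acceleration) pairs fall in each cell of a regular grid, once per lag. The function name `grid_cell_predictors`, the synthetic signal, the lag values, and the grid edges are all hypothetical choices made for illustration and are not taken from the paper, which may define or normalize the predictors differently.

```python
import numpy as np

def grid_cell_predictors(accel, lags, cell_edges):
    """Count (a(t), a(t + lag)) pairs per grid cell, for each lag.

    accel      : 1-D array of acceleration values
    lags       : iterable of positive integer lags, in samples
    cell_edges : 1-D array of bin edges, shared by both axes
    Returns a flat vector of length len(lags) * (len(cell_edges) - 1) ** 2.
    """
    accel = np.asarray(accel, dtype=float)
    features = []
    for lag in lags:
        current = accel[:-lag]   # acceleration at time t
        lagged = accel[lag:]     # acceleration at time t + lag
        # 2-D histogram approximates the joint distribution on the grid
        counts, _, _ = np.histogram2d(current, lagged,
                                      bins=[cell_edges, cell_edges])
        features.append(counts.ravel())
    return np.concatenate(features)

# Synthetic walking-like signal (assumption: 100 Hz, about 2 steps per second)
rng = np.random.default_rng(0)
t = np.arange(1000) / 100.0
accel = 1.0 + 0.5 * np.sin(2 * np.pi * 2 * t) + 0.05 * rng.standard_normal(1000)

lags = [15, 30, 45]                 # hypothetical lags, in samples
cell_edges = np.linspace(0, 3, 13)  # hypothetical 12 x 12 grid per lag
features = grid_cell_predictors(accel, lags, cell_edges)
```

Each entry of `features` could then serve as one predictor in a classifier such as the logistic regression mentioned above, with one feature vector computed per walking segment.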