Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, USA.
Department of Biostatistics, Harvard University, 677 Huntington Avenue, Boston, MA, USA.
Biostatistics. 2020 Apr 1;21(2):e98-e112. doi: 10.1093/biostatistics/kxy059.
With the increasing availability of smartphones with Global Positioning System (GPS) capabilities, large-scale studies relating individual-level mobility patterns to a wide variety of patient-centered outcomes, from mood disorders to surgical recovery, are becoming a reality. Past studies of this kind have been small in scale and have provided subjects with wearable GPS devices. These devices typically collect mobility traces continuously without significant gaps in the data, so the problem of data missingness could safely be ignored. Leveraging subjects' own smartphones makes it possible to scale up these studies and extend their duration, but it also introduces a substantial challenge: to preserve a smartphone's battery, GPS can be active for only a small portion of the time, frequently less than 10%, leading to a tremendous missing data problem. We introduce a principled statistical approach, based on weighted resampling of the observed data, to impute the missing mobility traces, which we then summarize using different mobility measures. We compare the strengths of our approach with those of linear interpolation (LI), a popular approach for dealing with missing data, both analytically and through simulation of missingness in empirical data. We conclude that our imputation approach better mirrors human mobility, both theoretically and over a sample of GPS mobility traces from 182 individuals in the Geolife data set, where, relative to LI, imputation resulted in a 10-fold reduction in the error averaged across all mobility features.
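The abstract does not spell out the resampling scheme, so the sketch below is a deliberately simplified, hypothetical illustration rather than the authors' method. It assumes a trace is a list of `(t, x, y)` tuples (seconds, meters on a local planar projection) and contrasts two ways of filling one gap: straight-line LI, and a toy weighted-resampling fill that draws observed displacement steps with probability proportional to user-supplied weights, then applies a linear drift correction so the imputed path ends where the subject was next observed. The names `linear_interpolate`, `resample_impute`, and `path_length`, the step set, and the weights are all illustrative assumptions.

```python
import math
import random


def path_length(points):
    """Total distance (m) along a sequence of (t, x, y) points."""
    return sum(math.dist(a[1:], b[1:]) for a, b in zip(points, points[1:]))


def linear_interpolate(p0, p1, times):
    """Straight-line positions between observed points p0=(t0, x0, y0) and p1."""
    (t0, x0, y0), (t1, x1, y1) = p0, p1
    out = []
    for t in times:
        w = (t - t0) / (t1 - t0)  # fraction of the gap elapsed at time t
        out.append((t, x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
    return out


def resample_impute(p0, p1, times, steps, weights, rng):
    """Hypothetical weighted-resampling fill (toy illustration only).

    Draws one observed displacement (dx, dy) per missing time point with
    probability proportional to `weights`, builds a walk starting at p0,
    then spreads a linear drift correction so the walk ends exactly at p1
    (assumes the last entry of `times` equals p1's timestamp).
    """
    (t0, x0, y0), (t1, x1, y1) = p0, p1
    drawn = rng.choices(steps, weights=weights, k=len(times))
    walk, x, y = [], x0, y0
    for dx, dy in drawn:
        x, y = x + dx, y + dy
        walk.append((x, y))
    # Offset between where the walk ended and where the subject reappeared.
    ex, ey = x1 - walk[-1][0], y1 - walk[-1][1]
    out = []
    for t, (wx, wy) in zip(times, walk):
        w = (t - t0) / (t1 - t0)
        out.append((t, wx + w * ex, wy + w * ey))
    return out


# Usage: the subject is observed at home, then at home again one hour later.
rng = random.Random(0)
start = (0, 0.0, 0.0)
end = (3600, 0.0, 0.0)
gap_times = [60 * k for k in range(1, 61)]  # one point per minute, ending at t=3600
observed_steps = [(0.0, 0.0), (80.0, 0.0), (-80.0, 0.0), (0.0, 80.0), (0.0, -80.0)]
weights = [5, 1, 1, 1, 1]  # hypothetical: pausing is five times as likely as moving

li = linear_interpolate(start, end, gap_times)
ri = resample_impute(start, end, gap_times, observed_steps, weights, rng)
print(path_length([start] + li))  # → 0.0
print(path_length([start] + ri))
```

The example shows one way LI can distort a mobility summary such as distance traveled: when the endpoints of a gap coincide, LI imputes no movement at all, whereas any fill built from observed movement can reproduce activity during the gap while still agreeing with both observed endpoints.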