Program in Public Health, University of California, Irvine, USA.
Environ Health. 2011 Nov 14;10:101. doi: 10.1186/1476-069X-10-101.
Air pollution epidemiological studies increasingly use global positioning system (GPS) devices to collect time-location data because they offer continuous tracking, high temporal resolution, and minimal reporting burden for participants. However, substantial uncertainties in processing and classifying raw GPS data make it difficult to characterize time-activity patterns reliably. We developed and evaluated models to classify people's major time-activity patterns from continuous GPS tracking data.
We developed and evaluated two automated models to classify major time-activity patterns (i.e., indoor, outdoor static, outdoor walking, and in-vehicle travel) based on GPS time-activity data collected under free-living conditions from 47 participants (N = 131 person-days) in the Harbor Communities Time Location Study (HCTLS) in 2008, and on supplemental GPS data collected from three UC-Irvine research staff (N = 21 person-days) in 2010. Time-activity patterns used for model development were manually classified by research staff using information from participant GPS recordings, activity logs, and follow-up interviews. We evaluated two models: (a) a rule-based model that applied user-defined rules based on time, speed, and spatial location, and (b) a random forest decision tree model.
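To make the idea of a rule-based classifier concrete, the following is a minimal sketch of how speed and spatial location might be combined into the four activity classes. It is not the rule set used in the study: the thresholds, the `GpsPoint` type, the `inside_building` flag (assumed to come from a separate spatial overlay), and the order of the rules are all hypothetical, and the time-based rules mentioned above are omitted for brevity.

```python
from dataclasses import dataclass

# Hypothetical thresholds chosen for illustration only; not taken from the study.
WALK_MIN_KMH = 1.0
VEHICLE_MIN_KMH = 10.0


@dataclass
class GpsPoint:
    speed_kmh: float        # instantaneous speed reported by the GPS device
    inside_building: bool   # assumed result of a spatial overlay with building footprints


def classify_point(point: GpsPoint) -> str:
    """Assign one of the four activity classes using simple ordered rules."""
    # Rule 1: sustained high speed is treated as in-vehicle travel.
    if point.speed_kmh >= VEHICLE_MIN_KMH:
        return "in-vehicle travel"
    # Rule 2: a slow point falling within a building footprint is treated as indoor.
    if point.inside_building:
        return "indoor"
    # Rule 3: a slow-moving point outside any building is treated as walking.
    if point.speed_kmh >= WALK_MIN_KMH:
        return "outdoor walking"
    # Rule 4: everything else is a stationary outdoor point.
    return "outdoor static"
```

A real rule set would also need to handle GPS signal loss and positional scatter near buildings, which this sketch ignores.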
Indoor, outdoor static, outdoor walking, and in-vehicle travel activities accounted for 82.7%, 6.1%, 3.2%, and 7.2% of manually classified time activities in the HCTLS dataset, respectively. The rule-based model classified indoor and in-vehicle travel periods reasonably well (indoor: sensitivity > 91%, specificity > 80%, and precision > 96%; in-vehicle travel: sensitivity > 71%, specificity > 99%, and precision > 88%), but its performance was only moderate for outdoor static and outdoor walking predictions. No striking differences in performance were observed between the rule-based and random forest models. The random forest model was fast and easy to execute, but was likely less robust than the rule-based model when training data were biased or of poor quality.
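For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, and precision can be computed for one activity class against manually classified labels, treating that class as positive and all others as negative. The function name and the example labels are illustrative assumptions, not data or code from the study.

```python
def one_vs_rest_metrics(truth, predicted, positive):
    """Compute sensitivity, specificity, and precision for one class versus the rest."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if p == positive and t == positive:
            tp += 1          # correctly predicted as the class
        elif p == positive:
            fp += 1          # predicted as the class, but manually labelled otherwise
        elif t == positive:
            fn += 1          # manually labelled as the class, but missed
        else:
            tn += 1          # correctly predicted as some other class

    def ratio(num, den):
        # Return NaN when a metric is undefined (empty denominator).
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Made-up labels for five GPS points, for illustration only.
truth = ["indoor", "indoor", "indoor", "in-vehicle travel", "outdoor walking"]
predicted = ["indoor", "indoor", "in-vehicle travel", "in-vehicle travel", "indoor"]
indoor_metrics = one_vs_rest_metrics(truth, predicted, "indoor")
```

Because indoor points dominate the dataset, precision for the indoor class can be high even when specificity is lower, which is consistent with the pattern reported above.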
Our models can successfully identify indoor and in-vehicle travel points from raw GPS data, but challenges remain in developing models that distinguish outdoor static points from outdoor walking. Accurate training data are essential for developing reliable models to classify time-activity patterns.