Herbert Wertheim School of Public Health and Human Longevity Science, University of California, San Diego, La Jolla, CA, United States.
Center for Wireless & Population Health Systems, Calit2's Qualcomm Institute, University of California, San Diego, La Jolla, CA, United States.
JMIR Mhealth Uhealth. 2023 Jan 27;11:e44296. doi: 10.2196/44296.
Physical inactivity is associated with numerous health risks, including cancer, cardiovascular disease, type 2 diabetes, increased health care expenditure, and preventable, premature deaths. The majority of Americans fall short of clinical guideline goals (ie, 8000-10,000 steps per day). Behavior prediction algorithms could enable efficacious interventions to promote physical activity by facilitating delivery of nudges at appropriate times.
This study aimed to develop and validate algorithms that predict whether a participant will walk (ie, for >5 min) within the next 3 hours, using the participant's steps-per-minute data from the previous 5 weeks.
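To make the prediction target concrete, the following is a minimal sketch of how a binary outcome label could be derived from a steps-per-minute series. It is an illustration under stated assumptions, not the authors' procedure: the cadence threshold `WALK_STEPS_PER_MIN`, the choice to count walking minutes cumulatively rather than consecutively, the treatment of missing minutes as non-walking, and the function name are all hypothetical.

```python
from typing import List, Optional

# Assumption (not from the source): a minute counts as "walking"
# when its step count reaches this cadence threshold.
WALK_STEPS_PER_MIN = 60
MIN_WALK_MINUTES = 5    # source: walking is defined as >5 min
WINDOW_MINUTES = 180    # source: prediction horizon is the next 3 hours


def walked_in_next_window(steps_per_min: List[Optional[int]], now: int) -> bool:
    """Return True if the 180 minutes after index `now` contain
    more than 5 walking minutes.

    Assumptions: walking minutes are counted cumulatively (not
    consecutively), and missing minutes (None) count as non-walking.
    """
    window = steps_per_min[now + 1 : now + 1 + WINDOW_MINUTES]
    walking_minutes = sum(
        1 for steps in window
        if steps is not None and steps >= WALK_STEPS_PER_MIN
    )
    return walking_minutes > MIN_WALK_MINUTES


if __name__ == "__main__":
    # Six walking minutes shortly after the current time point.
    series = [0] * 10 + [100] * 6 + [0] * 200
    print(walked_in_next_window(series, now=0))
```

A different cadence threshold or a requirement for consecutive minutes would change which windows are labeled as walking, so these choices would need to match the study's actual definition.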
We conducted a retrospective, closed cohort, secondary analysis of a 6-week microrandomized trial of the HeartSteps mobile health physical-activity intervention conducted in 2015. The prediction performance of 6 algorithms was evaluated: logistic regression, radial-basis function support vector machine, eXtreme Gradient Boosting (XGBoost), multilayer perceptron (MLP), decision tree, and random forest. For the MLP, 90 random layer architectures were tested for optimization. Prior 5-week hourly walking data, including missingness, were used as predictors. Whether the participant walked during the next 3 hours was used as the outcome. K-fold cross-validation (K=10) was used for internal validation. The primary outcome measures were classification accuracy, the Matthews correlation coefficient, sensitivity, and specificity.
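The random search over MLP layer architectures can be sketched as follows. This is a hedged illustration only: the abstract states that 90 random architectures were tested but does not give the search space, so the maximum depth, the candidate layer widths, the seed, and the names `sample_architectures` and `random_search` are assumptions. The scoring function is left to the caller; in a setting like the one described, it would plausibly return the mean 10-fold cross-validated accuracy of an MLP built with the given hidden layers.

```python
import random
from typing import Callable, List, Sequence, Tuple

# One architecture = the widths of the hidden layers, in order.
Architecture = Tuple[int, ...]


def sample_architectures(
    n_candidates: int = 90,                 # source: 90 architectures tested
    max_hidden_layers: int = 4,             # assumption
    layer_widths: Sequence[int] = (8, 16, 32, 64, 128),  # assumption
    seed: int = 0,
) -> List[Architecture]:
    """Draw random hidden-layer layouts: a random depth, then a
    random width for each layer."""
    rng = random.Random(seed)
    architectures = []
    for _ in range(n_candidates):
        depth = rng.randint(1, max_hidden_layers)
        architectures.append(
            tuple(rng.choice(layer_widths) for _ in range(depth))
        )
    return architectures


def random_search(
    score_fn: Callable[[Architecture], float],
    candidates: Sequence[Architecture],
) -> Architecture:
    """Return the candidate with the highest score."""
    return max(candidates, key=score_fn)


if __name__ == "__main__":
    candidates = sample_architectures()
    # Placeholder score for demonstration only; a real score would
    # come from cross-validating a model with these hidden layers.
    best = random_search(lambda arch: -abs(sum(arch) - 100), candidates)
    print(len(candidates), best)
```

Keeping the sampler separate from the scoring function means the same candidate list can be evaluated on every cross-validation fold, which makes the comparison between architectures reproducible.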
The full sample comprised 6 weeks of data from 44 participants. Of the 44 participants, 31 (71%) were female, 26 (59%) were White, 36 (82%) had a college degree or more, and 15 (34%) were married. The mean age was 35.9 (SD 14.7) years. Participants (n=3, 7%) who did not have enough data (number of days <10) were excluded, resulting in 41 (93%) participants. MLP with optimized layer architecture showed the best performance in accuracy (82.0%, SD 1.1), whereas XGBoost (76.3%, SD 1.5), random forest (69.5%, SD 1.0), support vector machine (69.3%, SD 1.0), and decision tree (63.6%, SD 1.5) algorithms showed lower performance than logistic regression (77.2%, SD 1.2). MLP also showed superior overall performance to all other algorithms tested in Matthews correlation coefficient (0.643, SD 0.021), sensitivity (86.1%, SD 3.0), and specificity (77.8%, SD 3.3).
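For readers less familiar with the four reported measures, the sketch below shows how accuracy, sensitivity, specificity, and the Matthews correlation coefficient follow from the counts of a binary confusion matrix. It uses the standard definitions; the function name and the toy labels are illustrative and are not the study's data.

```python
import math
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and the Matthews
    correlation coefficient for binary labels (1 = walked, 0 = did not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = tp + tn + fp + fn
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Sensitivity: share of actual walking windows predicted as walking.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of non-walking windows predicted as non-walking.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # MCC is conventionally set to 0 when its denominator is 0.
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


if __name__ == "__main__":
    # Toy labels for illustration only.
    print(classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it stays informative when walking and non-walking windows are imbalanced.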
Walking behavior prediction models were developed and validated. MLP showed the highest overall performance of all attempted algorithms. A random search for optimal layer structure is a promising approach for prediction engine development. Future studies can test the real-world application of this algorithm in a "smart" intervention for promoting physical activity.