Dimitris Bertsimas, Predrag Klasnja, Susan Murphy, Liangyuan Na
Sloan School of Management, Massachusetts Institute of Technology, Cambridge, USA.
School of Information, University of Michigan, Ann Arbor, USA.
2022 IEEE Int Conf Digit Health IEEE ICDH 2022 (2022). 2022 Jul;2022:13-22. doi: 10.1109/ICDH55609.2022.00010. Epub 2022 Aug 24.
To promote healthy behaviors, many mobile health applications provide message-based interventions, such as tips, motivational messages, or suggestions for healthy activities. Ideally, intervention policies should be carefully designed so that users obtain the benefits without being overwhelmed by overly frequent messages. As part of the HeartSteps physical-activity intervention, users receive messages intended to disrupt sedentary behavior. HeartSteps uses an algorithm to spread the daily message budget uniformly over time, but it does not attempt to maximize treatment effects. This limitation motivates constructing a policy that optimizes message delivery decisions for more effective treatment. Moreover, the learned policy needs to be interpretable so that behavioral scientists can examine it and use it to inform future theorizing. We address this problem by learning an effective and interpretable policy that reduces sedentary behavior. We propose Optimal Policy Trees + (OPT+), an innovative batch off-policy learning method that combines personalized threshold learning with an extension of Optimal Policy Trees to a budget-constrained setting. We implement and test the method using data collected in HeartSteps V2/V3. Computational results demonstrate a significant reduction in sedentary behavior with a lower delivery budget. OPT+ produces a highly interpretable and stable output decision tree, thus enabling theoretical insights to guide future research.
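The budget-constrained threshold idea can be illustrated with a minimal sketch. This is not OPT+ and it does not learn a decision tree. It assumes that, for one user, we already have estimated effects of sending a message at each decision point (larger meaning more reduction in sedentary behavior); these estimates, the example numbers, and the helper names `personalized_threshold`, `send_decisions`, and `uniform_schedule` are all hypothetical. The sketch picks a per-user threshold so that at most `budget` messages are sent, and only where the estimated benefit is positive. It then compares this with an effect-agnostic schedule that spreads the same budget evenly, loosely mirroring the uniform spreading described for HeartSteps.

```python
from typing import List


def personalized_threshold(effects: List[float], budget: int) -> float:
    """Hypothetical helper: choose a per-user threshold so that at most
    `budget` decision points have an estimated effect strictly above it.

    `effects` are ASSUMED estimates of the benefit of sending a message
    at each decision point for a single user.
    """
    if budget <= 0 or not effects:
        return float("inf")  # no budget: never send
    ranked = sorted(effects, reverse=True)
    if budget >= len(ranked):
        # Budget is not binding: send wherever the estimated benefit is positive.
        return 0.0
    # The (budget+1)-th largest effect; with ties this sends fewer than
    # `budget` messages, never more. Floor at 0 so non-beneficial
    # messages are never sent.
    return max(ranked[budget], 0.0)


def send_decisions(effects: List[float], threshold: float) -> List[bool]:
    """Send a message only where the estimated effect exceeds the threshold."""
    return [e > threshold for e in effects]


def uniform_schedule(n_points: int, budget: int) -> List[bool]:
    """Simplified stand-in for uniform spreading: place `budget` messages
    evenly across `n_points` decision points, ignoring estimated effects."""
    k = min(max(budget, 0), n_points)
    if k == 0:
        return [False] * n_points
    step = n_points / k
    chosen = {int(i * step) for i in range(k)}
    return [i in chosen for i in range(n_points)]


if __name__ == "__main__":
    # Hypothetical estimated effects for five decision points in one day.
    effects = [0.5, -0.2, 1.3, 0.1, 0.9]
    budget = 2

    t = personalized_threshold(effects, budget)
    targeted = send_decisions(effects, t)
    uniform = uniform_schedule(len(effects), budget)

    print(t)         # 0.5
    print(targeted)  # [False, False, True, False, True]
    print(uniform)   # [True, False, True, False, False]
    # Total estimated benefit spent by each schedule.
    print(round(sum(e for e, d in zip(effects, targeted) if d), 2))  # 2.2
    print(round(sum(e for e, d in zip(effects, uniform) if d), 2))   # 1.8
```

Ranking by estimated effect spends the same budget where the estimates suggest the most benefit, whereas the uniform baseline spends it regardless of context. How OPT+ actually learns personalized thresholds and combines them with an optimal policy tree is not captured by this sketch.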