Trella Anna L, Zhang Kelly W, Nahum-Shani Inbal, Shetty Vivek, Doshi-Velez Finale, Murphy Susan A
School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA.
Institute for Social Research, University of Michigan, Ann Arbor, MI 48109, USA.
Algorithms. 2022 Aug;15(8). doi: 10.3390/a15080255. Epub 2022 Jul 22.
Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in mobile health and online education. Common challenges in designing and testing an RL algorithm in these settings include ensuring that the algorithm can learn and run stably under real-time constraints and accounting for the complexity of the environment, e.g., the lack of accurate mechanistic models of user dynamics. To guide how one can tackle these challenges, we extend the PCS (predictability, computability, stability) framework, a data science framework that incorporates best practices from machine learning and statistics for supervised learning, to the design of RL algorithms for the digital intervention setting. Furthermore, we provide guidelines on how to design simulation environments, a crucial tool for evaluating candidate RL algorithms under the PCS framework. We show how we used the PCS framework to design an RL algorithm for Oralytics, a mobile health study aiming to improve users' tooth-brushing behaviors through the personalized delivery of intervention messages. Oralytics will go into the field in late 2022.
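To make the evaluation workflow the abstract describes more concrete, below is a minimal sketch of running a candidate online RL algorithm against a simulated user environment and tracking average reward. It is an illustration only, not the paper's method: the environment dynamics, the `SimulatedUser` and `ThompsonSampler` classes, and the Beta-Bernoulli Thompson sampler with fractional-reward updates are all hypothetical choices made for this sketch.

```python
import numpy as np

# Hypothetical simulated user: the (proxy) brushing-quality reward improves
# when a message is sent, with diminishing returns under message fatigue.
# These dynamics are invented for illustration, not taken from the paper.
class SimulatedUser:
    def __init__(self, rng):
        self.rng = rng
        self.fatigue = 0.0

    def step(self, send_message):
        base = 0.4
        effect = 0.2 * send_message * (1.0 - self.fatigue)
        # Fatigue rises when a message is sent and decays otherwise.
        self.fatigue = min(1.0, self.fatigue + 0.1 * send_message)
        self.fatigue = max(0.0, self.fatigue - 0.05 * (1 - send_message))
        return float(np.clip(base + effect + 0.1 * self.rng.standard_normal(), 0.0, 1.0))

# Hypothetical candidate algorithm: Beta-Bernoulli Thompson sampling over the
# binary action "send a message or not", with a common heuristic relaxation
# that allows rewards in [0, 1] rather than strictly binary rewards.
class ThompsonSampler:
    def __init__(self, rng):
        self.rng = rng
        self.alpha = np.ones(2)  # pseudo-counts of success per action
        self.beta = np.ones(2)   # pseudo-counts of failure per action

    def act(self):
        # Sample a mean-reward estimate per action and act greedily on it.
        return int(np.argmax(self.rng.beta(self.alpha, self.beta)))

    def update(self, action, reward):
        self.alpha[action] += reward
        self.beta[action] += 1.0 - reward

rng = np.random.default_rng(0)
user, algo = SimulatedUser(rng), ThompsonSampler(rng)
rewards = []
for t in range(500):  # 500 simulated decision points
    action = algo.act()
    reward = user.step(action)
    algo.update(action, reward)
    rewards.append(reward)
print(f"average simulated reward: {np.mean(rewards):.3f}")
```

In the PCS spirit, a sketch like this would be rerun across many simulated users and perturbed environment variants to check that a candidate algorithm learns and runs stably before deployment.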