Ghosh Susobhan, Guo Yongyi, Hung Pei-Yao, Coughlin Lara, Bonar Erin, Nahum-Shani Inbal, Walton Maureen, Murphy Susan
Department of Computer Science, Harvard University.
Department of Statistics, University of Wisconsin-Madison.
IJCAI (U S). 2024 Aug;2024:7278-7286.
The escalating prevalence of cannabis use, and of the associated cannabis use disorder (CUD), poses a significant public health challenge globally. With a notably wide treatment gap, especially among emerging adults (EAs; ages 18-25), addressing cannabis use and CUD remains a pivotal objective within the 2030 United Nations Agenda for Sustainable Development Goals (SDGs). In this work, we develop an online reinforcement learning (RL) algorithm called reBandit, which will be utilized in a mobile health study to deliver personalized mobile health interventions aimed at reducing cannabis use among EAs. reBandit utilizes random effects and informative Bayesian priors to learn quickly and efficiently in noisy mobile health environments. Moreover, reBandit employs Empirical Bayes and optimization techniques to update its hyper-parameters online autonomously. To evaluate the performance of our algorithm, we construct a simulation testbed using data from a prior study and compare reBandit against algorithms commonly used in mobile health studies. We show that reBandit performs as well as or better than all the baseline algorithms, and that the performance gap widens as population heterogeneity increases in the simulation environment, demonstrating its ability to adapt to a diverse population of study participants.
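The abstract names the algorithm's key ingredients (random effects, informative priors, online Empirical Bayes hyper-parameter updates) without implementation detail. The following is a minimal, hypothetical Python sketch of those ideas only, assuming a Gaussian linear reward model, Thompson sampling for action selection, and a grid-search marginal-likelihood step for the Empirical Bayes update. The class RandomEffectsBandit, its methods, and all parameter names are illustrative assumptions, not the authors' implementation; in particular, this sketch tunes the prior variance per participant for brevity, whereas a random-effects model would typically pool information across participants.

    # Hypothetical sketch of a random-effects Thompson-sampling bandit with an
    # online Empirical Bayes step; illustrative only, not the reBandit code.
    import numpy as np

    class RandomEffectsBandit:
        def __init__(self, dim, prior_mean, prior_var=1.0, noise_var=1.0):
            self.dim = dim
            self.mu0 = np.asarray(prior_mean, dtype=float)  # informative prior mean
            self.tau2 = prior_var    # random-effects (prior) variance, tuned online
            self.sigma2 = noise_var  # observation noise variance
            self.X, self.y = [], []  # this participant's history

        def _posterior(self):
            # Bayesian linear-regression posterior over this participant's weights.
            if not self.X:
                return self.mu0, self.tau2 * np.eye(self.dim)
            X = np.vstack(self.X); y = np.asarray(self.y)
            prec = np.eye(self.dim) / self.tau2 + X.T @ X / self.sigma2
            cov = np.linalg.inv(prec)
            mean = cov @ (self.mu0 / self.tau2 + X.T @ y / self.sigma2)
            return mean, cov

        def select_action(self, features_per_action, rng):
            # Thompson sampling: draw weights from the posterior, act greedily.
            mean, cov = self._posterior()
            w = rng.multivariate_normal(mean, cov)
            return int(np.argmax([f @ w for f in features_per_action]))

        def update(self, features, reward):
            self.X.append(np.asarray(features, dtype=float))
            self.y.append(float(reward))
            self._empirical_bayes_step()

        def _empirical_bayes_step(self, grid=np.logspace(-2, 1, 25)):
            # Empirical Bayes: choose the prior variance that maximizes the
            # marginal likelihood of observed rewards (grid search for simplicity).
            X = np.vstack(self.X); y = np.asarray(self.y); n = len(y)
            best, best_ll = self.tau2, -np.inf
            for tau2 in grid:
                S = tau2 * X @ X.T + self.sigma2 * np.eye(n)  # marginal covariance
                r = y - X @ self.mu0
                _, logdet = np.linalg.slogdet(S)
                ll = -0.5 * (logdet + r @ np.linalg.solve(S, r))
                if ll > best_ll:
                    best, best_ll = tau2, ll
            self.tau2 = best

    # Example usage with two candidate actions (feature vectors are made up):
    rng = np.random.default_rng(0)
    bandit = RandomEffectsBandit(dim=3, prior_mean=[0.1, 0.0, 0.2])
    feats = [np.array([1.0, 0.0, 0.5]), np.array([0.0, 1.0, 0.5])]
    a = bandit.select_action(feats, rng)
    bandit.update(feats[a], reward=1.0)

In this sketch, the informative prior mean mu0 plays the role of population-level knowledge carried into each participant's model, while tau2 controls how far an individual's weights may drift from it; re-estimating tau2 online is one simple reading of the abstract's "Empirical Bayes and optimization techniques" for hyper-parameter updates.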