Trella Anna L, Zhang Kelly W, Nahum-Shani Inbal, Shetty Vivek, Doshi-Velez Finale, Murphy Susan A
School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA.
Institute for Social Research, University of Michigan, Ann Arbor, MI 48109, USA.
Algorithms. 2022 Aug;15(8). doi: 10.3390/a15080255. Epub 2022 Jul 22.
Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in mobile health and online education. Common challenges in designing and testing an RL algorithm in these settings include ensuring that the algorithm can learn and run stably under real-time constraints and accounting for the complexity of the environment, e.g., the lack of accurate mechanistic models of user dynamics. To guide how one can tackle these challenges, we extend the PCS (predictability, computability, stability) framework, a data science framework that incorporates best practices from machine learning and statistics for supervised learning, to the design of RL algorithms for the digital intervention setting. Furthermore, we provide guidelines on how to design simulation environments, a crucial tool for evaluating candidate RL algorithms under the PCS framework. We show how we used the PCS framework to design an RL algorithm for Oralytics, a mobile health study aiming to improve users' tooth-brushing behaviors through the personalized delivery of intervention messages. Oralytics will go into the field in late 2022.
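To make the evaluation workflow the abstract describes more concrete, below is a minimal sketch of running a candidate online RL algorithm against a simulated user environment and tracking average reward. It is an illustration only, not the paper's method: the environment dynamics, the `SimulatedUser` and `ThompsonSampler` classes, and the Beta-Bernoulli Thompson sampler with fractional-reward updates are all hypothetical choices made for this sketch.

```python
import numpy as np

# Hypothetical simulated user: the (proxy) brushing-quality reward improves
# when a message is sent, with diminishing returns under message fatigue.
# These dynamics are invented for illustration, not taken from the paper.
class SimulatedUser:
    def __init__(self, rng):
        self.rng = rng
        self.fatigue = 0.0

    def step(self, send_message):
        base = 0.4
        effect = 0.2 * send_message * (1.0 - self.fatigue)
        # Fatigue rises when a message is sent and decays otherwise.
        self.fatigue = min(1.0, self.fatigue + 0.1 * send_message)
        self.fatigue = max(0.0, self.fatigue - 0.05 * (1 - send_message))
        return float(np.clip(base + effect + 0.1 * self.rng.standard_normal(), 0.0, 1.0))

# Hypothetical candidate algorithm: Beta-Bernoulli Thompson sampling over the
# binary action "send a message or not", with a common heuristic relaxation
# that allows rewards in [0, 1] rather than strictly binary rewards.
class ThompsonSampler:
    def __init__(self, rng):
        self.rng = rng
        self.alpha = np.ones(2)  # pseudo-counts of success per action
        self.beta = np.ones(2)   # pseudo-counts of failure per action

    def act(self):
        # Sample a mean-reward estimate per action and act greedily on it.
        return int(np.argmax(self.rng.beta(self.alpha, self.beta)))

    def update(self, action, reward):
        self.alpha[action] += reward
        self.beta[action] += 1.0 - reward

rng = np.random.default_rng(0)
user, algo = SimulatedUser(rng), ThompsonSampler(rng)
rewards = []
for t in range(500):  # 500 simulated decision points
    action = algo.act()
    reward = user.step(action)
    algo.update(action, reward)
    rewards.append(reward)
print(f"average simulated reward: {np.mean(rewards):.3f}")
```

In the PCS spirit, a sketch like this would be rerun across many simulated users and perturbed environment variants to check that a candidate algorithm learns and runs stably before deployment.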