Reward Design For An Online Reinforcement Learning Algorithm Supporting Oral Self-Care.

Authors

Trella Anna L, Zhang Kelly W, Nahum-Shani Inbal, Shetty Vivek, Doshi-Velez Finale, Murphy Susan A

Affiliations

Department of Computer Science, Harvard University.

Institute for Social Research, University of Michigan.

Publication information

Proc Innov Appl Artif Intell Conf. 2023 Jun 27;37(13):15724-15730. doi: 10.1609/aaai.v37i13.26866.

DOI: 10.1609/aaai.v37i13.26866
PMID: 37637073
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10457015/
Abstract

While dental disease is largely preventable, professional advice on optimal oral hygiene practices is often forgotten or abandoned by patients. Therefore, patients may benefit from timely and personalized encouragement to engage in oral self-care behaviors. In this paper, we develop an online reinforcement learning (RL) algorithm for use in optimizing the delivery of mobile-based prompts to encourage oral hygiene behaviors. One of the main challenges in developing such an algorithm is ensuring that the algorithm considers the impact of current actions on the effectiveness of future actions (i.e., delayed effects), especially when the algorithm has been designed to run stably and autonomously in a constrained, real-world setting characterized by highly noisy, sparse data. We address this challenge by designing a quality reward that maximizes the desired health outcome (i.e., high-quality brushing) while minimizing user burden. We also highlight a procedure for optimizing the hyperparameters of the reward by building a simulation environment test bed and evaluating candidates using the test bed. The RL algorithm discussed in this paper will be deployed in Oralytics. To the best of our knowledge, Oralytics is the first mobile health study utilizing an RL algorithm designed to prevent dental disease by optimizing the delivery of motivational messages supporting oral self-care behaviors.
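
The abstract does not give the reward formula or the test bed details, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a reward equal to brushing quality minus a burden cost `xi`, charged when a prompt is sent while the user's recent prompt dose is already high. It also assumes a toy simulated user whose response to prompts weakens with repeated prompting, and a simple epsilon-greedy learner. The names `shaped_reward`, `simulate_user`, `evaluate_candidates`, `xi`, and `dose_threshold`, and all numeric values, are hypothetical and are not taken from the paper.

```python
import random


def shaped_reward(quality, action, recent_dose, xi, dose_threshold=0.5):
    """Brushing quality minus a burden cost (hypothetical form).

    The cost xi is charged only when a prompt is sent (action == 1) while
    the user's recent prompt dose already exceeds dose_threshold.
    """
    cost = xi if (action == 1 and recent_dose > dose_threshold) else 0.0
    return quality - cost


def simulate_user(xi, n_steps=140, gamma=0.8, epsilon=0.1, seed=0):
    """Run a toy epsilon-greedy learner on one simulated user.

    Returns the mean true brushing quality (the health outcome), not the
    shaped reward the learner was trained on.
    """
    rng = random.Random(seed)
    # Running mean of the shaped reward for each (context, action) pair.
    means = {(c, a): 0.0 for c in (0, 1) for a in (0, 1)}
    counts = {key: 0 for key in means}
    recent_dose, total_quality = 0.0, 0.0
    for _ in range(n_steps):
        context = int(recent_dose > 0.5)
        if rng.random() < epsilon:
            action = rng.randint(0, 1)  # explore
        else:
            action = max((0, 1), key=lambda a: means[(context, a)])  # exploit
        # Assumed delayed effect: a prompt helps less when recent dose is high.
        effect = 40.0 * (1.0 - recent_dose) if action else 0.0
        quality = max(0.0, 60.0 + effect + rng.gauss(0.0, 20.0))
        reward = shaped_reward(quality, action, recent_dose, xi)
        key = (context, action)
        counts[key] += 1
        means[key] += (reward - means[key]) / counts[key]
        # Exponentially discounted average of past prompts.
        recent_dose = gamma * recent_dose + (1.0 - gamma) * action
        total_quality += quality
    return total_quality / n_steps


def evaluate_candidates(xi_grid, n_users=50):
    """Average the health outcome over simulated users for each candidate xi."""
    return {
        xi: sum(simulate_user(xi, seed=u) for u in range(n_users)) / n_users
        for xi in xi_grid
    }


if __name__ == "__main__":
    scores = evaluate_candidates([0.0, 10.0, 20.0, 40.0])
    for xi, score in scores.items():
        print(f"xi={xi:5.1f}  mean brushing quality={score:.1f}")
    print("selected xi:", max(scores, key=scores.get))
```

In this sketch the learner is trained on the shaped reward, but candidates are compared on the unshaped brushing quality, which is one plausible reading of "evaluating candidates using the test bed". The paper's actual simulation environment and selection criterion may differ.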

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/588b/10457015/8cbfc3320fd5/nihms-1851571-f0001.jpg
