A Robust Mixed-Effects Bandit Algorithm for Assessing Mobile Health Interventions.

Authors

Huch Easton K, Shi Jieru, Abbott Madeline R, Golbus Jessica R, Moreno Alexander, Dempsey Walter H

Affiliations

Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA.

Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.

Publication

Adv Neural Inf Process Syst. 2024;37:128280-128329.

PMID: 40895488
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12395203/

Abstract

Mobile health leverages personalized, contextually-tailored interventions optimized through bandit and reinforcement learning algorithms. Despite its promise, challenges like participant heterogeneity, nonstationarity, and nonlinearity in rewards hinder algorithm performance. We propose a robust contextual bandit algorithm, termed "DML-TS-NNR", that simultaneously addresses these challenges via (1) modeling the differential reward with user- and time-specific incidental parameters, (2) network cohesion penalties, and (3) debiased machine learning for flexible estimation of baseline rewards. We establish a high-probability regret bound that depends solely on the dimension of the differential reward model. This feature enables us to achieve robust regret bounds even when the baseline reward is highly complex. We demonstrate the superior performance of the DML-TS-NNR algorithm in a simulation and two off-policy evaluation studies.
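The abstract's core idea, learning only the differential reward while treating the baseline reward as a nuisance, can be illustrated with a small sketch. This is not the authors' DML-TS-NNR implementation: it omits the user- and time-specific parameters and the network cohesion penalties, and the pseudo-outcome form, the class name `DifferentialRewardTS`, the simulated reward model, and all parameter values are assumptions made for illustration. It shows a Thompson-sampling-style rule for a binary action in which a deliberately crude baseline estimate (a running mean) is combined with the known randomization probability to form a pseudo-outcome for the differential reward.

```python
import math
import numpy as np


def pseudo_outcome(r, a, pi, f0_hat, f1_hat):
    """Doubly-robust-style pseudo-outcome for the differential reward.

    r: observed reward, a: binary action, pi: probability that a == 1,
    f0_hat / f1_hat: estimated mean reward under action 0 / 1 (assumed inputs).
    """
    f_a = f1_hat if a == 1 else f0_hat
    return (a - pi) / (pi * (1.0 - pi)) * (r - f_a) + (f1_hat - f0_hat)


class DifferentialRewardTS:
    """Gaussian linear model for the differential reward x @ theta (illustrative)."""

    def __init__(self, dim, ridge=1.0, scale=1.0, pi_min=0.1, pi_max=0.9):
        self.V = ridge * np.eye(dim)   # regularized Gram matrix
        self.b = np.zeros(dim)         # accumulated x * pseudo-outcome
        self.scale = scale             # posterior inflation factor
        self.pi_min, self.pi_max = pi_min, pi_max

    def action_probability(self, x):
        """Posterior probability that x @ theta > 0, clipped away from 0 and 1."""
        V_inv = np.linalg.inv(self.V)
        mean = x @ V_inv @ self.b
        sd = self.scale * math.sqrt(x @ V_inv @ x)  # x must be nonzero
        p = 0.5 * (1.0 + math.erf(mean / (sd * math.sqrt(2.0))))
        return min(max(p, self.pi_min), self.pi_max)

    def update(self, x, y):
        """Ridge-regression update with pseudo-outcome y."""
        self.V += np.outer(x, x)
        self.b += y * x

    def estimate(self):
        return np.linalg.solve(self.V, self.b)


def simulate(T=2000, seed=0):
    """Toy run: nonlinear baseline reward, linear differential reward."""
    rng = np.random.default_rng(seed)
    theta_true = np.array([0.5, -1.0])  # hypothetical true parameters
    model = DifferentialRewardTS(dim=2)
    reward_sum, n = 0.0, 0
    for _ in range(T):
        s = rng.uniform(-1.0, 1.0)
        x = np.array([1.0, s])
        pi = model.action_probability(x)
        a = int(rng.random() < pi)
        # The baseline 2*sin(3s) is nonlinear and is never modeled correctly.
        r = 2.0 * math.sin(3.0 * s) + a * (x @ theta_true) + rng.normal(scale=0.5)
        f_hat = reward_sum / n if n else 0.0  # crude baseline: running mean of past rewards
        y = pseudo_outcome(r, a, pi, f_hat, f_hat)
        model.update(x, y)
        reward_sum, n = reward_sum + r, n + 1
    return model.estimate(), theta_true
```

Because the randomization probability is known, the pseudo-outcome has conditional mean equal to the differential reward for any baseline estimate that does not depend on the current action. A better baseline estimate reduces variance rather than bias, which is the intuition behind letting a flexible machine learning model handle the baseline while the bandit's model covers only the low-dimensional differential reward.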
