Gan Kyra, Keyvanshokooh Esmaeil, Liu Xueqing, Murphy Susan
Cornell Tech.
Texas A&M University.
Proc Mach Learn Res. 2024 May;238:3970-3978.
Contextual bandit algorithms are commonly used in digital health to recommend personalized treatments. However, to ensure the effectiveness of the treatments, patients are often requested to take actions that have no immediate benefit to them, which we refer to as no-immediate-benefit actions. In practice, clinicians have a limited budget for encouraging patients to take these actions and collecting additional information. We introduce a novel optimization and learning algorithm to address this problem. It seamlessly combines the strengths of two algorithmic approaches: 1) an online primal-dual algorithm that decides the optimal timing for reaching out to patients, and 2) a contextual bandit learning algorithm that delivers personalized treatment to each patient. We prove that this algorithm admits a sublinear regret bound. We illustrate the usefulness of this algorithm on both synthetic and real-world data.
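To make the two-component structure concrete, below is a minimal, hypothetical sketch (not the authors' algorithm) of how a primal-dual pacing rule for a limited reach-out budget can be paired with a LinUCB-style contextual bandit. All names, parameters, and the toy reward model here are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: an online primal-dual rule decides *when* to spend
# limited reach-out budget, while a LinUCB-style bandit decides *which*
# treatment to recommend. Purely illustrative; not the paper's method.

rng = np.random.default_rng(0)
T, d, n_arms, budget = 1000, 5, 3, 100   # horizon, context dim, arms, reach-out budget
rho = budget / T                          # target per-round spend rate
lam, eta = 0.0, 0.05                      # dual "price" of budget and its step size
A = [np.eye(d) for _ in range(n_arms)]    # per-arm Gram matrices (LinUCB state)
b = [np.zeros(d) for _ in range(n_arms)]  # per-arm reward-weighted context sums

def true_reward(x, arm):
    """Toy synthetic environment: linear reward plus noise."""
    theta = np.linspace(0.1, 1.0, d) * (arm + 1) / n_arms
    return float(theta @ x) + 0.1 * rng.standard_normal()

spent = 0
for t in range(T):
    x = rng.standard_normal(d)            # observed patient context
    # LinUCB scores: point estimate plus exploration bonus.
    ucb = []
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta_hat = A_inv @ b[a]
        ucb.append(theta_hat @ x + np.sqrt(x @ A_inv @ x))
    arm, value = int(np.argmax(ucb)), max(ucb)
    # Primal step: reach out only if the estimated value of collecting
    # extra feedback exceeds the current dual price of the budget.
    reach_out = spent < budget and value - lam > 0
    if reach_out:
        spent += 1
        r = true_reward(x, arm)           # feedback is revealed only on reach-out
        A[arm] += np.outer(x, x)          # bandit update from revealed feedback
        b[arm] += r * x
    # Dual step: raise the price when spending faster than rho, else lower it.
    lam = max(0.0, lam + eta * ((1 if reach_out else 0) - rho))

print(f"budget used: {spent}/{budget}, final dual price: {lam:.3f}")
```

The dual variable lam acts as a shadow price on the reach-out budget: when outreach happens faster than the target rate rho, the price rises and the rule becomes more selective, which is the standard way an online primal-dual method paces a hard budget over a horizon.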