
Multi-Objective Markov Decision Processes for Data-Driven Decision Support

Author Information

Lizotte Daniel J, Laber Eric B

Affiliations

Department of Computer Science, Department of Epidemiology & Biostatistics, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 3K7, Canada.

Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA.

Publication Information

J Mach Learn Res. 2016;17. Epub 2016 Dec 1.

Abstract

We present new methodology based on Multi-Objective Markov Decision Processes for developing sequential decision support systems from data. Our approach uses sequential decision-making data to provide support that is useful to many different decision-makers, each with different, potentially time-varying preferences. To accomplish this, we develop an extension of fitted-Q iteration for multiple objectives that computes policies for all scalarization functions, i.e., preference functions, simultaneously from continuous-state, finite-horizon data. We identify and address several conceptual and computational challenges along the way, and we introduce a new solution concept that is appropriate when different actions have similar expected outcomes. Finally, we demonstrate an application of our method using data from the Clinical Antipsychotic Trials of Intervention Effectiveness and show that our approach offers decision-makers increased choice via a larger class of optimal policies.
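The core computational tool named in the abstract, fitted-Q iteration, estimates stage-wise action-value functions by backward induction over batch trajectories. Below is a minimal sketch for a single fixed linear scalarization weight over two objectives; the synthetic data, feature map, and all names here are hypothetical illustrations, not the authors' implementation, which instead represents the solution for all scalarization weights simultaneously.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of N length-T trajectories with 2-dimensional
# continuous states, 2 actions, and a 2-dimensional reward vector
# (e.g., symptom relief vs. side-effect burden).
N, T, n_actions, n_obj = 500, 3, 2, 2
states = rng.normal(size=(N, T, 2))
actions = rng.integers(0, n_actions, size=(N, T))
rewards = rng.normal(size=(N, T, n_obj))

def features(s, a):
    """Linear-in-state features with one coefficient block per action."""
    block = np.hstack([s, np.ones((s.shape[0], 1))])  # state + intercept
    phi = np.zeros((s.shape[0], n_actions * block.shape[1]))
    for k in range(n_actions):
        cols = slice(k * block.shape[1], (k + 1) * block.shape[1])
        phi[a == k, cols] = block[a == k]
    return phi

def fitted_q(w):
    """Backward-inductive fitted-Q iteration for one scalarization weight w."""
    betas = [None] * T
    # Terminal stage: regress the scalarized reward on state-action features.
    target = rewards[:, T - 1] @ w
    betas[T - 1], *_ = np.linalg.lstsq(
        features(states[:, T - 1], actions[:, T - 1]), target, rcond=None)
    for t in range(T - 2, -1, -1):
        # Bellman target: scalarized reward plus the max over actions of
        # the fitted next-stage Q-function.
        next_q = np.stack(
            [features(states[:, t + 1], np.full(N, k)) @ betas[t + 1]
             for k in range(n_actions)], axis=1)
        target = rewards[:, t] @ w + next_q.max(axis=1)
        betas[t], *_ = np.linalg.lstsq(
            features(states[:, t], actions[:, t]), target, rcond=None)
    return betas  # argmax_a features(s, a) @ betas[t] gives the stage-t policy

# One point on the preference simplex; the paper handles all w at once.
stage_coefs = fitted_q(np.array([0.7, 0.3]))
```

Sweeping `w` over a grid of the preference simplex would recover a family of policies one weight at a time; the paper's contribution is to compute the value function exactly as a function of the scalarization, avoiding such a grid.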

Similar Articles

1
Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play.
Front Neurorobot. 2018 Oct 9;12:65. doi: 10.3389/fnbot.2018.00065. eCollection 2018.
2
Linear Fitted-Q Iteration with Multiple Reward Functions.
J Mach Learn Res. 2012 Nov;13(Nov):3253-3295.
3
Optimization of anemia treatment in hemodialysis patients via reinforcement learning.
Artif Intell Med. 2014 Sep;62(1):47-60. doi: 10.1016/j.artmed.2014.07.004. Epub 2014 Jul 19.
4
On the complexity of computing Markov perfect equilibrium in general-sum stochastic games.
Natl Sci Rev. 2022 Nov 22;10(1):nwac256. doi: 10.1093/nsr/nwac256. eCollection 2023 Jan.
5
Reinforcement Learning-Aided Channel Estimator in Time-Varying MIMO Systems.
Sensors (Basel). 2023 Jun 18;23(12):5689. doi: 10.3390/s23125689.
6
Artificial intelligence framework for simulating clinical decision-making: a Markov decision process approach.
Artif Intell Med. 2013 Jan;57(1):9-19. doi: 10.1016/j.artmed.2012.12.003. Epub 2012 Dec 31.

Cited By

1
Fusing Individualized Treatment Rules Using Secondary Outcomes.
Proc Mach Learn Res. 2024 May;238:712-720.
2
A Bayesian multivariate hierarchical model for developing a treatment benefit index using mixed types of outcomes.
BMC Med Res Methodol. 2024 Sep 27;24(1):218. doi: 10.1186/s12874-024-02333-z.
3
Optimal Personalized Treatment Selection with Multivariate Outcome Measures in a Multiple Treatment Case.
Commun Stat Simul Comput. 2023;52(12):5773-5787. doi: 10.1080/03610918.2021.1999473. Epub 2021 Nov 15.
4
Quantiles based personalized treatment selection for multivariate outcomes and multiple treatments.
Stat Med. 2022 Jul 10;41(15):2695-2710. doi: 10.1002/sim.9377. Epub 2022 Mar 16.
5
Multi-Response Based Personalized Treatment Selection with Data from Crossover Designs for Multiple Treatments.
Commun Stat Simul Comput. 2022;51(2):554-569. doi: 10.1080/03610918.2019.1656739. Epub 2019 Sep 10.

References

1
Dynamic treatment regimes: technical challenges and applications.
Electron J Stat. 2014;8(1):1225-1272. doi: 10.1214/14-ejs920.
2
Set-valued dynamic treatment regimes for competing outcomes.
Biometrics. 2014 Mar;70(1):53-61. doi: 10.1111/biom.12132. Epub 2014 Jan 8.
3
Linear Fitted-Q Iteration with Multiple Reward Functions.
J Mach Learn Res. 2012 Nov;13(Nov):3253-3295.
4
What is the optimal threshold at which to recommend breast biopsy?
PLoS One. 2012;7(11):e48820. doi: 10.1371/journal.pone.0048820. Epub 2012 Nov 7.
5
Informing sequential clinical decision-making through reinforcement learning: an empirical study.
Mach Learn. 2011 Jul 1;84(1-2):109-136. doi: 10.1007/s10994-010-5229-0.
6
Markov decision processes: a tool for sequential decision making under uncertainty.
Med Decis Making. 2010 Jul-Aug;30(4):474-83. doi: 10.1177/0272989X09353194. Epub 2009 Dec 31.
7
Regret-regression for optimal dynamic treatment regimes.
Biometrics. 2010 Dec;66(4):1192-201. doi: 10.1111/j.1541-0420.2009.01368.x.
8
Demystifying optimal dynamic treatment regimes.
Biometrics. 2007 Jun;63(2):447-55. doi: 10.1111/j.1541-0420.2006.00686.x.
9
Moderators and mediators of a web-based computer-tailored smoking cessation program among nicotine patch users.
Nicotine Tob Res. 2006 Dec;8 Suppl 1:S95-101. doi: 10.1080/14622200601039444.
