

Multi-Objective Markov Decision Processes for Data-Driven Decision Support

Authors

Lizotte Daniel J, Laber Eric B

Affiliations

Department of Computer Science, Department of Epidemiology & Biostatistics, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 3K7, Canada.

Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA.

Publication

J Mach Learn Res. 2016;17. Epub 2016 Dec 1.

PMID: 28018133
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5179144/
Abstract

We present new methodology based on Multi-Objective Markov Decision Processes for developing sequential decision support systems from data. Our approach uses sequential decision-making data to provide support that is useful to many different decision-makers, each with different, potentially time-varying preferences. To accomplish this, we develop an extension of fitted-Q iteration for multiple objectives that computes policies for all scalarization functions, i.e. preference functions, simultaneously from continuous-state, finite-horizon data. We identify and address several conceptual and computational challenges along the way, and we introduce a new solution concept that is appropriate when different actions have similar expected outcomes. Finally, we demonstrate an application of our method using data from the Clinical Antipsychotic Trials of Intervention Effectiveness and show that our approach offers decision-makers increased choice via a larger class of optimal policies.
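The key idea in the abstract — collapsing vector-valued action values into a single score with a scalarization (preference) function, so that different decision-makers with different preferences can get different recommended actions from the same data — can be illustrated with a minimal sketch. This is not the paper's fitted-Q extension (which computes policies for all scalarizations simultaneously); the Q-values and weight vectors below are purely hypothetical, and linear scalarization is assumed for simplicity.

```python
import numpy as np

def scalarize(q_vectors, weights):
    """Linear scalarization: collapse vector-valued action values
    (one row per action, one column per objective) into scalars
    using a decision-maker's preference weight vector."""
    return q_vectors @ weights

def greedy_action(q_vectors, weights):
    """Return the index of the action maximizing the scalarized value."""
    return int(np.argmax(scalarize(q_vectors, weights)))

# Hypothetical estimates: two actions, two objectives
# (e.g. symptom relief vs. side-effect burden).
q = np.array([[0.9, 0.2],   # action 0: strong on objective 1, weak on 2
              [0.5, 0.8]])  # action 1: favors objective 2

# Different decision-makers express different preferences,
# and the same Q estimates yield different recommended actions.
print(greedy_action(q, np.array([0.8, 0.2])))  # prefers objective 1 -> 0
print(greedy_action(q, np.array([0.2, 0.8])))  # prefers objective 2 -> 1
```

Because preferences may also vary over time, a system like the one the paper describes must support any weight vector at decision time rather than fixing one during training, which is what motivates computing policies for all scalarization functions at once.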


Similar Articles

1. Multi-Objective Markov Decision Processes for Data-Driven Decision Support.
   J Mach Learn Res. 2016;17. Epub 2016 Dec 1.
2. Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play.
   Front Neurorobot. 2018 Oct 9;12:65. doi: 10.3389/fnbot.2018.00065. eCollection 2018.
3. Linear Fitted-Q Iteration with Multiple Reward Functions.
   J Mach Learn Res. 2012 Nov;13(Nov):3253-3295.
4. Optimization of anemia treatment in hemodialysis patients via reinforcement learning.
   Artif Intell Med. 2014 Sep;62(1):47-60. doi: 10.1016/j.artmed.2014.07.004. Epub 2014 Jul 19.
5. On the complexity of computing Markov perfect equilibrium in general-sum stochastic games.
   Natl Sci Rev. 2022 Nov 22;10(1):nwac256. doi: 10.1093/nsr/nwac256. eCollection 2023 Jan.
6. Reinforcement Learning-Aided Channel Estimator in Time-Varying MIMO Systems.
   Sensors (Basel). 2023 Jun 18;23(12):5689. doi: 10.3390/s23125689.
7. From data to optimal decision making: a data-driven, probabilistic machine learning approach to decision support for patients with sepsis.
   JMIR Med Inform. 2015 Feb 24;3(1):e11. doi: 10.2196/medinform.3445.
8. Artificial intelligence framework for simulating clinical decision-making: a Markov decision process approach.
   Artif Intell Med. 2013 Jan;57(1):9-19. doi: 10.1016/j.artmed.2012.12.003. Epub 2012 Dec 31.
9. Multi-robot hierarchical safe reinforcement learning autonomous decision-making strategy based on uniformly ultimate boundedness constraints.
   Sci Rep. 2025 Feb 18;15(1):5990. doi: 10.1038/s41598-025-89285-6.
10. The Effectiveness of Integrated Care Pathways for Adults and Children in Health Care Settings: A Systematic Review.
   JBI Libr Syst Rev. 2009;7(3):80-129. doi: 10.11124/01938924-200907030-00001.

Cited By

1. Fusing Individualized Treatment Rules Using Secondary Outcomes.
   Proc Mach Learn Res. 2024 May;238:712-720.
2. A Bayesian multivariate hierarchical model for developing a treatment benefit index using mixed types of outcomes.
   BMC Med Res Methodol. 2024 Sep 27;24(1):218. doi: 10.1186/s12874-024-02333-z.
3. Optimal Personalized Treatment Selection with Multivariate Outcome Measures in a Multiple Treatment Case.
   Commun Stat Simul Comput. 2023;52(12):5773-5787. doi: 10.1080/03610918.2021.1999473. Epub 2021 Nov 15.
4. Multiobjective tree-based reinforcement learning for estimating tolerant dynamic treatment regimes.
   Biometrics. 2024 Jan 29;80(1). doi: 10.1093/biomtc/ujad017.
5. Improving Individualized Treatment Decisions: A Bayesian Multivariate Hierarchical Model for Developing a Treatment Benefit Index using Mixed Types of Outcomes.
   medRxiv. 2024 Jan 7:2023.11.17.23298711. doi: 10.1101/2023.11.17.23298711.
6. Quantiles based personalized treatment selection for multivariate outcomes and multiple treatments.
   Stat Med. 2022 Jul 10;41(15):2695-2710. doi: 10.1002/sim.9377. Epub 2022 Mar 16.
7. Knowledge graph-based recommendation framework identifies drivers of resistance in EGFR mutant non-small cell lung cancer.
   Nat Commun. 2022 Mar 29;13(1):1667. doi: 10.1038/s41467-022-29292-7.
8. Multi-Response Based Personalized Treatment Selection with Data from Crossover Designs for Multiple Treatments.
   Commun Stat Simul Comput. 2022;51(2):554-569. doi: 10.1080/03610918.2019.1656739. Epub 2019 Sep 10.
9. Estimation and Optimization of Composite Outcomes.
   J Mach Learn Res. 2021 Jan;22.
10. Step-adjusted tree-based reinforcement learning for evaluating nested dynamic treatment regimes using test-and-treat observational data.
   Stat Med. 2021 Nov 30;40(27):6164-6177. doi: 10.1002/sim.9177. Epub 2021 Sep 7.

References

1. Dynamic treatment regimes: technical challenges and applications.
   Electron J Stat. 2014;8(1):1225-1272. doi: 10.1214/14-ejs920.
2. Set-valued dynamic treatment regimes for competing outcomes.
   Biometrics. 2014 Mar;70(1):53-61. doi: 10.1111/biom.12132. Epub 2014 Jan 8.
3. Linear Fitted-Q Iteration with Multiple Reward Functions.
   J Mach Learn Res. 2012 Nov;13(Nov):3253-3295.
4. What is the optimal threshold at which to recommend breast biopsy?
   PLoS One. 2012;7(11):e48820. doi: 10.1371/journal.pone.0048820. Epub 2012 Nov 7.
5. Informing sequential clinical decision-making through reinforcement learning: an empirical study.
   Mach Learn. 2011 Jul 1;84(1-2):109-136. doi: 10.1007/s10994-010-5229-0.
6. Markov decision processes: a tool for sequential decision making under uncertainty.
   Med Decis Making. 2010 Jul-Aug;30(4):474-83. doi: 10.1177/0272989X09353194. Epub 2009 Dec 31.
7. Regret-regression for optimal dynamic treatment regimes.
   Biometrics. 2010 Dec;66(4):1192-201. doi: 10.1111/j.1541-0420.2009.01368.x.
8. Demystifying optimal dynamic treatment regimes.
   Biometrics. 2007 Jun;63(2):447-55. doi: 10.1111/j.1541-0420.2006.00686.x.
9. Moderators and mediators of a web-based computer-tailored smoking cessation program among nicotine patch users.
   Nicotine Tob Res. 2006 Dec;8 Suppl 1:S95-101. doi: 10.1080/14622200601039444.
10. Assessing clinical and functional outcomes in the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) schizophrenia trial.
   Schizophr Bull. 2003;29(1):33-43. doi: 10.1093/oxfordjournals.schbul.a006989.