Linear Fitted-Q Iteration with Multiple Reward Functions.

Authors

Lizotte Daniel J, Bowling Michael, Murphy Susan A

Affiliation

David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada

Publication

J Mach Learn Res. 2012 Nov;13(Nov):3253-3295.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3670261/

Abstract

We present a general and detailed development of an algorithm for finite-horizon fitted-Q iteration with an arbitrary number of reward signals and linear value function approximation using an arbitrary number of state features. This includes a detailed treatment of the 3-reward function case using triangulation primitives from computational geometry and a method for identifying globally dominated actions. We also present an example of how our methods can be used to construct a real-world decision aid by considering symptom reduction, weight gain, and quality of life in sequential treatments for schizophrenia. Finally, we discuss future directions in which to take this work that will further enable our methods to make a positive impact on the field of evidence-based clinical decision support.
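To make the setting in the abstract concrete, the following is a minimal sketch of backward, finite-horizon fitted-Q iteration with linear value function approximation for one fixed weighting of the reward signals. It is not the authors' algorithm: the abstract describes a method that handles multiple reward signals together, whereas this sketch assumes the rewards are combined by a single convex weight vector chosen in advance. The function name `fitted_q_linear`, the data layout (one row per trajectory, aligned across stages), and the per-action least-squares fit are all illustrative assumptions.

```python
import numpy as np


def fitted_q_linear(features, actions, rewards, weights, n_actions):
    """Backward fitted-Q iteration for one fixed reward weighting (illustrative).

    features: list of T arrays, each (n, p); row i is trajectory i's state features
    actions:  list of T integer arrays, each (n,); action taken at each stage
    rewards:  list of T arrays, each (n, d); one column per reward signal
    weights:  (d,) nonnegative weights summing to 1 (assumed scalarization)
    Returns a list of T arrays, each (n_actions, p), of per-action coefficients.
    """
    T = len(features)
    betas = [None] * T
    # No value is collected beyond the final stage.
    future = np.zeros(features[0].shape[0])
    for t in reversed(range(T)):
        X = features[t]
        # Regression target: scalarized reward plus estimated value of the next stage.
        y = rewards[t] @ weights + future
        beta = np.zeros((n_actions, X.shape[1]))
        for a in range(n_actions):
            mask = actions[t] == a
            # Ordinary least squares on the trajectories that took action a.
            beta[a] = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
        betas[t] = beta
        # Stage-t value of each trajectory's state: max over actions of fitted Q.
        future = (X @ beta.T).max(axis=1)
    return betas
```

Running this once per weight vector on a grid would be a brute-force way to explore trade-offs among rewards; each action needs enough observations at every stage for the least-squares fit to be well determined.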
