Lizotte Daniel J, Laber Eric B
Department of Computer Science, Department of Epidemiology & Biostatistics, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 3K7, Canada.
Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA.
J Mach Learn Res. 2016;17. Epub 2016 Dec 1.
We present new methodology based on Multi-Objective Markov Decision Processes for developing sequential decision support systems from data. Our approach uses sequential decision-making data to provide support that is useful to many different decision-makers, each with different, potentially time-varying preferences. To accomplish this, we develop an extension of fitted-Q iteration for multiple objectives that computes policies for all scalarization functions, i.e., preference functions, simultaneously from continuous-state, finite-horizon data. We identify and address several conceptual and computational challenges along the way, and we introduce a new solution concept that is appropriate when different actions have similar expected outcomes. Finally, we demonstrate an application of our method using data from the Clinical Antipsychotic Trials of Intervention Effectiveness and show that our approach offers decision-makers increased choice by providing a larger class of optimal policies.
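To make the idea of a scalarization (preference) function concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a single final decision stage, two objectives, a linear Q-function per action, and a linear scalarization with weight `delta`. The function names, the synthetic data, and the objective labels are all hypothetical. It evaluates one preference at a time, whereas the method described above computes policies for all scalarizations simultaneously.

```python
import numpy as np

def fit_q_per_objective(states, actions, rewards, n_actions):
    """Least-squares fit of Q(s, a) = [1, s] @ beta, one fit per action.

    rewards has one column per objective, so each fit yields one
    coefficient vector per objective.
    Returns an array of shape (n_actions, n_features, n_objectives).
    """
    X = np.column_stack([np.ones(len(states)), states])
    betas = np.zeros((n_actions, X.shape[1], rewards.shape[1]))
    for a in range(n_actions):
        mask = actions == a
        betas[a], *_ = np.linalg.lstsq(X[mask], rewards[mask], rcond=None)
    return betas

def greedy_action(betas, state, delta):
    """Action maximizing delta * Q_0 + (1 - delta) * Q_1 at the given state."""
    x = np.array([1.0, state])
    q = np.einsum("f,afo->ao", x, betas)   # shape (n_actions, n_objectives)
    w = np.array([delta, 1.0 - delta])     # linear scalarization weights
    return int(np.argmax(q @ w))

# Synthetic, noise-free data (an assumption for illustration only):
# action 0 favors objective 0, action 1 favors objective 1.
rng = np.random.default_rng(0)
states = rng.uniform(0.0, 1.0, size=200)
actions = rng.integers(0, 2, size=200)
rewards = np.column_stack([
    np.where(actions == 0, 1.0, 0.2),  # hypothetical objective 0
    np.where(actions == 0, 0.2, 1.0),  # hypothetical objective 1
])

betas = fit_q_per_objective(states, actions, rewards, n_actions=2)

# Different preferences lead to different recommended actions.
for delta in (0.1, 0.9):
    print(delta, greedy_action(betas, state=0.3, delta=delta))
```

In this toy setting the recommended action switches as `delta` crosses 0.5, which illustrates why a decision-support system may want to report the optimal action as a function of preference instead of fixing one preference in advance.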