School of Computing and Information Systems, Faculty of Engineering and Information Technology, The University of Melbourne, Australia.
The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Melbourne, Australia.
J Environ Manage. 2024 May;359:120968. doi: 10.1016/j.jenvman.2024.120968. Epub 2024 May 3.
Planning under complex uncertainty often calls for plans that can adapt to changing future conditions. To inform plan development, exploration methods have been used to evaluate the performance of candidate policies under uncertainty. However, these methods rarely enable adaptation by themselves, so additional effort is required to develop the final adaptive plans, which compromises overall decision-making efficiency. This paper introduces Reinforcement Learning (RL), which employs closed-loop control, as a new exploration method that enables automated adaptive policy-making for planning under uncertainty. To investigate its performance, we compare RL with a widely used exploration method, the Multi-Objective Evolutionary Algorithm (MOEA), on two hypothetical problems via computational experiments. Our results indicate that the two methods are complementary. RL makes better use of its exploration history, consistently achieving higher efficiency and better policy robustness in the presence of parameter uncertainty. MOEA quantifies objective uncertainty in a more intuitive way and therefore provides better robustness to objective uncertainty. These findings will help researchers choose appropriate methods for different applications.
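To illustrate the closed-loop idea behind RL-based exploration, the following is a minimal sketch on a hypothetical one-dimensional control problem that is not from the paper: the `simulate` function, the proportional feedback gain, and the disturbance model are all illustrative assumptions. It only contrasts a fixed open-loop plan with a policy that observes the state before acting, which is the adaptation mechanism the abstract attributes to closed-loop control.

```python
import random

# Toy illustration (not from the paper): a one-dimensional system whose
# state drifts under an uncertain disturbance. A closed-loop policy
# observes the current state before acting; an open-loop plan fixes its
# actions in advance and cannot react to how the uncertainty unfolds.

def simulate(policy, horizon=20, drift_std=0.5, seed=0):
    """Return total squared deviation from the target state 0."""
    rng = random.Random(seed)
    state, cost = 0.0, 0.0
    for t in range(horizon):
        action = policy(t, state)                      # closed-loop: may use state
        state = state + action + rng.gauss(0.0, drift_std)
        cost += state ** 2
    return cost

# Open-loop plan: a fixed action sequence, here simply "do nothing".
open_loop = lambda t, state: 0.0

# Closed-loop policy: proportional feedback that corrects observed drift.
closed_loop = lambda t, state: -0.8 * state

for name, pol in [("open-loop", open_loop), ("closed-loop", closed_loop)]:
    costs = [simulate(pol, seed=s) for s in range(100)]
    print(f"{name:11s} mean cost over 100 uncertain futures: "
          f"{sum(costs) / len(costs):.1f}")
```

Averaged over many sampled futures, the feedback policy keeps the state near its target while the fixed plan accumulates drift, which is the kind of built-in adaptivity the paper argues RL provides without a separate plan-adaptation step.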