Johnson Fred A, Fackler Paul L, Boomer G Scott, Zimmerman Guthrie S, Williams Byron K, Nichols James D, Dorazio Robert M
Wetland and Aquatic Research Center, U. S. Geological Survey, Gainesville, Florida, United States of America.
Department of Agriculture and Resource Economics, North Carolina State University, Raleigh, North Carolina, United States of America.
PLoS One. 2016 Jun 17;11(6):e0157373. doi: 10.1371/journal.pone.0157373. eCollection 2016.
Markov decision processes (MDPs), which involve a temporal sequence of actions conditioned on the state of the managed system, are increasingly being applied in natural resource management. This study focuses on the modification of a traditional MDP to account for those cases in which an action must be chosen after a significant time lag in observing system state, but just prior to a new observation. In order to calculate an optimal decision policy under these conditions, possible actions must be conditioned on the previous observed system state and action taken. We show how to solve these problems when the state transition structure is known and when it is uncertain. Our focus is on the latter case, and we show how actions must be conditioned not only on the previous system state and action, but on the probabilities associated with alternative models of system dynamics. To demonstrate this framework, we calculated and simulated optimal, adaptive policies for MDPs with lagged states for the problem of deciding annual harvest regulations for mallards (Anas platyrhynchos) in the United States. In this particular example, changes in harvest policy induced by the use of lagged information about system state were sufficient to maintain expected management performance (e.g. population size, harvest) even in the face of an uncertain system state at the time of a decision.
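The core idea of the known-transition-structure case can be sketched with a toy dynamic program. The numbers, state/action counts, and variable names below are hypothetical illustrations, not the authors' mallard-harvest model: because the action must be chosen before the new state is observed, the policy is computed over the augmented state (previous observed state, previous action), and the immediate reward is an expectation over the still-unobserved current state.

```python
import numpy as np

# Hypothetical toy problem: 2 system states, 2 actions; all numbers illustrative.
n_s, n_a = 2, 2

# P[a_prev, s_prev, s_next]: known transition structure (the simpler case
# discussed in the abstract; model uncertainty would add a belief weight
# over several such P arrays).
P = np.array([[[0.8, 0.2],
               [0.4, 0.6]],
              [[0.6, 0.4],
               [0.1, 0.9]]])

# R[s, a]: reward earned when the (unobserved at decision time) state is s
# and action a is taken.
R = np.array([[1.0, 0.5],
              [0.2, 2.0]])
gamma = 0.95  # discount factor

# Value iteration over the augmented state (s_prev, a_prev). After acting,
# the new state s is eventually observed, so the next augmented state is (s, a).
V = np.zeros((n_s, n_a))
for _ in range(5000):
    Q = np.zeros((n_s, n_a, n_a))  # Q[s_prev, a_prev, a]
    for s_prev in range(n_s):
        for a_prev in range(n_a):
            for a in range(n_a):
                # Expectation over the unobserved current state s.
                Q[s_prev, a_prev, a] = np.sum(
                    P[a_prev, s_prev] * (R[:, a] + gamma * V[:, a]))
    V_new = Q.max(axis=2)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# Optimal action conditioned on the lagged information (s_prev, a_prev),
# exactly the conditioning structure the abstract describes.
policy = Q.argmax(axis=2)
```

Under model uncertainty (the adaptive case the paper emphasizes), the same recursion would additionally carry the vector of model probabilities as part of the conditioning information, so the policy becomes a function of (previous state, previous action, model weights).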