Prokhorov D V, Wunsch D C
Dept. of Electr. Eng., Texas Tech. Univ., Lubbock, TX.
IEEE Trans Neural Netw. 1997;8(5):997-1007. doi: 10.1109/72.623201.
We discuss a variety of adaptive critic designs (ACDs) for neurocontrol. These are suitable for learning in noisy, nonlinear, and nonstationary environments. They have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Our discussion of these origins leads to an explanation of three design families: heuristic dynamic programming (HDP), dual heuristic programming (DHP), and globalized dual heuristic programming (GDHP). The main emphasis is on DHP and GDHP as advanced ACDs. We suggest two new modifications of the original GDHP design that are currently the only working implementations of GDHP. These modifications promise to be useful for many engineering applications in the areas of optimization and optimal control. Based on one of these modifications, we present a unified approach to all ACDs. This leads to a generalized training procedure for ACDs.
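As a rough illustration of the simplest family named above, the following is a minimal sketch of heuristic-dynamic-programming-style critic training, not the authors' implementation. In this sketch the critic estimates the discounted cost-to-go J and is adjusted to shrink the error J(t) - [U(t) + gamma * J(t+1)]. Everything concrete here is an assumption made for illustration: the scalar plant x(t+1) = a * x(t) under a fixed policy, the utility U = x^2, the one-weight critic J(x) = w * x^2 standing in for a neural network, and the names `train_hdp_critic`, `a`, `gamma`, and `lr`.

```python
import random


def train_hdp_critic(a=0.5, gamma=0.9, lr=0.5, steps=2000, seed=0):
    """Fit a one-weight critic J(x) = w * x**2 with an HDP-style update.

    Hypothetical toy setup (not from the paper): the plant is
    x_next = a * x under a fixed policy and the utility is U = x**2.
    """
    rng = random.Random(seed)
    w = 0.0  # critic weight, playing the role of the critic network
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)  # sample a state
        x_next = a * x              # one step of the assumed plant
        utility = x * x             # assumed one-step cost U(t)
        # Bootstrapped target U(t) + gamma * J(t+1), held fixed during the update.
        target = utility + gamma * w * x_next * x_next
        # Critic error J(t) - target.
        error = w * x * x - target
        # Gradient step on 0.5 * error**2 with respect to w (dJ/dw = x**2).
        w -= lr * error * x * x
    return w


w = train_hdp_critic()
# Closed-form cost-to-go coefficient for this toy plant: 1 / (1 - gamma * a**2).
exact = 1.0 / (1.0 - 0.9 * 0.5 ** 2)
print(w, exact)
```

In this toy case the learned weight can be checked against the closed-form coefficient 1 / (1 - gamma * a^2). DHP would instead train the critic to estimate the derivative of J with respect to the state, and GDHP would combine both targets; neither is shown here.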