Lin C T, Jou C P
Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu, Taiwan, R.O.C.
IEEE Trans Neural Netw. 1999;10(4):846-59. doi: 10.1109/72.774236.
This paper proposes a TD (temporal difference) and GA (genetic algorithm) based reinforcement (TDGAR) neural learning scheme for controlling chaotic dynamical systems using the technique of small perturbations. The TDGAR learning scheme is a new hybrid GA that integrates the TD prediction method with the GA to perform the reinforcement learning task. Structurally, the TDGAR learning system is composed of two integrated feedforward networks. One network acts as a critic that helps the other network, the action network, learn; the action network determines the outputs (actions) of the TDGAR learning system. Using the TD prediction method, the critic network predicts the external reinforcement signal and provides a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to this internal reinforcement signal. This usually accelerates GA learning, because in reinforcement learning problems an external reinforcement signal may only become available long after a sequence of actions has occurred. With a simple external reinforcement signal, the TDGAR learning system can learn to produce a series of small perturbations that convert the chaotic oscillations of a system into desired regular, periodic ones. The proposed method is thus an adaptive search for the optimal control. Computer simulations of controlling two chaotic systems, the Hénon map and the logistic map, have been conducted to illustrate the performance of the proposed method.
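To make the division of labor between the critic and the action network concrete, here is a minimal Python sketch of a TD-critic/GA loop on the logistic map. It is not the authors' implementation: the parameter values (`R = 3.9`, perturbation bound `DELTA_MAX = 0.1`, success band `TOL`), the sparse external reinforcement, the tabular critic standing in for the critic network, the one-weight proportional action network, and the truncation-selection/blend-crossover GA are all illustrative assumptions, and every name below is hypothetical. The goal is to stabilize the unstable period-1 fixed point x* = 1 - 1/R using only parameter perturbations |δR| ≤ DELTA_MAX. Each candidate is scored by the discounted sum of internal reinforcement r + γV(x') - V(x) over a short rollout; this sum telescopes to the discounted external return plus γ^T V(x_T) - V(x_0), so the critic's prediction credits rollouts that end in promising states before any external reward has arrived.

```python
import random

# Illustrative assumptions, not values from the paper.
R = 3.9                   # chaotic parameter of the logistic map
DELTA_MAX = 0.1           # bound on the small perturbation of R
X_STAR = 1.0 - 1.0 / R    # unstable period-1 fixed point to stabilize
TOL = 0.005               # band around X_STAR that earns external reinforcement
GAMMA = 0.95              # discount factor used by the TD critic
BINS = 50                 # tabular critic: a stand-in for the critic network
HORIZON = 40              # short rollout; the critic bootstraps beyond it


def step(x, u):
    """Logistic map x' = (R + u) x (1 - x) with a perturbed parameter."""
    return (R + u) * x * (1.0 - x)


def action(gain, x):
    """Hypothetical one-weight action network: a perturbation proportional to
    the deviation from X_STAR, switched off if it would exceed DELTA_MAX."""
    u = gain * (x - X_STAR)
    return u if abs(u) <= DELTA_MAX else 0.0


def external_reward(x):
    """Sparse external reinforcement: 1 only inside the target band."""
    return 1.0 if abs(x - X_STAR) < TOL else 0.0


def bin_of(x):
    """Map a state in [0, 1] to a critic table index."""
    return min(int(x * BINS), BINS - 1)


def rollout(gain, x0, steps):
    """Run the controlled map; return visited states and external rewards."""
    xs, rs = [x0], []
    for _ in range(steps):
        xs.append(step(xs[-1], action(gain, xs[-1])))
        rs.append(external_reward(xs[-1]))
    return xs, rs


def fitness(gain, starts, V):
    """Discounted sum of internal reinforcement r + GAMMA*V(x') - V(x),
    averaged over the shared start states."""
    total = 0.0
    for x0 in starts:
        xs, rs = rollout(gain, x0, HORIZON)
        for t, r in enumerate(rs):
            internal = r + GAMMA * V[bin_of(xs[t + 1])] - V[bin_of(xs[t])]
            total += GAMMA ** t * internal
    return total / len(starts)


def td_update(V, xs, rs, alpha=0.1):
    """TD(0) update of the critic table along one trajectory."""
    for t, r in enumerate(rs):
        s, s_next = bin_of(xs[t]), bin_of(xs[t + 1])
        V[s] += alpha * (r + GAMMA * V[s_next] - V[s])


def tdgar_sketch(generations=30, pop_size=16, seed=0):
    """GA over action-network gains, scored by the critic's internal signal."""
    rng = random.Random(seed)
    V = [0.0] * BINS
    pop = [rng.uniform(-20.0, 20.0) for _ in range(pop_size)]
    ranked = pop
    for _ in range(generations):
        starts = [rng.uniform(0.05, 0.95) for _ in range(5)]
        ranked = sorted(pop, key=lambda g: fitness(g, starts, V), reverse=True)
        for x0 in starts:  # the critic learns from the current best candidate
            td_update(V, *rollout(ranked[0], x0, HORIZON))
        parents = ranked[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            # blend crossover followed by Gaussian mutation
            children.append(w * a + (1.0 - w) * b + rng.gauss(0.0, 1.0))
        pop = parents + children
    return ranked[0], V
```

For example, `best_gain, V = tdgar_sketch()` followed by `rollout(best_gain, 0.3, 500)` runs the learned controller. As a sanity check, linearizing the controlled map about x* gives e' ≈ (2 - R + k(R - 1)/R²)e for a gain k, so with R = 3.9 a gain between roughly 4.7 and 15.2 is locally stabilizing. The perturbation is applied only once the chaotic orbit wanders close enough to x* that |k(x - x*)| ≤ DELTA_MAX, which keeps every control action small.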