Meng Xiaoding, Li Hecheng, Chen Anshan
School of Computer Science and Technology, Qinghai Normal University, Xining 810008, China.
School of Mathematics and Statistics, Qinghai Normal University, Xining 810008, China.
Math Biosci Eng. 2023 Mar 3;20(5):8498-8530. doi: 10.3934/mbe.2023373.
The trade-off between exploitation and exploration is a dilemma inherent to particle swarm optimization (PSO) algorithms. Therefore, a growing body of PSO variants is devoted to balancing the two. Among them, self-adaptive multi-strategy selection plays a crucial role in improving the performance of PSO algorithms but has yet to be well exploited. In this research, with the aid of reinforcement learning to guide the generation of offspring, a novel self-adaptive multi-strategy selection mechanism is designed, and a multi-strategy self-learning PSO algorithm based on reinforcement learning (MPSORL) is proposed. First, the fitness values of the particles are regarded as a set of states that is divided non-uniformly into several state subsets. Second, the ε-greedy strategy is employed to select the optimal strategy for each particle. The personal best particle and the global best particle are then updated after executing the strategy, and the next state is determined. Thus, the Q-table, which serves as the self-learning scheme, is reshaped by the reward value, the action and the state in a non-stationary environment. Finally, the proposed algorithm is compared with other state-of-the-art algorithms on two well-known benchmark suites and a real-world problem. Extensive experiments indicate that MPSORL achieves better accuracy and faster convergence in most cases, as confirmed by non-parametric statistical tests. The multi-strategy selection mechanism presented in this work is effective.
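The abstract walks through a Q-learning loop for per-particle strategy selection: fitness values are mapped to discrete states, an ε-greedy policy chooses one of several velocity-update strategies, and the Q-table is updated from the observed reward. The following minimal Python sketch illustrates that loop under stated assumptions; the number of states and strategies, the rank boundaries, the learning parameters and the reward definition are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 4        # assumed number of non-uniform fitness-based states
N_STRATEGIES = 3    # assumed number of candidate velocity-update strategies (actions)
EPSILON = 0.1       # exploration probability of the epsilon-greedy policy (assumed)
ALPHA = 0.4         # Q-learning rate (assumed)
GAMMA = 0.8         # discount factor (assumed)

# Assumed non-uniform split of fitness ranks into states: the best 10% of the
# swarm forms state 0, the next 20% state 1, the next 30% state 2, the rest state 3.
RANK_BOUNDS = np.array([0.10, 0.30, 0.60, 1.00])


def fitness_to_state(fitness, swarm_fitness):
    """Map a particle's fitness to a discrete state via its rank in the swarm
    (minimization assumed: a smaller fitness value is better)."""
    rank = np.mean(np.asarray(swarm_fitness) <= fitness)   # fraction at least as good
    return int(np.searchsorted(RANK_BOUNDS, rank))


def choose_strategy(q_table, state):
    """Epsilon-greedy selection over the strategy set for one particle."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_STRATEGIES))              # explore: random strategy
    return int(np.argmax(q_table[state]))                   # exploit: best known strategy


def update_q(q_table, state, action, reward, next_state):
    """One-step Q-learning update of the (state, strategy) value."""
    best_next = np.max(q_table[next_state])
    q_table[state, action] += ALPHA * (reward + GAMMA * best_next
                                       - q_table[state, action])


# Per-iteration flow for one particle i (the velocity-update strategies and the
# fitness evaluation are problem-specific and omitted here), with
# Q = np.zeros((N_STATES, N_STRATEGIES)):
#   s = fitness_to_state(f[i], f)             # current state from fitness rank
#   a = choose_strategy(Q, s)                 # pick a velocity-update strategy
#   ... apply strategy a, move the particle, evaluate new fitness f_new[i] ...
#   r = 1.0 if f_new[i] < pbest[i] else 0.0   # assumed reward: personal-best improvement
#   s_next = fitness_to_state(f_new[i], f_new)
#   update_q(Q, s, a, r, s_next)
```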