Li Yu, Zhang Yan
School of Aeronautics and Astronautics, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518000, China.
School of Electronics and Communication Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518000, China.
Biomimetics (Basel). 2024 Oct 1;9(10):596. doi: 10.3390/biomimetics9100596.
The nutcracker optimizer algorithm (NOA) is a recently proposed metaheuristic method that solves optimization problems by simulating how nutcrackers search for and store food in nature. However, the traditional NOA struggles to balance global exploration and local exploitation effectively, so it is prone to becoming trapped in local optima on complex problems. To address these shortcomings, this study proposes a reinforcement learning-based bi-population nutcracker optimizer algorithm, called RLNOA, which introduces a bi-population mechanism to better balance global and local search. At the beginning of each iteration, the population is divided into an exploration sub-population and an exploitation sub-population according to each individual's fitness value. The exploration sub-population consists of the individuals with poorer fitness values and is updated by an improved foraging strategy based on random opposition-based learning, which enhances diversity. Meanwhile, Q-learning serves as an adaptive selector of exploitation strategies, allowing the behavior of the exploitation sub-population to be adjusted to the problem at hand. The performance of the RLNOA is evaluated on the CEC-2014, CEC-2017, and CEC-2020 benchmark function sets and compared against nine state-of-the-art metaheuristic algorithms. Experimental results demonstrate the superior performance of the proposed algorithm.
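The abstract names three mechanisms: a fitness-based split into exploration and exploitation sub-populations, a random opposition-based learning (ROBL) update for the exploration half, and a Q-learning selector over exploitation strategies. The sketch below is a minimal, hypothetical Python rendering of that loop structure, not the authors' implementation: it assumes the common ROBL formulation x_opp = lb + ub - r * x with r ~ U(0, 1), a single-state Q-table, a reward equal to the fraction of exploitation individuals that improved, and two placeholder exploitation moves; all function names are illustrative.

```python
# Hypothetical sketch of the RLNOA loop described in the abstract (minimization).
# All names and numeric choices are illustrative assumptions, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)


def sphere(x):
    """Toy objective used only for demonstration."""
    return float(np.sum(x ** 2))


def robl_move(x, lb, ub):
    # Random opposition-based learning; a common formulation (assumed here) is
    # x_opp = lb + ub - r * x with r ~ U(0, 1) drawn per dimension.
    return lb + ub - rng.random(x.shape) * x


def eps_greedy(q_row, eps=0.1):
    # Epsilon-greedy action selection over candidate exploitation strategies.
    return int(rng.integers(len(q_row))) if rng.random() < eps else int(np.argmax(q_row))


def rlnoa_sketch(f, dim=10, pop=30, iters=200, lb=-100.0, ub=100.0):
    X = rng.uniform(lb, ub, (pop, dim))
    fit = np.array([f(x) for x in X])
    Q = np.zeros((1, 2))          # single-state Q-table over 2 strategies (assumption)
    alpha, gamma = 0.1, 0.9       # standard Q-learning step size and discount
    for _ in range(iters):
        order = np.argsort(fit)   # re-split the population by fitness each iteration
        exploit_idx, explore_idx = order[:pop // 2], order[pop // 2:]
        # Exploration sub-population: the worse half, perturbed by ROBL.
        for i in explore_idx:
            cand = np.clip(robl_move(X[i], lb, ub), lb, ub)
            fc = f(cand)
            if fc < fit[i]:
                X[i], fit[i] = cand, fc
        # Exploitation sub-population: one strategy per iteration, chosen by Q-learning.
        a = eps_greedy(Q[0])
        best = X[np.argmin(fit)].copy()
        improved = 0
        for i in exploit_idx:
            step = 0.1 if a == 0 else 0.01    # two placeholder exploitation moves
            cand = np.clip(X[i] + step * (best - X[i]) * rng.standard_normal(dim), lb, ub)
            fc = f(cand)
            if fc < fit[i]:
                X[i], fit[i] = cand, fc
                improved += 1
        # Reward = improvement rate of the exploitation half (an assumption).
        r = improved / len(exploit_idx)
        Q[0, a] += alpha * (r + gamma * Q[0].max() - Q[0, a])
    i_best = int(np.argmin(fit))
    return X[i_best], float(fit[i_best])


if __name__ == "__main__":
    x_best, f_best = rlnoa_sketch(sphere)
    print(f"best fitness found: {f_best:.3e}")
```

Under these assumptions, the Q-table gradually favors whichever exploitation move yields the higher improvement rate, which is the adaptive-selector role the abstract attributes to Q-learning; the paper's actual state definition, reward, and strategy set may differ.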