Bilban Mehmet, İnan Onur
Computer Technologies, Necmettin Erbakan University, 42360 Seydişehir, Turkey.
Computer Engineering, Faculty of Technology, Selçuk University, 42140 Konya, Turkey.
Sensors (Basel). 2025 Mar 20;25(6):1941. doi: 10.3390/s25061941.
Autonomous vehicles must make quick and accurate decisions to operate efficiently in complex, dynamic urban traffic environments, which requires a reliable and stable learning mechanism. The proximal policy optimization (PPO) algorithm stands out among reinforcement learning (RL) methods for its consistent learning process, ensuring stable decisions under varying conditions while avoiding abrupt deviations during execution. However, PPO often becomes trapped in a limited search space during policy updates, restricting its adaptability to environmental changes and its exploration of alternative strategies. To overcome this limitation, we integrated the chaotic, wide-ranging exploration behavior of Lévy flight into the PPO algorithm. This integration enables the algorithm to explore larger solution spaces and reduces the risk of becoming trapped in local minima. In this study, we collected real-time data such as speed, acceleration, traffic-sign positions, vehicle locations, traffic-light statuses, and distances to surrounding objects from the CARLA simulator, streamed and processed via Apache Kafka. These data were analyzed by both the standard PPO algorithm and our novel Lévy flight-enhanced PPO (LFPPO) algorithm. While PPO offers consistency, its limited exploration hampers adaptability. LFPPO overcomes this by combining Lévy flight's chaotic exploration with Apache Kafka's real-time data streaming, an advancement absent in state-of-the-art methods. Tested in CARLA, the LFPPO algorithm achieved a 99% success rate compared with the PPO algorithm's 81%, demonstrating superior stability and higher rewards. These innovations enhance safety and RL exploration, with the LFPPO algorithm reducing collisions to 1% versus the PPO algorithm's 19%, advancing autonomous driving beyond existing techniques.
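The abstract does not specify where the Lévy flight term enters the PPO update, so the following is only a minimal sketch, assuming the common approach of drawing heavy-tailed step lengths with Mantegna's algorithm and adding them as an occasional perturbation to the policy's proposed actions; the function names (levy_step, perturb_action) and the beta and exploration_scale values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(beta=1.5, size=1):
    """Draw Levy-distributed step lengths via Mantegna's algorithm."""
    sigma_u = (gamma(1 + beta) * sin(pi * beta / 2) /
               (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = np.random.normal(0.0, sigma_u, size)
    v = np.random.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

def perturb_action(mean_action, exploration_scale=0.05):
    """Add a heavy-tailed Levy perturbation to the policy's mean action.

    Unlike Gaussian noise, the Levy steps occasionally take large jumps,
    which is the mechanism that helps the agent escape narrow local optima.
    """
    step = levy_step(beta=1.5, size=mean_action.shape[0])
    return mean_action + exploration_scale * step

# Hypothetical usage: a two-dimensional action (e.g. throttle, steering)
mean_action = np.array([0.4, 0.0])
noisy_action = perturb_action(mean_action)
```

The heavy tail of the Lévy distribution is what distinguishes this from standard Gaussian exploration: most steps stay small, preserving PPO's stable, clipped updates, while rare long jumps widen the searched region of the solution space.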
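The data pipeline is only described as CARLA telemetry "processed via Apache Kafka", so the consumer below is a sketch of how such a stream could feed the RL state vector, using the kafka-python client; the topic name, broker address, and JSON field names are hypothetical.

```python
import json
from kafka import KafkaConsumer  # kafka-python

# Topic, broker, and message fields are assumptions for illustration;
# the abstract only states that CARLA telemetry is streamed through Kafka.
consumer = KafkaConsumer(
    "carla-telemetry",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="latest",
)

for message in consumer:
    obs = message.value
    # Example fields mirroring the abstract: speed, acceleration,
    # traffic-light state, and distance to the nearest object.
    state = [
        obs.get("speed", 0.0),
        obs.get("acceleration", 0.0),
        obs.get("distance_to_nearest_object", 0.0),
    ]
    # `state` would then be passed to the PPO/LFPPO policy network.
```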