An Integrated Reinforcement Learning and Centralized Programming Approach for Online Taxi Dispatching.

Author information

Liang Enming, Wen Kexin, Lam William H K, Sumalee Agachai, Zhong Renxin

Publication information

IEEE Trans Neural Netw Learn Syst. 2022 Sep;33(9):4742-4756. doi: 10.1109/TNNLS.2021.3060187. Epub 2022 Aug 31.

Abstract

Balancing the supply and demand for ride-sourcing companies is a challenging issue, especially with real-time requests and stochastic traffic conditions of large-scale congested road networks. To tackle this challenge, this article proposes a robust and scalable approach that integrates reinforcement learning (RL) and a centralized programming (CP) structure to promote real-time taxi operations. Both real-time order matching decisions and vehicle relocation decisions at the microscopic network scale are integrated within a Markov decision process framework. The RL component learns the decomposed state-value function, which represents the taxi drivers' experience, the offline historical demand pattern, and the traffic network congestion. The CP component plans nonmyopic decisions for drivers collectively under the prescribed system constraints to explicitly realize cooperation. Furthermore, to circumvent sparse reward and sample imbalance problems over the microscopic road network, this article proposes a temporal-difference learning algorithm with prioritized gradient descent and adaptive exploration techniques. A simulator is built and trained with the Manhattan road network and New York City yellow taxi data to simulate the real-time vehicle dispatching environment. Both centralized and decentralized taxi dispatching policies are examined with the simulator. This case study shows that the proposed approach can further improve taxi drivers' profits while reducing customers' waiting times compared to several existing vehicle dispatching algorithms.
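
To make the interplay between the two components concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a tabular state-value function over hypothetical zone labels, a plain TD(0) update in place of the prioritized gradient descent and adaptive exploration described in the abstract, and an exhaustive search in place of the centralized program. The names `td0_update`, `match_score`, and `centralized_match`, the constants `GAMMA` and `ALPHA`, and the same-zone pickup rule are all illustrative assumptions.

```python
from itertools import permutations

GAMMA = 0.9  # discount factor (assumed value, not from the paper)
ALPHA = 0.5  # learning rate (assumed value, not from the paper)


def td0_update(values, state, reward, next_state):
    """One tabular TD(0) step: V(s) += ALPHA * (r + GAMMA * V(s') - V(s))."""
    v = values.get(state, 0.0)
    target = reward + GAMMA * values.get(next_state, 0.0)
    values[state] = v + ALPHA * (target - v)
    return values[state]


def match_score(values, driver_zone, order):
    """Score one driver-order pair using the learned values.

    An order is a (pickup_zone, dropoff_zone, fare) tuple; None means the
    driver stays idle, which scores 0.0 in this toy model.
    """
    if order is None:
        return 0.0
    pickup, dropoff, fare = order
    if pickup != driver_zone:  # toy feasibility rule: same-zone pickups only
        return float("-inf")
    # Fare plus discounted value of where the driver ends up,
    # minus the value of the zone the driver leaves.
    return fare + GAMMA * values.get(dropoff, 0.0) - values.get(driver_zone, 0.0)


def centralized_match(values, driver_zones, orders):
    """Search all one-to-one assignments and keep the best total score.

    Brute force is only viable for tiny instances; it stands in for the
    centralized program here.
    """
    options = list(orders) + [None] * len(driver_zones)  # idle slots
    best_total, best_assignment = float("-inf"), None
    for perm in permutations(range(len(options)), len(driver_zones)):
        total = sum(
            match_score(values, zone, options[j])
            for zone, j in zip(driver_zones, perm)
        )
        if total > best_total:
            best_total = total
            best_assignment = [options[j] for j in perm]
    return best_total, best_assignment


# Usage: zone "B" has been observed to lead to a 10.0 reward, so its value rises.
values = {}
td0_update(values, "B", 10.0, "A")
orders = [("A", "B", 4.0), ("A", "A", 6.0)]
total, assignment = centralized_match(values, ["A"], orders)
```

In this toy instance a myopic dispatcher would pick the 6.0 fare, while the value-aware score prefers the 4.0 fare because it drops the driver in the higher-valued zone "B". That is the sense in which a learned state-value function lets a centralized matching step make nonmyopic decisions.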
