Sony AI, New York, NY, USA.
Sony AI, Tokyo, Japan.
Nature. 2022 Feb;602(7896):223-228. doi: 10.1038/s41586-021-04357-7. Epub 2022 Feb 9.
Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Automobile racing represents an extreme example of these conditions; drivers must execute complex tactical manoeuvres to pass or block opponents while operating their vehicles at their traction limits. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the non-linear control challenges of real race cars while also encapsulating the complex multi-agent interactions. Here we describe how we trained agents for Gran Turismo that can compete with the world's best e-sports drivers. We combine state-of-the-art, model-free, deep reinforcement learning algorithms with mixed-scenario training to learn an integrated control policy that combines exceptional speed with impressive tactics. In addition, we construct a reward function that enables the agent to be competitive while adhering to racing's important, but under-specified, sportsmanship rules. We demonstrate the capabilities of our agent, Gran Turismo Sophy, by winning a head-to-head competition against four of the world's best Gran Turismo drivers. By describing how we trained championship-level racers, we demonstrate the possibilities and challenges of using these techniques to control complex dynamical systems in domains where agents must respect imprecisely defined human norms.