A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition.

Author information

Zhao Wei, Liu Zhiming

Affiliations

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China.

School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi, China.

Publication information

PeerJ Comput Sci. 2022 Sep 5;8:e1094. doi: 10.7717/peerj-cs.1094. eCollection 2022.

DOI:10.7717/peerj-cs.1094

PMID:36091983

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9455281/

Abstract

The traditional synthesis problem is usually solved by constructing a system that fulfills given specifications. The system interacts continuously with its environment, which acts as its adversary, so the problem can be regarded as solving a two-player game between the system and its environment. Stochastic games, in turn, are often used to model reactive processes. With the development of intelligent industries, these theories are used extensively in robot patrolling, intelligent logistics, and intelligent transportation. However, existing research shows that it remains challenging to find a practically feasible synthesis algorithm that generates an optimal system. It is therefore desirable to design an incentive mechanism that motivates the system to fulfill given specifications. This work studies a learning-based approach to strategy synthesis for reward asynchronous probabilistic games against linear temporal logic (LTL) specifications in a probabilistic environment. An asynchronous reward mechanism is proposed that motivates players to maximize their rewards according to their positions and the actions they choose. Based on this mechanism, techniques from learning theory can be applied to transform the synthesis problem into the problem of computing expected rewards. It is then proven that the reinforcement learning algorithm asymptotically yields optimal strategies that maximize the expected cumulative reward for satisfying an LTL specification. Finally, our techniques are implemented, and their effectiveness is illustrated by two case studies: robot patrolling and autonomous driving.
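As a rough illustration of the general idea of turning game synthesis into an expected-reward computation that reinforcement learning can solve, the sketch below runs Q-learning with a minimax backup on a tiny, hypothetical turn-based stochastic game. It assumes the LTL specification has already been compiled into an automaton and combined with the game, and it simplifies acceptance to reaching a single hypothetical state `acc` that pays reward 1. It is not the paper's asynchronous reward mechanism or its algorithm; every state name, probability, and function (`value_iteration`, `minimax_q_learning`, `greedy`) is invented for this example.

```python
import random

# Hypothetical toy turn-based stochastic game (not taken from the paper).
# "sys" states are controlled by the system (maximizer); "env" states by the environment (minimizer).
OWNER = {"s0": "sys", "e": "env"}
# TRANSITIONS[state][action] = list of (probability, next_state)
TRANSITIONS = {
    "s0": {"safe": [(1.0, "e")], "risky": [(0.3, "acc"), (0.7, "trap")]},
    "e": {"help": [(0.9, "acc"), (0.1, "s0")], "block": [(0.6, "acc"), (0.4, "trap")]},
}
# Absorbing states; "acc" stands in for an accepting state of the assumed product game.
TERMINAL = {"acc", "trap"}


def reward(next_state):
    # Assumed reward: 1 exactly when the play enters the accepting state, else 0.
    return 1.0 if next_state == "acc" else 0.0


def state_value(q, s):
    """Minimax backup: the system maximizes, the environment minimizes; terminals are worth 0."""
    if s in TERMINAL:
        return 0.0
    vals = q[s].values()
    return max(vals) if OWNER[s] == "sys" else min(vals)


def value_iteration(iters=200):
    """Model-based reference: expected reward (here, probability of reaching 'acc')."""
    q = {s: {a: 0.0 for a in acts} for s, acts in TRANSITIONS.items()}
    for _ in range(iters):
        for s, acts in TRANSITIONS.items():
            for a, outcomes in acts.items():
                q[s][a] = sum(p * (reward(t) + state_value(q, t)) for p, t in outcomes)
    return q


def sample(outcomes, rng):
    """Draw a successor state according to the transition probabilities."""
    r, cumulative = rng.random(), 0.0
    for p, t in outcomes:
        cumulative += p
        if r < cumulative:
            return t
    return outcomes[-1][1]  # guard against floating-point round-off


def minimax_q_learning(episodes=20000, seed=0, max_steps=100):
    """Model-free estimate: off-policy Q-learning with a minimax target and 1/n step sizes.

    Both players explore uniformly at random; the Q-values are read greedily afterwards.
    """
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in acts} for s, acts in TRANSITIONS.items()}
    visits = {s: {a: 0 for a in acts} for s, acts in TRANSITIONS.items()}
    for _ in range(episodes):
        s = "s0"
        for _ in range(max_steps):
            if s in TERMINAL:
                break
            a = rng.choice(list(TRANSITIONS[s]))
            t = sample(TRANSITIONS[s][a], rng)
            visits[s][a] += 1
            alpha = 1.0 / visits[s][a]
            target = reward(t) + state_value(q, t)
            q[s][a] += alpha * (target - q[s][a])
            s = t
    return q


def greedy(q, s):
    """Greedy strategy: argmax for the system, argmin for the environment."""
    pick = max if OWNER[s] == "sys" else min
    return pick(q[s], key=q[s].get)


if __name__ == "__main__":
    vi, ql = value_iteration(), minimax_q_learning()
    for s in TRANSITIONS:
        print(s, greedy(vi, s), round(state_value(vi, s), 3),
              greedy(ql, s), round(state_value(ql, s), 3))
```

Running the script prints, for each non-terminal state, the greedy action and value computed from the known model next to those learned from samples; both are expected to agree on the strategies. Uniform random exploration keeps the sketch short and is valid because Q-learning is off-policy. General LTL objectives usually require richer acceptance conditions (for example Büchi) than the reachability used here.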
