
Reinforcement Learning for Omega-Regular Specifications on Continuous-Time MDP

Amin Falah, Shibashis Guha, Ashutosh Trivedi
Mar 2023
Abstract
Continuous-time Markov decision processes (CTMDPs) are canonical models for expressing sequential decision-making in dense-time, stochastic environments. When the stochastic evolution of the environment is only available via sampling, model-free reinforcement learning (RL) is the algorithm of choice for computing optimal decision sequences. RL, however, requires the learning objective to be encoded as scalar reward signals. Since performing such translations manually is both tedious and error-prone, a number of techniques have been proposed to translate high-level objectives (expressed in logic or automata formalisms) into scalar rewards for discrete-time Markov decision processes (MDPs). Unfortunately, no automatic translation exists for CTMDPs.

We consider CTMDP environments with learning objectives expressed as omega-regular languages. Omega-regular languages generalize regular languages to infinite-horizon specifications and can express properties given in the popular linear-time logic LTL. To accommodate the dense-time nature of CTMDPs, we consider two different semantics of omega-regular objectives: 1) satisfaction semantics, where the goal of the learner is to maximize the probability of spending positive time in the good states, and 2) expectation semantics, where the goal of the learner is to optimize the long-run expected average time spent in the "good states" of the automaton. We present an approach enabling correct translation to scalar reward signals that can be readily used by off-the-shelf RL algorithms for CTMDPs. We demonstrate the effectiveness of the proposed algorithms by evaluating them on popular CTMDP benchmarks with omega-regular objectives.
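As a rough illustration of the two semantics described in the abstract (the notation here is introduced for illustration only and is not taken from the paper): write σ for a strategy of the learner, B for the set of "good" states of the product of the CTMDP with the specification automaton, and T_B(t) for the total time spent in B up to time t. The two objectives can then be sketched as

\[
\text{satisfaction:}\quad \sup_{\sigma}\ \Pr^{\sigma}\!\Big[\lim_{t\to\infty} T_B(t) > 0\Big],
\qquad
\text{expectation:}\quad \sup_{\sigma}\ \liminf_{t\to\infty} \frac{1}{t}\,\mathbb{E}^{\sigma}\!\big[T_B(t)\big].
\]

The satisfaction criterion restates the abstract's "probability of spending positive time in the good states", while the expectation criterion captures the long-run expected average fraction of time spent in B; the paper's precise definitions may differ.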