1: German Research Center for Artificial Intelligence
2: University of Bremen
Generating physical movement behaviours from their symbolic description is a long-standing challenge in artificial intelligence (AI) and robotics, requiring insights into numerical optimization methods as well as into formalizations from symbolic AI and reasoning. In this paper, a novel approach to finding a reward function from a symbolic description is proposed. The intended system behaviour is modelled as a hybrid automaton, which reduces the system state space to allow more efficient reinforcement learning. The approach is applied to bipedal walking by modelling the walking robot as a hybrid automaton over state space orthants, and is used with the compass walker to derive a reward that incentivizes following the hybrid automaton cycle. As a result, training times of reinforcement learning controllers are reduced while the final walking speed is increased. The approach can serve as a blueprint for generating reward functions from symbolic AI and reasoning.
The source code for the work described in the paper can be found here.
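To illustrate the general idea of a cycle-following reward, the sketch below shows a minimal, hypothetical Python implementation. It is not the paper's exact reward function: the state variables, the orthant sequence `ORTHANT_CYCLE`, and the reward values are illustrative assumptions; in the paper, the orthant cycle is derived from the hybrid automaton model of the compass walker.

```python
import numpy as np

# Hypothetical orthant cycle for a compass-gait walker: each entry is the
# sign pattern of the state (theta_stance, theta_swing,
# theta_stance_dot, theta_swing_dot) visited along one nominal step.
# The actual cycle in the paper comes from the hybrid automaton;
# this sequence is illustrative only.
ORTHANT_CYCLE = [
    (+1, -1, -1, +1),
    (+1, +1, -1, +1),
    (-1, +1, -1, -1),
    (-1, -1, +1, -1),
]

def orthant_of(state):
    """Map a continuous state to the sign pattern of its components,
    i.e. the orthant of the state space it lies in."""
    return tuple(+1 if x >= 0 else -1 for x in state)

def cycle_reward(state, current_index):
    """Reward progress along the orthant cycle:
    +1 when the state enters the next orthant of the cycle,
     0 while it remains in the current orthant,
    -1 if it leaves the intended cycle (illustrative values)."""
    orthant = orthant_of(np.asarray(state, dtype=float))
    next_index = (current_index + 1) % len(ORTHANT_CYCLE)
    if orthant == ORTHANT_CYCLE[next_index]:
        return 1.0, next_index      # progressed along the automaton cycle
    if orthant == ORTHANT_CYCLE[current_index]:
        return 0.0, current_index   # still in the expected orthant
    return -1.0, current_index      # left the intended cycle
```

Such a reward term can be added to a standard reinforcement learning loop, with `current_index` carried along as part of the environment's bookkeeping between steps.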
@inproceedings{HLGKK23,
author = { Harnack, Daniel and
L\"{u}th, Christoph and
Gross, Lukas and
Kumar, Shivesh and
Kirchner, Frank},
title = { Deriving Rewards for Reinforcement Learning from Symbolic Behaviour Descriptions of Bipedal Walking},
booktitle = {62nd {IEEE} Conference on Decision and Control ({CDC})},
address = {Marina Bay Sands, Singapore},
pages = {2135 -- 2140},
year = {2023},
publisher = {{IEEE}}
}