simple_pendulum.simulation

Simulation

Submodules

simple_pendulum.simulation.gym_environment

Gym Environment

class simple_pendulum.simulation.gym_environment.SimplePendulumEnv(simulator, max_steps=5000, target=[3.141592653589793, 0.0], state_target_epsilon=[0.01, 0.01], reward_type='continuous', dt=0.001, integrator='runge_kutta', state_representation=2, validation_limit=-150, scale_action=True, random_init='False')

Bases: Env

An environment for reinforcement learning

check_final_condition()

Checks whether a terminating condition has been met. The only terminating condition for the pendulum is if the maximum number of steps has been reached.

Returns:
donebool

whether a terminating condition has been met

close()

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

get_observation(state)

Transform the state from the simulator an observation by wrapping the position to the observation space. If state_representation==3 also transforms the state to the trigonometric value form.

Parameters:
statearray-like

state as output by the simulator

Returns:
observationarray-like

observation in environment format

get_state_from_observation(obs)

Transform the observation to a pendulum state. Does nothing for state_representation==2. If state_representation==3 transforms trigonometric form back to regular form.

Parameters:
obsarray-like

observation as received from get_observation

Returns:
statearray-like

state in simulator form

is_goal(obs)

Checks whether an observation is in the goal region. The goal region is specified by the target and state_target_epsilon parameters in the class initialization.

Parameters:
obsarray-like

observation as received from get_observation

Returns:
goalbool

whether to observation is in the goal region

render(mode='human')

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.

  • rgb_array: Return an numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note

Make sure that your class’s metadata ‘render.modes’ key includes

the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.

Parameters:

mode (str) – the mode to render with

Example:

class MyEnv(Env):

metadata = {‘render.modes’: [‘human’, ‘rgb_array’]}

def render(self, mode=’human’):
if mode == ‘rgb_array’:

return np.array(…) # return RGB frame suitable for video

elif mode == ‘human’:

… # pop up a window and render

else:

super(MyEnv, self).render(mode=mode) # just raise an exception

reset(state=None, random_init='start_vicinity')

Reset the environment. The pendulum is initialized with a random state in the vicinity of the stable fixpoint (position and velocity are in the range[-0.31, 0.31])

Parameters:
statearray-like, default=None

the state to which the environment is reset if state==None it defaults to the random initialisation

random_initstring, default=None

A string determining the random state initialisation if None, defaults to self.random_init “False” : The pendulum is set to [0, 0], “start_vicinity” : The pendulum position and velocity

are set in the range [-0.31, -0.31],

“everywhere”The pendulum is set to a random state in the whole

possible state space

Returns:
observationarray-like

the state the pendulum has been initilized to

step(action)

Take a step in the environment.

Parameters:
actionfloat

the torque that is applied to the pendulum

Returns:
observationarray-like

the observation from the environment after the step

rewardfloat

the reward received on this step

donebool

whether the episode has terminated

infodictionary

may contain additional information (empty at the moment)

swingup_reward(observation, action)

Calculate the reward for the pendulum for swinging up to the instable fixpoint. The reward function is selected based on the reward type defined during the object inizialization.

Parameters:
statearray-like

the observation that has been received from the environment

Returns:
rewardfloat

the reward for swinging up

Raises:
NotImplementedError

when the requested reward_type is not implemented

validation_criterion(validation_rewards, final_obs=None, criterion=None)

Checks whether a list of rewards and optionally final observations fulfill the validation criterion. The validation criterion is fulfilled if the mean of the validation_rewards id greater than criterion. If final obs is also given, at least 90% of the observations have to be in the goal region.

Parameters:
validation_rewardsarray-like

A list of rewards (floats).

final_obsarray-like, default=None

A list of final observations. If None final observations are not considered.

criterion: float, default=None

The reward limit which has to be surpassed.

Returns:
passedbool

Whether the rewards pass the validation test

simple_pendulum.simulation.simulation

Simulator

class simple_pendulum.simulation.simulation.Simulator(plant)

Bases: object

euler_integrator(t, y, tau)

Euler integrator for the simulated plant

Parameters:
tfloat

time, unit: s

y: type as self.plant expects a state

state of the pendulum

tau: type as self.plant expects an actuation

torque input

Returns:
array-likethe Euler integrand
get_state()

Get current state of the plant

Returns:
self.tfloat,

time, unit: s

self.xtype as self.plant expects a state

plant state

record_data(time, x, tau)

Records data in the internal data recorder

Parameters:
timefloat

time to be recorded, unit: s

xtype as self.plant expects a state

state to be recorded, units: rad, rad/s

tautype as self.plant expects an actuation

torque to be recorded, unit: Nm

reset_data_recorder()

Reset the internal data recorder of the simulator

runge_integrator(t, y, dt, tau)

Runge-Kutta integrator for the simulated plant

Parameters:
tfloat

time, unit: s

y: type as self.plant expects a state

state of the pendulum

dt: float

time step, unit: s

tau: type as self.plant expects an actuation

torque input

Returns:
array-likethe Runge-Kutta integrand
set_state(time, x)

set the state of the pendulum plant

Parameters:
time: float

time, unit: s

x: type as self.plant expects a state,

state of the pendulum plant

simulate(t0, x0, tf, dt, controller=None, integrator='runge_kutta')

Simulates the plant over a period of time.

Parameters:
t0: float

start time, unit s

x0: type as self.plant expects a state

start state

tf: float

final time, unit: s

controller: A controller object of the type of the

AbstractController in simple_pendulum.controllers.abstract_controller.py If None, a free pendulum is simulated.

integrator: string

“euler” for euler integrator, “runge_kutta” for Runge-Kutta integrator

Returns:
self.t_valueslist

a list of time values

self.x_valueslist

a list of states

self.tau_valueslist

a list of torques

simulate_and_animate(t0, x0, tf, dt, controller=None, integrator='runge_kutta', phase_plot=False, save_video=False, video_name='video')

Simulation and animation of the plant motion. The animation is only implemented for 2d serial chains. input: Simulates the plant over a period of time.

Parameters:
t0: float

start time, unit s

x0: type as self.plant expects a state

start state

tf: float

final time, unit: s

controller: A controller object of the type of the

AbstractController in simple_pendulum.controllers.abstract_controller.py If None, a free pendulum is simulated.

integrator: string

“euler” for euler integrator, “runge_kutta” for Runge-Kutta integrator

phase_plot: bool

whether to show a plot of the phase space together with the animation

save_video: bool

whether to save the animation as mp4 video

video_name: string

if save_video, the name of the file where the video will be stored

Returns:
self.t_valueslist

a list of time values

self.x_valueslist

a list of states

self.tau_valueslist

a list of torques

step(tau, dt, integrator='runge_kutta')

Performs a single step of the plant.

Parameters:
tau: type as self.plant expects an actuation

torque input

dt: float

time step, unit: s

integrator: string

“euler” for euler integrator “runge_kutta” for Runge-Kutta integrator

simple_pendulum.simulation.simulation.get_arrow(radius, centX, centY, angle_, theta2_, color_='black')
simple_pendulum.simulation.simulation.set_arrow_properties(arc, head, tau, x, y)