simple_pendulum.simulation

`Simulation`

Submodules

simple_pendulum.simulation.gym_environment

Gym Environment

class simple_pendulum.simulation.gym_environment.SimplePendulumEnv(simulator, max_steps=5000, target=[3.141592653589793, 0.0], state_target_epsilon=[0.01, 0.01], reward_type='continuous', dt=0.001, integrator='runge_kutta', state_representation=2, validation_limit=-150, scale_action=True, random_init='False')

Bases: Env

An environment for reinforcement learning

check_final_condition()

Checks whether a terminating condition has been met. The only terminating condition for the pendulum is if the maximum number of steps has been reached.

Returns:

done: bool: whether a terminating condition has been met

close()

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

get_observation(state)

Transform the state from the simulator into an observation by wrapping the position to the observation space. If state_representation==3, the state is also transformed to the trigonometric value form.

Parameters:

state: array-like: state as output by the simulator

Returns:

observation: array-like: observation in environment format
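The transformation described above can be sketched as follows. This is an illustration, not the library's implementation: the function name `to_observation`, the wrapping interval [-pi, pi) and the ordering [cos(pos), sin(pos), vel] of the trigonometric form are all assumptions.

```python
import math


def to_observation(state, state_representation=2):
    """Hypothetical stand-in for get_observation (conventions are assumed)."""
    pos, vel = float(state[0]), float(state[1])
    # Wrap the angle into [-pi, pi); the actual interval is an assumption.
    wrapped = (pos + math.pi) % (2.0 * math.pi) - math.pi
    if state_representation == 2:
        return [wrapped, vel]
    if state_representation == 3:
        # Assumed ordering of the trigonometric form.
        return [math.cos(wrapped), math.sin(wrapped), vel]
    raise ValueError("state_representation must be 2 or 3")


obs2 = to_observation([1.5 * math.pi, 0.5])       # position wraps to -pi/2
obs3 = to_observation([1.5 * math.pi, 0.5], 3)    # three-element form
```

The trigonometric form avoids the jump at the wrapping boundary, at the cost of one extra observation dimension.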

get_state_from_observation(obs)

Transform the observation to a pendulum state. Does nothing for state_representation==2. If state_representation==3 transforms trigonometric form back to regular form.

Parameters:

obs: array-like: observation as received from get_observation

Returns:

state: array-like: state in simulator form
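A minimal sketch of the inverse mapping, assuming the three-element observation is ordered [cos(pos), sin(pos), vel]. The name `to_state` and that ordering are assumptions, not taken from the library.

```python
import math


def to_state(obs, state_representation=2):
    """Hypothetical stand-in for get_state_from_observation."""
    if state_representation == 2:
        # Observation and state already share the [position, velocity] form.
        return [float(obs[0]), float(obs[1])]
    if state_representation == 3:
        cos_pos, sin_pos, vel = obs
        # atan2 recovers the angle in [-pi, pi] from its cosine and sine.
        return [math.atan2(sin_pos, cos_pos), float(vel)]
    raise ValueError("state_representation must be 2 or 3")


state = to_state([math.cos(2.0), math.sin(2.0), -1.0], state_representation=3)
unchanged = to_state([0.3, 0.1])
```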

is_goal(obs)

Checks whether an observation is in the goal region. The goal region is specified by the target and state_target_epsilon parameters in the class initialization.

Parameters:

obs: array-like: observation as received from get_observation

Returns:

goal: bool: whether the observation is in the goal region
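One plausible reading of the goal test, sketched on a [position, velocity] pair: each component must lie within its epsilon of the target. The name `in_goal_region` and the circular comparison of angles are assumptions. The default values are the ones from the class signature.

```python
import math


def in_goal_region(state, target=(math.pi, 0.0), epsilon=(0.01, 0.01)):
    """Hypothetical goal test on a [position, velocity] pair."""
    pos_err = state[0] - target[0]
    # Compare angles on the circle so -pi and pi count as the same position
    # (an assumption about the intended behaviour).
    pos_err = (pos_err + math.pi) % (2.0 * math.pi) - math.pi
    vel_err = state[1] - target[1]
    return abs(pos_err) < epsilon[0] and abs(vel_err) < epsilon[1]


near_top = in_goal_region([math.pi - 0.005, 0.002])
wrapped_top = in_goal_region([-math.pi + 0.005, 0.0])
hanging = in_goal_region([0.0, 0.0])
```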

render(mode='human')

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note

Make sure that your class’s metadata ‘render.modes’ key includes the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.

Parameters:

mode: str: the mode to render with

Example:

    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception

reset(state=None, random_init='start_vicinity')

Reset the environment. By default, the pendulum is initialized with a random state in the vicinity of the stable fixpoint (position and velocity in the range [-0.31, 0.31]).

Parameters:

state: array-like, default=None: the state to which the environment is reset. If state==None, the random initialisation is used.
random_init: string, default=None: a string determining the random state initialisation. If None, defaults to self.random_init.

“False”: the pendulum is set to [0, 0]
“start_vicinity”: the pendulum position and velocity are set in the range [-0.31, 0.31]
“everywhere”: the pendulum is set to a random state in the whole possible state space

Returns:

observation: array-like: the state the pendulum has been initialized to
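The three initialisation modes can be sketched as below. Uniform sampling and the bounds `max_pos` and `max_vel` used for “everywhere” are assumptions, as is the function name `sample_initial_state`. Only the mode names and the 0.31 range come from the description above.

```python
import math
import random


def sample_initial_state(random_init="start_vicinity", rng=random,
                         max_pos=math.pi, max_vel=8.0):
    """Hypothetical sampler; max_pos and max_vel are illustrative bounds."""
    if random_init == "False":
        # Note: the mode is the string "False", not the boolean False.
        return [0.0, 0.0]
    if random_init == "start_vicinity":
        return [rng.uniform(-0.31, 0.31), rng.uniform(-0.31, 0.31)]
    if random_init == "everywhere":
        return [rng.uniform(-max_pos, max_pos), rng.uniform(-max_vel, max_vel)]
    raise ValueError(f"unknown random_init: {random_init!r}")


rng = random.Random(0)  # seeded generator for reproducible sampling
fixed = sample_initial_state("False")
nearby = sample_initial_state("start_vicinity", rng)
anywhere = sample_initial_state("everywhere", rng)
```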

step(action)

Take a step in the environment.

Parameters:

action: float: the torque that is applied to the pendulum

Returns:

observation: array-like: the observation from the environment after the step
reward: float: the reward received on this step
done: bool: whether the episode has terminated
info: dictionary: may contain additional information (empty at the moment)
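How the four return values are typically consumed can be sketched with a stand-in environment. `StubEnv` is hypothetical: it has no pendulum dynamics and only mimics the return shape of step and the max_steps terminating condition.

```python
class StubEnv:
    """Hypothetical stand-in with the reset/step shape described above."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.step_count = 0

    def reset(self):
        self.step_count = 0
        return [0.0, 0.0]

    def step(self, action):
        self.step_count += 1
        observation = [0.0, 0.0]   # placeholder, no dynamics here
        reward = -abs(action)      # placeholder reward
        done = self.step_count >= self.max_steps
        return observation, reward, done, {}


env = StubEnv(max_steps=5)
obs = env.reset()
done = False
episode_return = 0.0
steps = 0
while not done:
    action = 0.5                   # constant torque, for illustration only
    obs, reward, done, info = env.step(action)
    episode_return += reward
    steps += 1
```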

swingup_reward(observation, action)

Calculate the reward for swinging the pendulum up to the unstable fixpoint. The reward function is selected based on the reward type defined during the object initialization.

Parameters:

observation: array-like: the observation that has been received from the environment
action: float: the torque that is applied to the pendulum

Returns:

reward: float: the reward for swinging up

Raises:

NotImplementedError: when the requested reward_type is not implemented

validation_criterion(validation_rewards, final_obs=None, criterion=None)

Checks whether a list of rewards and, optionally, final observations fulfill the validation criterion. The validation criterion is fulfilled if the mean of the validation_rewards is greater than criterion. If final_obs is also given, at least 90% of the observations have to be in the goal region.

Parameters:

validation_rewards: array-like: a list of rewards (floats)
final_obs: array-like, default=None: a list of final observations. If None, final observations are not considered.
criterion: float, default=None: the reward limit which has to be surpassed

Returns:

passed: bool: whether the rewards pass the validation test
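The criterion can be restated as a short function. This is a sketch under assumptions: the fallback of `criterion=None` to the validation_limit value from the class signature (-150) is a guess, and the goal test is passed in as a hypothetical `is_goal` callable instead of being a method.

```python
def passes_validation(validation_rewards, final_obs=None, criterion=None,
                      is_goal=None, validation_limit=-150.0):
    """Hypothetical restatement of the validation criterion."""
    if criterion is None:
        # Assumed fallback to the limit configured on the environment.
        criterion = validation_limit
    mean_reward = sum(validation_rewards) / len(validation_rewards)
    passed = mean_reward > criterion
    if final_obs is not None:
        # At least 90% of the final observations must be in the goal region.
        in_goal = sum(1 for obs in final_obs if is_goal(obs))
        passed = passed and in_goal / len(final_obs) >= 0.9
    return passed


def near_upright(obs):
    # Illustrative goal test, looser than the default epsilon.
    return abs(obs[0] - 3.14) < 0.05 and abs(obs[1]) < 0.05


rewards = [-100.0, -120.0, -140.0]
finals_ok = [[3.14, 0.0]] * 9 + [[0.0, 0.0]]
finals_bad = [[3.14, 0.0]] * 8 + [[0.0, 0.0]] * 2

rewards_only = passes_validation(rewards)
with_good_finals = passes_validation(rewards, finals_ok, is_goal=near_upright)
with_bad_finals = passes_validation(rewards, finals_bad, is_goal=near_upright)
```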

simple_pendulum.simulation.simulation

Simulator

class simple_pendulum.simulation.simulation.Simulator(plant)

Bases: object

euler_integrator(t, y, tau)

Euler integrator for the simulated plant

Parameters:

t: float: time, unit: s
y: type as self.plant expects a state: state of the pendulum
tau: type as self.plant expects an actuation: torque input

Returns:

array-like: the Euler integrand

get_state()

Get current state of the plant

Returns:

self.t: float: time, unit: s
self.x: type as self.plant expects a state: plant state

record_data(time, x, tau)

Records data in the internal data recorder

Parameters:

time: float: time to be recorded, unit: s
x: type as self.plant expects a state: state to be recorded, units: rad, rad/s
tau: type as self.plant expects an actuation: torque to be recorded, unit: Nm

reset_data_recorder()

Reset the internal data recorder of the simulator

runge_integrator(t, y, dt, tau)

Runge-Kutta integrator for the simulated plant

Parameters:

t: float: time, unit: s
y: type as self.plant expects a state: state of the pendulum
dt: float: time step, unit: s
tau: type as self.plant expects an actuation: torque input

Returns:

array-like: the Runge-Kutta integrand
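The two integrators can be compared in a short sketch. It assumes that an "integrand" is the slope by which the state is advanced, so that the new state is y + dt * integrand. The damped pendulum in `pendulum_rhs`, its parameter values and the helper `advance` are hypothetical stand-ins for the plant.

```python
import math


def pendulum_rhs(t, y, tau, mass=1.0, length=0.5, damping=0.1, gravity=9.81):
    """Hypothetical plant dynamics: returns [velocity, acceleration]."""
    pos, vel = y
    inertia = mass * length ** 2
    accel = (tau - damping * vel
             - mass * gravity * length * math.sin(pos)) / inertia
    return [vel, accel]


def euler_integrand(t, y, tau):
    # Forward Euler uses the slope at the start of the step.
    return pendulum_rhs(t, y, tau)


def runge_integrand(t, y, dt, tau):
    # Classical fourth-order Runge-Kutta: weighted average of four slopes.
    def shifted(state, slope, h):
        return [s + h * k for s, k in zip(state, slope)]

    k1 = pendulum_rhs(t, y, tau)
    k2 = pendulum_rhs(t + 0.5 * dt, shifted(y, k1, 0.5 * dt), tau)
    k3 = pendulum_rhs(t + 0.5 * dt, shifted(y, k2, 0.5 * dt), tau)
    k4 = pendulum_rhs(t + dt, shifted(y, k3, dt), tau)
    return [(a + 2.0 * b + 2.0 * c + d) / 6.0
            for a, b, c, d in zip(k1, k2, k3, k4)]


def advance(y, dt, slope):
    # Both schemes update the state as y + dt * slope.
    return [s + dt * k for s, k in zip(y, slope)]


y0 = [0.1, 0.0]
y_euler = advance(y0, 0.001, euler_integrand(0.0, y0, 0.0))
y_rk4 = advance(y0, 0.001, runge_integrand(0.0, y0, 0.001, 0.0))
```

Starting at rest, the Euler step leaves the position unchanged because it only sees the initial zero velocity, while the Runge-Kutta step already accounts for the velocity gained within the step.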

set_state(time, x)

Set the state of the pendulum plant

Parameters:

time: float: time, unit: s
x: type as self.plant expects a state: state of the pendulum plant

simulate(t0, x0, tf, dt, controller=None, integrator='runge_kutta')

Simulates the plant over a period of time.

Parameters:

t0: float: start time, unit: s
x0: type as self.plant expects a state: start state
tf: float: final time, unit: s
dt: float: time step, unit: s
controller: AbstractController, default=None: a controller object of the type AbstractController defined in simple_pendulum.controllers.abstract_controller.py. If None, a free pendulum is simulated.
integrator: string: “euler” for Euler integrator, “runge_kutta” for Runge-Kutta integrator

Returns:

self.t_values: list: a list of time values
self.x_values: list: a list of states
self.tau_values: list: a list of torques
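The overall shape of such a simulation loop can be sketched as follows. This is not the Simulator's code. The toy undamped pendulum, the explicit Euler update and the use of a plain callable `controller(t, x)` in place of an AbstractController object are all assumptions, and the integrator argument is left out for brevity.

```python
import math


def simulate(t0, x0, tf, dt, controller=None):
    """Hypothetical simulation loop with a toy plant and explicit Euler steps."""
    x = list(x0)
    t_values, x_values, tau_values = [], [], []
    n_steps = round((tf - t0) / dt)
    for i in range(n_steps):
        t = t0 + i * dt
        # With no controller, the pendulum swings freely (zero torque).
        tau = 0.0 if controller is None else controller(t, x)
        t_values.append(t)
        x_values.append(list(x))
        tau_values.append(tau)
        # Toy undamped pendulum: unit mass, unit length, g = 9.81 (assumed).
        accel = tau - 9.81 * math.sin(x[0])
        x = [x[0] + dt * x[1], x[1] + dt * accel]
    return t_values, x_values, tau_values


# Free pendulum released from 0.5 rad.
T, X, TAU = simulate(0.0, [0.5, 0.0], 1.0, 0.01)

# Same start with a simple velocity-damping torque as the controller.
T_c, X_c, TAU_c = simulate(0.0, [0.5, 0.0], 1.0, 0.01,
                           controller=lambda t, x: -0.5 * x[1])
```

The three returned lists have equal length, with one entry per time step.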

simulate_and_animate(t0, x0, tf, dt, controller=None, integrator='runge_kutta', phase_plot=False, save_video=False, video_name='video')

Simulates the plant over a period of time and animates its motion. The animation is only implemented for 2D serial chains.

Parameters:

t0: float: start time, unit: s
x0: type as self.plant expects a state: start state
tf: float: final time, unit: s
dt: float: time step, unit: s
controller: AbstractController, default=None: a controller object of the type AbstractController defined in simple_pendulum.controllers.abstract_controller.py. If None, a free pendulum is simulated.
integrator: string: “euler” for Euler integrator, “runge_kutta” for Runge-Kutta integrator
phase_plot: bool: whether to show a plot of the phase space together with the animation
save_video: bool: whether to save the animation as an mp4 video
video_name: string: if save_video, the name of the file where the video will be stored

Returns:

self.t_values: list: a list of time values
self.x_values: list: a list of states
self.tau_values: list: a list of torques

step(tau, dt, integrator='runge_kutta')

Performs a single step of the plant.

Parameters:

tau: type as self.plant expects an actuation: torque input
dt: float: time step, unit: s
integrator: string: “euler” for Euler integrator, “runge_kutta” for Runge-Kutta integrator

simple_pendulum.simulation.simulation.get_arrow(radius, centX, centY, angle_, theta2_, color_='black')

simple_pendulum.simulation.simulation.set_arrow_properties(arc, head, tau, x, y)
