simple_pendulum.simulation
Simulation
Submodules
simple_pendulum.simulation.gym_environment
Gym Environment
- class simple_pendulum.simulation.gym_environment.SimplePendulumEnv(simulator, max_steps=5000, target=[3.141592653589793, 0.0], state_target_epsilon=[0.01, 0.01], reward_type='continuous', dt=0.001, integrator='runge_kutta', state_representation=2, validation_limit=-150, scale_action=True, random_init='False')
Bases:
Env
An environment for reinforcement learning
- check_final_condition()
Checks whether a terminating condition has been met. The only terminating condition for the pendulum is if the maximum number of steps has been reached.
- Returns:
- donebool
whether a terminating condition has been met
- close()
Override close in your subclass to perform any necessary cleanup.
Environments will automatically close() themselves when garbage collected or when the program exits.
- get_observation(state)
Transform the state from the simulator an observation by wrapping the position to the observation space. If state_representation==3 also transforms the state to the trigonometric value form.
- Parameters:
- statearray-like
state as output by the simulator
- Returns:
- observationarray-like
observation in environment format
- get_state_from_observation(obs)
Transform the observation to a pendulum state. Does nothing for state_representation==2. If state_representation==3 transforms trigonometric form back to regular form.
- Parameters:
- obsarray-like
observation as received from get_observation
- Returns:
- statearray-like
state in simulator form
- is_goal(obs)
Checks whether an observation is in the goal region. The goal region is specified by the target and state_target_epsilon parameters in the class initialization.
- Parameters:
- obsarray-like
observation as received from get_observation
- Returns:
- goalbool
whether to observation is in the goal region
- render(mode='human')
Renders the environment.
The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:
human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return an numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
Note
- Make sure that your class’s metadata ‘render.modes’ key includes
the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.
- Parameters:
mode (str) – the mode to render with
Example:
- class MyEnv(Env):
metadata = {‘render.modes’: [‘human’, ‘rgb_array’]}
- def render(self, mode=’human’):
- if mode == ‘rgb_array’:
return np.array(…) # return RGB frame suitable for video
- elif mode == ‘human’:
… # pop up a window and render
- else:
super(MyEnv, self).render(mode=mode) # just raise an exception
- reset(state=None, random_init='start_vicinity')
Reset the environment. The pendulum is initialized with a random state in the vicinity of the stable fixpoint (position and velocity are in the range[-0.31, 0.31])
- Parameters:
- statearray-like, default=None
the state to which the environment is reset if state==None it defaults to the random initialisation
- random_initstring, default=None
A string determining the random state initialisation if None, defaults to self.random_init “False” : The pendulum is set to [0, 0], “start_vicinity” : The pendulum position and velocity
are set in the range [-0.31, -0.31],
- “everywhere”The pendulum is set to a random state in the whole
possible state space
- Returns:
- observationarray-like
the state the pendulum has been initilized to
- step(action)
Take a step in the environment.
- Parameters:
- actionfloat
the torque that is applied to the pendulum
- Returns:
- observationarray-like
the observation from the environment after the step
- rewardfloat
the reward received on this step
- donebool
whether the episode has terminated
- infodictionary
may contain additional information (empty at the moment)
- swingup_reward(observation, action)
Calculate the reward for the pendulum for swinging up to the instable fixpoint. The reward function is selected based on the reward type defined during the object inizialization.
- Parameters:
- statearray-like
the observation that has been received from the environment
- Returns:
- rewardfloat
the reward for swinging up
- Raises:
- NotImplementedError
when the requested reward_type is not implemented
- validation_criterion(validation_rewards, final_obs=None, criterion=None)
Checks whether a list of rewards and optionally final observations fulfill the validation criterion. The validation criterion is fulfilled if the mean of the validation_rewards id greater than criterion. If final obs is also given, at least 90% of the observations have to be in the goal region.
- Parameters:
- validation_rewardsarray-like
A list of rewards (floats).
- final_obsarray-like, default=None
A list of final observations. If None final observations are not considered.
- criterion: float, default=None
The reward limit which has to be surpassed.
- Returns:
- passedbool
Whether the rewards pass the validation test
simple_pendulum.simulation.simulation
Simulator
- class simple_pendulum.simulation.simulation.Simulator(plant)
Bases:
object
- euler_integrator(t, y, tau)
Euler integrator for the simulated plant
- Parameters:
- tfloat
time, unit: s
- y: type as self.plant expects a state
state of the pendulum
- tau: type as self.plant expects an actuation
torque input
- Returns:
- array-likethe Euler integrand
- get_state()
Get current state of the plant
- Returns:
- self.tfloat,
time, unit: s
- self.xtype as self.plant expects a state
plant state
- record_data(time, x, tau)
Records data in the internal data recorder
- Parameters:
- timefloat
time to be recorded, unit: s
- xtype as self.plant expects a state
state to be recorded, units: rad, rad/s
- tautype as self.plant expects an actuation
torque to be recorded, unit: Nm
- reset_data_recorder()
Reset the internal data recorder of the simulator
- runge_integrator(t, y, dt, tau)
Runge-Kutta integrator for the simulated plant
- Parameters:
- tfloat
time, unit: s
- y: type as self.plant expects a state
state of the pendulum
- dt: float
time step, unit: s
- tau: type as self.plant expects an actuation
torque input
- Returns:
- array-likethe Runge-Kutta integrand
- set_state(time, x)
set the state of the pendulum plant
- Parameters:
- time: float
time, unit: s
- x: type as self.plant expects a state,
state of the pendulum plant
- simulate(t0, x0, tf, dt, controller=None, integrator='runge_kutta')
Simulates the plant over a period of time.
- Parameters:
- t0: float
start time, unit s
- x0: type as self.plant expects a state
start state
- tf: float
final time, unit: s
- controller: A controller object of the type of the
AbstractController in simple_pendulum.controllers.abstract_controller.py If None, a free pendulum is simulated.
- integrator: string
“euler” for euler integrator, “runge_kutta” for Runge-Kutta integrator
- Returns:
- self.t_valueslist
a list of time values
- self.x_valueslist
a list of states
- self.tau_valueslist
a list of torques
- simulate_and_animate(t0, x0, tf, dt, controller=None, integrator='runge_kutta', phase_plot=False, save_video=False, video_name='video')
Simulation and animation of the plant motion. The animation is only implemented for 2d serial chains. input: Simulates the plant over a period of time.
- Parameters:
- t0: float
start time, unit s
- x0: type as self.plant expects a state
start state
- tf: float
final time, unit: s
- controller: A controller object of the type of the
AbstractController in simple_pendulum.controllers.abstract_controller.py If None, a free pendulum is simulated.
- integrator: string
“euler” for euler integrator, “runge_kutta” for Runge-Kutta integrator
- phase_plot: bool
whether to show a plot of the phase space together with the animation
- save_video: bool
whether to save the animation as mp4 video
- video_name: string
if save_video, the name of the file where the video will be stored
- Returns:
- self.t_valueslist
a list of time values
- self.x_valueslist
a list of states
- self.tau_valueslist
a list of torques
- step(tau, dt, integrator='runge_kutta')
Performs a single step of the plant.
- Parameters:
- tau: type as self.plant expects an actuation
torque input
- dt: float
time step, unit: s
- integrator: string
“euler” for euler integrator “runge_kutta” for Runge-Kutta integrator
- simple_pendulum.simulation.simulation.get_arrow(radius, centX, centY, angle_, theta2_, color_='black')
- simple_pendulum.simulation.simulation.set_arrow_properties(arc, head, tau, x, y)