Simulator
The simulator class can simulate and animate the pendulum motion forward in time. The gym environment can be used for reinforcement learning.
API
The simulator
The simulator should be initialized with a plant (here the PendulumPlant) as follows:

# import paths assume the simple_pendulum package layout; adjust if your layout differs
from simple_pendulum.model.pendulum_plant import PendulumPlant
from simple_pendulum.simulation.simulation import Simulator

pendulum = PendulumPlant()
sim = Simulator(plant=pendulum)
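If the plant's physical parameters should differ from the defaults, they can be passed at construction. This is a minimal sketch; the argument names (mass, length, damping, gravity, torque_limit) and values below are assumptions for illustration, so check the PendulumPlant class definition for the exact signature:

# hypothetical parameter values; constructor argument names are assumptions
pendulum = PendulumPlant(mass=0.5,          # pendulum mass [kg]
                         length=0.5,        # pendulum length [m]
                         damping=0.1,       # viscous damping coefficient
                         gravity=9.81,      # gravitational acceleration [m/s^2]
                         torque_limit=2.0)  # actuator torque limit [Nm]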
To simulate the dynamics of the plant forward in time call:
T, X, TAU = sim.simulate(t0=0.0,
                         x0=[0.5, 0.0],
                         tf=10.0,
                         dt=0.01,
                         controller=None,
                         integrator="runge_kutta")
The inputs of the function are:

- t0: float, start time, unit: s
- x0: array-like, start state (dimension as the plant expects it)
- tf: float, final time, unit: s
- dt: float, time step, unit: s
- controller: controller that computes the motor torque(s) to be applied. The controller should have the structure of the AbstractController class in utilities/abstract_controller. If controller=None, no controller is used and the free system is simulated. A sketch of such a controller is shown after this list.
- integrator: string, "euler" for the Euler integrator, "runge_kutta" for the Runge-Kutta integrator
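A minimal controller sketch following that structure could look as shown below. The method name get_control_output, its signature, and the return convention are assumptions based on the AbstractController structure; consult utilities/abstract_controller for the exact interface.

import numpy as np

# sketch only: method name, signature, and the (des_pos, des_vel, torque)
# return convention are assumptions about the AbstractController interface
from utilities.abstract_controller import AbstractController

class PDController(AbstractController):
    def __init__(self, Kp=10.0, Kd=1.0, goal_pos=np.pi):
        self.Kp = Kp              # proportional gain
        self.Kd = Kd              # derivative gain
        self.goal_pos = goal_pos  # upright position [rad]

    def get_control_output(self, meas_pos, meas_vel, meas_tau=0.0, meas_time=0.0):
        # PD law driving the pendulum toward the upright position
        tau = self.Kp * (self.goal_pos - meas_pos) - self.Kd * meas_vel
        return None, None, tau

T, X, TAU = sim.simulate(t0=0.0, x0=[0.0, 0.0], tf=10.0, dt=0.01,
                         controller=PDController(), integrator="runge_kutta")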
The function returns three lists:

- T: list of time values
- X: list of states
- TAU: list of actuations
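The returned lists can be processed further, for example plotted with matplotlib. A minimal sketch, assuming the state layout [position, velocity] from the x0 used above:

import numpy as np
import matplotlib.pyplot as plt

X = np.asarray(X)  # shape (timesteps, 2): [position, velocity]
plt.plot(T, X[:, 0], label="position [rad]")
plt.plot(T, X[:, 1], label="velocity [rad/s]")
plt.xlabel("time [s]")
plt.legend()
plt.show()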
The same simulation can be executed together with an animation of the plant (only implemented for 2D serial chains). For the simulation with animation call:
T, X, TAU = sim.simulate_and_animate(t0=0.0,
                                     x0=[0.5, 0.0],
                                     tf=10.0,
                                     dt=0.01,
                                     controller=None,
                                     integrator="runge_kutta",
                                     phase_plot=True,
                                     save_video=False,
                                     video_name="")
The additional parameters are:

- phase_plot: bool, whether to show a phase plot alongside the animation plot
- save_video: bool, whether to save the animation as an mp4 video
- video_name: string, name of the file where the video should be saved (only used if save_video=True)
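For example, to run the animation and write it to disk (a sketch; whether video_name should include the .mp4 extension is an assumption, see the simulator code):

T, X, TAU = sim.simulate_and_animate(t0=0.0, x0=[0.5, 0.0], tf=10.0, dt=0.01,
                                     controller=None,
                                     integrator="runge_kutta",
                                     phase_plot=False,
                                     save_video=True,
                                     video_name="pendulum_swingup")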
The gym environment
The environment can be initialized with:

import numpy as np
# import path assumes the simple_pendulum package layout; adjust if needed
from simple_pendulum.simulation.gym_environment import SimplePendulumEnv

pendulum = PendulumPlant()
sim = Simulator(plant=pendulum)
env = SimplePendulumEnv(simulator=sim,
                        max_steps=5000,
                        target=[np.pi, 0.0],
                        state_target_epsilon=[1e-2, 1e-2],
                        reward_type='continuous',
                        dt=1e-3,
                        integrator='runge_kutta',
                        state_representation=2,
                        validation_limit=-150,
                        scale_action=True,
                        random_init="False")
The parameters are:

- simulator: Simulator object
- max_steps: int, default=5000, maximum number of steps the agent can take before the episode is terminated
- target: array-like, default=[np.pi, 0.0], the target state of the pendulum
- state_target_epsilon: array-like, default=[1e-2, 1e-2], target epsilon for the discrete reward type
- reward_type: string, default='continuous', selects the reward function that is used. Options: 'continuous', 'discrete', 'soft_binary', 'soft_binary_with_repellor'
- dt: float, default=1e-3, timestep for the simulation
- integrator: string, default='runge_kutta', the integrator used by the simulator. Options: 'euler', 'runge_kutta'
- state_representation: int, default=2, determines how the state space of the pendulum is represented: 2 means state = [position, velocity], 3 means state = [cos(position), sin(position), velocity]
- validation_limit: float, default=-150, if the reward during validation episodes surpasses this value the training stops early
- scale_action: bool, default=True, whether to scale the output of the model with the torque limit of the simulator's plant. If True, the model is expected to return action values in the interval [-1, 1].
- random_init: string, default="False", determines the random state initialisation. "False": the pendulum is set to [0, 0]; "start_vicinity": the pendulum position and velocity are set in the range [-0.31, 0.31]; "everywhere": the pendulum is set to a random state in the whole possible state space
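A rollout with the environment could then look like the sketch below. It assumes SimplePendulumEnv follows the classic gym API (reset returning an observation, step returning observation, reward, done, info); the random placeholder policy and the 1D action shape are assumptions for illustration.

observation = env.reset()
for _ in range(1000):
    # with scale_action=True the action is expected in [-1, 1];
    # the 1d action shape is an assumption
    action = np.array([np.random.uniform(-1.0, 1.0)])
    observation, reward, done, info = env.step(action)
    if done:
        break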
Usage
For examples of how to use the simulator class, check out the scripts in the examples folder. The gym environment is used, for example, in the ddpg training.