Simulator ========= The simulator class can simulate and animate the pendulum motion forward in time. The gym environment can be used for reinforcement learning. API --- The simulator ~~~~~~~~~~~~~ The simulator should by initialized with a plant (here the ``PendulumPlant``) as follows:: pendulum = PendulumPlant() sim = Simulator(plant=pendulum) To simulate the dynamics of the plant forward in time call:: T, X, TAU = sim.simulate(t0=0.0, x0=[0.5, 0.0], tf=10.0, dt=0.01, controller=None, integrator="runge_kutta") The inputs of the function are: * **t0**: float, start time, unit: s * **x0**: start state (dimension as the plant expects it) * **tf**: float, final time, unit: s * **dt**: float, time step, unit: s * **controller**: controller that computes the motor torque(s) to be applied. The controller should have the structure of the AbstractController class in utilities/abstract_controller. If controller=None, no controller is used and the free system is simulated. * **integrator**: string, ``euler`` for euler integrator, ``runge_kutta`` for Runge-Kutta integrator The function returns three lists: * **T**: List of time values * **X**: List of states * **TAU**: List of actuations The same simulation can be executed together with an animation of the plant (only implemented for 2d serial chains). For the simuation with animation call:: T, X, TAU = sim.simulate_and_animate(t0=0.0, x0=[0.5, 0.0], tf=10.0, dt=0.01, controller=None, integrator="runge_kutta", phase_plot=True, save_video=False, video_name="") The additional parameters are: * **phase plot**: ``bool`` Whether to show a phase plot along the animation plot * **save_video**: ``bool`` Whether to save the animation as mp4 video * **video_name**: ``string`` Name of the file where the video should be saved (only used if save_video=True) The gym environment ~~~~~~~~~~~~~~~~~~~ The environment can be initialized with:: pendulum = PendulumPlant() sim = Simulator(plant=pendulum) env = SimplePendulumEnv(simulator=sim, max_steps=5000, target=[np.pi, 0.0], state_target_epsilon=[1e-2, 1e-2], reward_type='continuous', dt=1e-3, integrator='runge_kutta', state_representation=2, validation_limit=-150, scale_action=True, random_init="False") The parameters are: * **simulator**: Simulator object * **max_steps**: ``int``, default=``5000`` Maximum steps the agent can take before the episode is terminated * **target**: ``array-like``, default=``[np.pi, 0.0]`` The target state of the pendulum * **state_target_epsilon**: ``array-like``, default=``[1e-2, 1e-2]`` Target epsilon for discrete reward type * **reward_type**: ``string``, default=``continuous`` The reward type selects the reward function which is used Options: ``continuous``, ``discrete``, ``soft_binary``, ``soft_binary_with_repellor`` * **dt**: ``float``, default=``1e-3`` Timestep for the simulation * **integrator**: ``string``, default='runge_kutta' The integrator which is used by the simulator Options: ``euler``, ``runge_kutta`` * **state_representation**: ``int``, default=``2`` Determines how the state space of the pendulum is represented 2 means state = ``[position, velocity]`` 3 means state = ``[cos(position), sin(position), velocity]`` * **validation_limit**: ``float``, default=-150 If the reward during validation episodes surpasses this value the training stops early * **scale_action**: bool, default=True Whether to scale the output of the model with the torque limit of the simulator's plant. If True the model is expected so return values in the intervall [-1, 1] as action. * **random_init**: ``string``, default=``False`` A string determining the random state initialisation ``False``: The pendulum is set to [0, 0], ``start_vicinity``: The pendulum position and velocity are set in the range [-0.31, -0.31], ``everywhere``: The pendulum is set to a random state in the whole possible state space Usage ----- For examples of usages of the simulator class check out the scripts in the `examples folder `_. The gym environment is used for example in the `ddpg training `_.