simple_pendulum.reinforcement_learning.ddpg
Deep Deterministic Policy Gradient (DDPG) Training
Submodules
simple_pendulum.reinforcement_learning.ddpg.agent
Agent
- class simple_pendulum.reinforcement_learning.ddpg.agent.Agent(state_shape, n_actions, action_limits, discount, actor_lr, critic_lr, actor_model, critic_model, target_actor_model, target_critic_model, tau=0.005)
Bases: object
- get_action(state, noise_object=None)
- load_model(path)
- prep_state(state)
- save_model(path)
- scale_action(action, mini, maxi)
- train_on(batch)
- update_target_weights(tau=None)
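Two of the Agent methods above have standard DDPG semantics: update_target_weights performs a soft (Polyak) update that moves the target network weights a small step tau toward the online weights, and scale_action maps a normalized action into the interval [mini, maxi]. A minimal NumPy sketch of both (illustrative only; the function names mirror the API above but these are not the package's implementations):

```python
import numpy as np

def update_target_weights(weights, target_weights, tau=0.005):
    """Polyak-average the online weights into the target weights:
    target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * tw for w, tw in zip(weights, target_weights)]

def scale_action(action, mini, maxi):
    """Map an action from [-1, 1] to [mini, maxi]."""
    return mini + (action + 1.0) * 0.5 * (maxi - mini)

# A soft update moves the target only slightly toward the online weights
w = [np.ones(3)]
tw = [np.zeros(3)]
tw = update_target_weights(w, tw, tau=0.1)  # each target weight becomes 0.1
```

The small default tau keeps the target networks slowly moving copies of the online networks, which stabilizes the bootstrapped critic targets during training.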
simple_pendulum.reinforcement_learning.ddpg.ddpg
DDPG Trainer
- class simple_pendulum.reinforcement_learning.ddpg.ddpg.ddpg_trainer(batch_size, validate_every=None, validation_reps=None, train_every_steps=inf)
Bases: object
- init_agent(replay_buffer_size=50000, actor=None, critic=None, discount=0.99, actor_lr=0.0005, critic_lr=0.001, tau=0.005)
- init_environment(dt=0.01, integrator='runge_kutta', max_steps=1000, reward_type='open_ai_gym', state_representation=2, validation_limit=-150, target=[3.141592653589793, 0.0], state_target_epsilon=[0.01, 0.01], scale_action=True, random_init='everywhere')
Initialize the training environment. This includes the simulation parameters of the pendulum.
- init_pendulum(mass=0.57288, length=0.5, inertia=None, damping=0.15, coulomb_friction=0.0, gravity=9.81, torque_limit=2.0)
Initialize the pendulum parameters.
- Parameters:
- mass : float, default=0.57288
mass of the pendulum [kg]
- length : float, default=0.5
length of the pendulum [m]
- inertia : float, default=None
inertia of the pendulum [kg m^2]; defaults to the point-mass inertia (mass*length^2)
- damping : float, default=0.15
damping factor of the pendulum [kg m/s]
- coulomb_friction : float, default=0.0
coulomb friction of the pendulum [Nm]
- gravity : float, default=9.81
gravity (positive direction points down) [m/s^2]
- torque_limit : float, default=2.0
torque limit of the pendulum actuator [Nm]
- load(path)
- save(path)
- train(n_episodes, verbose=True)
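The init_pendulum defaults describe a damped point-mass pendulum (with inertia=None the inertia falls back to mass*length^2), and init_environment selects a Runge-Kutta integrator with dt=0.01. A sketch of the corresponding equation of motion and one classical RK4 step, using the documented default parameters (illustrative, not the package's own simulator; the sign convention, theta = 0 hanging and theta = pi upright, is an assumption consistent with the target [pi, 0.0]):

```python
import numpy as np

def pendulum_accel(th, thd, torque, mass=0.57288, length=0.5,
                   damping=0.15, coulomb_friction=0.0, gravity=9.81):
    """Angular acceleration of a damped point-mass pendulum.
    Assumes theta = 0 is the hanging position, theta = pi the upright target."""
    inertia = mass * length**2  # point-mass default when inertia=None
    return (torque
            - mass * gravity * length * np.sin(th)
            - damping * thd
            - coulomb_friction * np.sign(thd)) / inertia

def rk4_step(th, thd, torque, dt=0.01):
    """One classical 4th-order Runge-Kutta step of the state (theta, theta_dot)."""
    def f(s):
        return np.array([s[1], pendulum_accel(s[0], s[1], torque)])
    s = np.array([th, thd])
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# With zero torque, the hanging equilibrium (0, 0) stays put
state = rk4_step(0.0, 0.0, 0.0)
```

With torque_limit=2.0 and these mass/length values, the maximum actuator torque (2.0 Nm) is below the gravitational torque at the horizontal (mass*gravity*length, roughly 2.8 Nm), so the policy must learn to swing up by pumping energy rather than lifting the pendulum directly.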
simple_pendulum.reinforcement_learning.ddpg.models
Models
- simple_pendulum.reinforcement_learning.ddpg.models.get_actor(state_shape, upper_bound=2.0, verbose=False)
- simple_pendulum.reinforcement_learning.ddpg.models.get_critic(state_shape, n_actions, verbose=False)
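The get_actor signature takes an upper_bound matching the actuator's torque limit; a standard way to enforce such a bound (and a plausible reading of this API, though the real network architecture is not shown here) is a tanh output layer scaled by upper_bound. A tiny NumPy sketch of that output bounding:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_actor(state_dim, hidden=16, upper_bound=2.0):
    """Illustrative actor: one hidden layer and a tanh output scaled so
    actions always lie in [-upper_bound, upper_bound]. The real get_actor
    builds a neural network; only the bounding idea is shown here."""
    w1 = rng.normal(size=(state_dim, hidden)) * 0.1
    w2 = rng.normal(size=(hidden, 1)) * 0.1

    def actor(state):
        h = np.tanh(state @ w1)
        return upper_bound * np.tanh(h @ w2)

    return actor

actor = make_actor(state_dim=2, upper_bound=2.0)
a = actor(np.array([[np.pi, 0.0]]))  # action is guaranteed within the torque limit
```

The critic, by contrast, takes both state and action as inputs (see the n_actions parameter of get_critic) and outputs an unbounded scalar Q-value.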
simple_pendulum.reinforcement_learning.ddpg.noise
Noise
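The members of the noise module are not listed here, but DDPG commonly perturbs its deterministic policy with temporally correlated Ornstein-Uhlenbeck noise, and Agent.get_action accepts a noise_object for this purpose. A minimal sketch of such a process, under the assumption that this is the kind of noise used (the class name and parameters below are illustrative):

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise
    that mean-reverts toward `mean` at rate `theta`."""
    def __init__(self, mean=0.0, std=0.2, theta=0.15, dt=0.01, seed=0):
        self.mean, self.std, self.theta, self.dt = mean, std, theta, dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.x = float(self.mean)

    def __call__(self):
        # Euler-Maruyama step: drift toward the mean plus scaled Gaussian noise
        self.x += (self.theta * (self.mean - self.x) * self.dt
                   + self.std * np.sqrt(self.dt) * self.rng.normal())
        return self.x

noise = OUNoise()
samples = np.array([noise() for _ in range(1000)])
```

Because consecutive samples are correlated, the exploration torque changes smoothly over time, which suits a physical actuator better than independent white noise.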
simple_pendulum.reinforcement_learning.ddpg.replay_buffer
Replay Buffer
- class simple_pendulum.reinforcement_learning.ddpg.replay_buffer.ReplayBuffer(max_size, num_states, num_actions)
Bases: object
Replay buffer class to store experiences for a reinforcement learning agent.
- append(obs_tuple)
Add an experience to the replay buffer. Once the buffer reaches max_size, the oldest entry is dropped to make room. An observation consists of (state, action, next_state, reward, done).
- Parameters:
- obs_tuple: array-like
an observation (s,a,s’,r,d) to store in the buffer
- clear()
Clear the Replay Buffer.
- sample_batch(batch_size)
Sample a batch from the replay buffer.
- Parameters:
- batch_size: int
number of samples in the returned batch
- Returns:
- tuple
(s_batch, a_batch, s’_batch, r_batch, d_batch), a tuple of batches of states, actions, next states, rewards, and done flags
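The ReplayBuffer interface above (append, clear, sample_batch with the same constructor arguments) can be sketched as a fixed-size NumPy buffer that overwrites its oldest entry when full; this is an illustrative re-implementation, not the package's code:

```python
import numpy as np

class ReplayBuffer:
    """Fixed-size buffer of (state, action, next_state, reward, done)
    tuples, stored as a ring buffer that overwrites the oldest entry."""
    def __init__(self, max_size, num_states, num_actions):
        self.max_size = max_size
        self.states = np.zeros((max_size, num_states))
        self.actions = np.zeros((max_size, num_actions))
        self.next_states = np.zeros((max_size, num_states))
        self.rewards = np.zeros(max_size)
        self.dones = np.zeros(max_size, dtype=bool)
        self.count = 0  # total appends so far

    def append(self, obs_tuple):
        s, a, s2, r, d = obs_tuple
        i = self.count % self.max_size  # wrap around when full
        self.states[i], self.actions[i] = s, a
        self.next_states[i], self.rewards[i], self.dones[i] = s2, r, d
        self.count += 1

    def clear(self):
        self.count = 0

    def sample_batch(self, batch_size):
        n = min(self.count, self.max_size)
        idx = np.random.default_rng().integers(0, n, size=batch_size)
        return (self.states[idx], self.actions[idx],
                self.next_states[idx], self.rewards[idx], self.dones[idx])

buf = ReplayBuffer(max_size=100, num_states=2, num_actions=1)
buf.append(([0.0, 0.0], [0.5], [0.01, 0.1], -1.0, False))
s, a, s2, r, d = buf.sample_batch(4)
```

Sampling with replacement from a uniform distribution over stored entries, as here, is the usual choice for DDPG; it breaks the temporal correlation between consecutive transitions before they reach the critic.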