simple_pendulum.reinforcement_learning.ddpg

Deep Deterministic Policy Gradient (DDPG) Training

Submodules

simple_pendulum.reinforcement_learning.ddpg.agent

Agent

class simple_pendulum.reinforcement_learning.ddpg.agent.Agent(state_shape, n_actions, action_limits, discount, actor_lr, critic_lr, actor_model, critic_model, target_actor_model, target_critic_model, tau=0.005)

Bases: object

get_action(state, noise_object=None)
load_model(path)
prep_state(state)
save_model(path)
scale_action(action, mini, maxi)
train_on(batch)
update_target_weights(tau=None)
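Two of these methods can be illustrated in plain Python: scale_action, assuming it maps a normalized action in [-1, 1] onto [mini, maxi] (a common convention, not confirmed by the listing), and update_target_weights, which performs the soft (Polyak) target update with rate tau. The actual Agent operates on TensorFlow weight tensors; this sketch uses plain floats.

```python
def scale_action(action, mini, maxi):
    # Map a normalized action in [-1, 1] linearly onto [mini, maxi]
    # (assumed convention for illustration).
    return mini + 0.5 * (action + 1.0) * (maxi - mini)

def update_target_weights(weights, target_weights, tau=0.005):
    # Polyak averaging: move each target weight a small step (tau)
    # towards the online weight; this slow tracking stabilizes DDPG.
    return [tau * w + (1.0 - tau) * tw
            for w, tw in zip(weights, target_weights)]
```

With the default tau=0.005, the target networks track the online networks slowly, which keeps the bootstrapped critic targets from chasing a moving target.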

simple_pendulum.reinforcement_learning.ddpg.ddpg

DDPG Trainer

class simple_pendulum.reinforcement_learning.ddpg.ddpg.ddpg_trainer(batch_size, validate_every=None, validation_reps=None, train_every_steps=inf)

Bases: object

init_agent(replay_buffer_size=50000, actor=None, critic=None, discount=0.99, actor_lr=0.0005, critic_lr=0.001, tau=0.005)
init_environment(dt=0.01, integrator='runge_kutta', max_steps=1000, reward_type='open_ai_gym', state_representation=2, validation_limit=-150, target=[3.141592653589793, 0.0], state_target_epsilon=[0.01, 0.01], scale_action=True, random_init='everywhere')

Initialize the training environment. This includes the simulation parameters of the pendulum.

init_pendulum(mass=0.57288, length=0.5, inertia=None, damping=0.15, coulomb_friction=0.0, gravity=9.81, torque_limit=2.0)

Initialize the pendulum parameters.

Parameters:
mass : float, default=0.57288

mass of the pendulum [kg]

length : float, default=0.5

length of the pendulum [m]

inertia : float, default=None

inertia of the pendulum [kg m^2], defaults to the point mass inertia (mass*length^2)

damping : float, default=0.15

damping factor of the pendulum [kg m/s]

coulomb_friction : float, default=0.0

coulomb friction of the pendulum [Nm]

gravity : float, default=9.81

gravity (positive direction points down) [m/s^2]

torque_limit : float, default=2.0

torque limit of the pendulum actuator [Nm]
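The default inertia and the pendulum dynamics these parameters describe can be sketched as follows. This is a simplified illustration using the point-mass inertia I = m*l^2 and one explicit Euler step of I*theta'' = tau - b*theta' - m*g*l*sin(theta), omitting coulomb friction; the package itself uses the integrator configured in init_environment, not this function.

```python
import math

def point_mass_inertia(mass, length):
    # Default inertia when inertia=None is passed: I = m * l^2
    return mass * length**2

def euler_step(theta, omega, torque, dt=0.01,
               mass=0.57288, length=0.5, damping=0.15, gravity=9.81):
    # Angular acceleration from I*theta'' = tau - b*theta' - m*g*l*sin(theta)
    # (coulomb friction omitted for brevity).
    inertia = point_mass_inertia(mass, length)
    alpha = (torque - damping * omega
             - mass * gravity * length * math.sin(theta)) / inertia
    return theta + dt * omega, omega + dt * alpha
```

With theta measured from the hanging position, theta = pi is the upright target state listed in init_environment.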

load(path)
save(path)
train(n_episodes, verbose=True)
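Under the assumption that train(n_episodes) follows the standard DDPG loop (act, store the transition, train every train_every_steps environment steps), one episode's control flow can be sketched with stub callables standing in for the environment, agent and replay buffer:

```python
def run_episode(env_step, env_reset, get_action, store, train_step,
                max_steps=1000, train_every_steps=1):
    # One DDPG episode: act, store the transition in the replay buffer,
    # and run a training step every train_every_steps environment steps.
    state = env_reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = get_action(state)
        next_state, reward, done = env_step(action)
        store((state, action, next_state, reward, done))
        if step % train_every_steps == 0:
            train_step()
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward
```

The callable names here are hypothetical placeholders; the real trainer wires the Agent, ReplayBuffer and pendulum simulation together internally.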

simple_pendulum.reinforcement_learning.ddpg.models

Models

simple_pendulum.reinforcement_learning.ddpg.models.get_actor(state_shape, upper_bound=2.0, verbose=False)
simple_pendulum.reinforcement_learning.ddpg.models.get_critic(state_shape, n_actions, verbose=False)
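The upper_bound parameter of get_actor suggests the actor's output is squashed to respect the actuator limit. Assuming the final layer uses a tanh activation scaled by upper_bound (a common DDPG choice, not confirmed by the listing), the output stage behaves like:

```python
import math

def bounded_action(raw_output, upper_bound=2.0):
    # Squash an unbounded network output into [-upper_bound, upper_bound],
    # assuming a tanh output layer scaled by upper_bound (illustrative only;
    # the actual Keras model architecture is not shown in this listing).
    return upper_bound * math.tanh(raw_output)
```

This guarantees the policy can never command more torque than torque_limit, regardless of the network's internal activations.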

simple_pendulum.reinforcement_learning.ddpg.noise

Noise

class simple_pendulum.reinforcement_learning.ddpg.noise.OUActionNoise(mean, std_deviation, theta=0.15, dt=0.01, x_initial=None)

Bases: object

reset()
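The Ornstein-Uhlenbeck process produces temporally correlated exploration noise, which suits continuous control better than independent Gaussian noise. A minimal scalar sketch of the standard update x <- x + theta*(mean - x)*dt + std*sqrt(dt)*N(0, 1), mirroring the constructor signature above (the actual class may operate on vectors):

```python
import math
import random

class OUNoiseSketch:
    """Minimal scalar Ornstein-Uhlenbeck noise, mirroring OUActionNoise:
    x <- x + theta * (mean - x) * dt + std * sqrt(dt) * N(0, 1)."""

    def __init__(self, mean, std_deviation, theta=0.15, dt=0.01,
                 x_initial=None):
        self.mean = mean
        self.std = std_deviation
        self.theta = theta
        self.dt = dt
        self.x_initial = x_initial
        self.reset()

    def reset(self):
        # Restart the process at x_initial (or at zero if none was given).
        self.x = self.x_initial if self.x_initial is not None else 0.0

    def __call__(self):
        # Drift towards the mean plus scaled Gaussian diffusion.
        self.x += (self.theta * (self.mean - self.x) * self.dt
                   + self.std * math.sqrt(self.dt) * random.gauss(0.0, 1.0))
        return self.x
```

The theta term pulls the noise back towards the mean, so successive samples are correlated but do not drift away indefinitely.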

simple_pendulum.reinforcement_learning.ddpg.replay_buffer

Replay Buffer

class simple_pendulum.reinforcement_learning.ddpg.replay_buffer.ReplayBuffer(max_size, num_states, num_actions)

Bases: object

Replay buffer class to store experiences for a reinforcement learning agent.

append(obs_tuple)

Add an experience to the replay buffer. When adding experiences beyond the max_size limit, the oldest entry is deleted. An observation consists of (state, action, next_state, reward, done).

Parameters:
obs_tuple: array-like

an observation (s,a,s’,r,d) to store in the buffer

clear()

Clear the Replay Buffer.

sample_batch(batch_size)

Sample a batch from the replay buffer.

Parameters:
batch_size: int

number of samples in the returned batch

Returns:
tuple

(s_batch, a_batch, s'_batch, r_batch, d_batch), a tuple of batches of states, actions, next states, rewards and done flags
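The buffer semantics described above (fixed capacity, oldest entry evicted first, uniform batch sampling transposed into per-field batches) can be sketched with a deque. The actual class preallocates NumPy arrays sized by num_states and num_actions instead, which is why its constructor takes those arguments.

```python
import random
from collections import deque

class ReplayBufferSketch:
    """Minimal replay buffer mirroring the interface above (illustrative;
    the real class stores fields in preallocated NumPy arrays)."""

    def __init__(self, max_size):
        # deque with maxlen drops the oldest entry once full.
        self.buffer = deque(maxlen=max_size)

    def append(self, obs_tuple):
        # obs_tuple = (state, action, next_state, reward, done)
        self.buffer.append(obs_tuple)

    def clear(self):
        self.buffer.clear()

    def sample_batch(self, batch_size):
        # Uniform sampling without replacement, then transpose the
        # sampled observations into per-field batches (s, a, s', r, d).
        batch = random.sample(self.buffer, min(batch_size, len(self.buffer)))
        return tuple(zip(*batch))
```

Sampling uniformly from old and new experiences breaks the temporal correlation of consecutive transitions, which is the reason DDPG uses a replay buffer at all.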