simple_pendulum.reinforcement_learning.ddpg
Deep Deterministic Policy Gradient (DDPG) Training
Submodules
simple_pendulum.reinforcement_learning.ddpg.agent
Agent
- class simple_pendulum.reinforcement_learning.ddpg.agent.Agent(state_shape, n_actions, action_limits, discount, actor_lr, critic_lr, actor_model, critic_model, target_actor_model, target_critic_model, tau=0.005)
Bases: object
- get_action(state, noise_object=None)
- load_model(path)
- prep_state(state)
- save_model(path)
- scale_action(action, mini, maxi)
- train_on(batch)
- update_target_weights(tau=None)
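Two of the Agent methods above have standard DDPG semantics: update_target_weights performs a soft (Polyak) update that moves the target network weights a small step tau toward the online weights, and scale_action maps a normalized action into the interval [mini, maxi]. A minimal NumPy sketch of both (illustrative only; the function names mirror the API above but these are not the package's implementations):

```python
import numpy as np

def update_target_weights(weights, target_weights, tau=0.005):
    """Polyak-average the online weights into the target weights:
    target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * tw for w, tw in zip(weights, target_weights)]

def scale_action(action, mini, maxi):
    """Map an action from [-1, 1] to [mini, maxi]."""
    return mini + (action + 1.0) * 0.5 * (maxi - mini)

# A soft update moves the target only slightly toward the online weights
w = [np.ones(3)]
tw = [np.zeros(3)]
tw = update_target_weights(w, tw, tau=0.1)  # each target weight becomes 0.1
```

The small default tau keeps the target networks slowly moving copies of the online networks, which stabilizes the bootstrapped critic targets during training.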
simple_pendulum.reinforcement_learning.ddpg.ddpg
DDPG Trainer
- class simple_pendulum.reinforcement_learning.ddpg.ddpg.ddpg_trainer(batch_size, validate_every=None, validation_reps=None, train_every_steps=inf)
Bases: object
- init_agent(replay_buffer_size=50000, actor=None, critic=None, discount=0.99, actor_lr=0.0005, critic_lr=0.001, tau=0.005)
- init_environment(dt=0.01, integrator='runge_kutta', max_steps=1000, reward_type='open_ai_gym', state_representation=2, validation_limit=-150, target=[3.141592653589793, 0.0], state_target_epsilon=[0.01, 0.01], scale_action=True, random_init='everywhere')
Initialize the training environment. This includes the simulation parameters of the pendulum.
- init_pendulum(mass=0.57288, length=0.5, inertia=None, damping=0.15, coulomb_friction=0.0, gravity=9.81, torque_limit=2.0)
Initialize the pendulum parameters.
- Parameters:
- mass : float, default=0.57288
mass of the pendulum [kg]
- length : float, default=0.5
length of the pendulum [m]
- inertia : float, default=None
inertia of the pendulum [kg m^2]; defaults to the point-mass inertia (mass*length^2)
- damping : float, default=0.15
damping factor of the pendulum [kg m/s]
- coulomb_friction : float, default=0.0
coulomb friction of the pendulum [Nm]
- gravity : float, default=9.81
gravity (positive direction points down) [m/s^2]
- torque_limit : float, default=2.0
torque limit of the pendulum actuator [Nm]
- load(path)
- save(path)
- train(n_episodes, verbose=True)
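The init_pendulum defaults describe a damped point-mass pendulum (with inertia=None the inertia falls back to mass*length^2), and init_environment selects a Runge-Kutta integrator with dt=0.01. A sketch of the corresponding equation of motion and one classical RK4 step, using the documented default parameters (illustrative, not the package's own simulator; the sign convention, theta = 0 hanging and theta = pi upright, is an assumption consistent with the target [pi, 0.0]):

```python
import numpy as np

def pendulum_accel(th, thd, torque, mass=0.57288, length=0.5,
                   damping=0.15, coulomb_friction=0.0, gravity=9.81):
    """Angular acceleration of a damped point-mass pendulum.
    Assumes theta = 0 is the hanging position, theta = pi the upright target."""
    inertia = mass * length**2  # point-mass default when inertia=None
    return (torque
            - mass * gravity * length * np.sin(th)
            - damping * thd
            - coulomb_friction * np.sign(thd)) / inertia

def rk4_step(th, thd, torque, dt=0.01):
    """One classical 4th-order Runge-Kutta step of the state (theta, theta_dot)."""
    def f(s):
        return np.array([s[1], pendulum_accel(s[0], s[1], torque)])
    s = np.array([th, thd])
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# With zero torque, the hanging equilibrium (0, 0) stays put
state = rk4_step(0.0, 0.0, 0.0)
```

With torque_limit=2.0 and these mass/length values, the maximum actuator torque (2.0 Nm) is below the gravitational torque at the horizontal (mass*gravity*length, roughly 2.8 Nm), so the policy must learn to swing up by pumping energy rather than lifting the pendulum directly.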
simple_pendulum.reinforcement_learning.ddpg.models
Models
- simple_pendulum.reinforcement_learning.ddpg.models.get_actor(state_shape, upper_bound=2.0, verbose=False)
- simple_pendulum.reinforcement_learning.ddpg.models.get_critic(state_shape, n_actions, verbose=False)
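The get_actor signature takes an upper_bound matching the actuator's torque limit; a standard way to enforce such a bound (and a plausible reading of this API, though the real network architecture is not shown here) is a tanh output layer scaled by upper_bound. A tiny NumPy sketch of that output bounding:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_actor(state_dim, hidden=16, upper_bound=2.0):
    """Illustrative actor: one hidden layer and a tanh output scaled so
    actions always lie in [-upper_bound, upper_bound]. The real get_actor
    builds a neural network; only the bounding idea is shown here."""
    w1 = rng.normal(size=(state_dim, hidden)) * 0.1
    w2 = rng.normal(size=(hidden, 1)) * 0.1

    def actor(state):
        h = np.tanh(state @ w1)
        return upper_bound * np.tanh(h @ w2)

    return actor

actor = make_actor(state_dim=2, upper_bound=2.0)
a = actor(np.array([[np.pi, 0.0]]))  # action is guaranteed within the torque limit
```

The critic, by contrast, takes both state and action as inputs (see the n_actions parameter of get_critic) and outputs an unbounded scalar Q-value.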
simple_pendulum.reinforcement_learning.ddpg.noise
Noise
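The members of the noise module are not listed here, but DDPG commonly perturbs its deterministic policy with temporally correlated Ornstein-Uhlenbeck noise, and Agent.get_action accepts a noise_object for this purpose. A minimal sketch of such a process, under the assumption that this is the kind of noise used (the class name and parameters below are illustrative):

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise
    that mean-reverts toward `mean` at rate `theta`."""
    def __init__(self, mean=0.0, std=0.2, theta=0.15, dt=0.01, seed=0):
        self.mean, self.std, self.theta, self.dt = mean, std, theta, dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.x = float(self.mean)

    def __call__(self):
        # Euler-Maruyama step: drift toward the mean plus scaled Gaussian noise
        self.x += (self.theta * (self.mean - self.x) * self.dt
                   + self.std * np.sqrt(self.dt) * self.rng.normal())
        return self.x

noise = OUNoise()
samples = np.array([noise() for _ in range(1000)])
```

Because consecutive samples are correlated, the exploration torque changes smoothly over time, which suits a physical actuator better than independent white noise.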
simple_pendulum.reinforcement_learning.ddpg.replay_buffer
Replay Buffer
- class simple_pendulum.reinforcement_learning.ddpg.replay_buffer.ReplayBuffer(max_size, num_states, num_actions)
Bases: object
Replay buffer class to store experiences for a reinforcement learning agent.
- append(obs_tuple)
Add an experience to the replay buffer. Once the buffer reaches max_size, the oldest entry is dropped to make room. An observation consists of (state, action, next_state, reward, done).
- Parameters:
- obs_tuple: array-like
an observation (s,a,s’,r,d) to store in the buffer
- clear()
Clear the Replay Buffer.
- sample_batch(batch_size)
Sample a batch from the replay buffer.
- Parameters:
- batch_size: int
number of samples in the returned batch
- Returns:
- tuple
(s_batch, a_batch, s’_batch, r_batch, d_batch), a tuple of batches of states, actions, next states, rewards, and done flags
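The ReplayBuffer interface above (append, clear, sample_batch with the same constructor arguments) can be sketched as a fixed-size NumPy buffer that overwrites its oldest entry when full; this is an illustrative re-implementation, not the package's code:

```python
import numpy as np

class ReplayBuffer:
    """Fixed-size buffer of (state, action, next_state, reward, done)
    tuples, stored as a ring buffer that overwrites the oldest entry."""
    def __init__(self, max_size, num_states, num_actions):
        self.max_size = max_size
        self.states = np.zeros((max_size, num_states))
        self.actions = np.zeros((max_size, num_actions))
        self.next_states = np.zeros((max_size, num_states))
        self.rewards = np.zeros(max_size)
        self.dones = np.zeros(max_size, dtype=bool)
        self.count = 0  # total appends so far

    def append(self, obs_tuple):
        s, a, s2, r, d = obs_tuple
        i = self.count % self.max_size  # wrap around when full
        self.states[i], self.actions[i] = s, a
        self.next_states[i], self.rewards[i], self.dones[i] = s2, r, d
        self.count += 1

    def clear(self):
        self.count = 0

    def sample_batch(self, batch_size):
        n = min(self.count, self.max_size)
        idx = np.random.default_rng().integers(0, n, size=batch_size)
        return (self.states[idx], self.actions[idx],
                self.next_states[idx], self.rewards[idx], self.dones[idx])

buf = ReplayBuffer(max_size=100, num_states=2, num_actions=1)
buf.append(([0.0, 0.0], [0.5], [0.01, 0.1], -1.0, False))
s, a, s2, r, d = buf.sample_batch(4)
```

Sampling with replacement from a uniform distribution over stored entries, as here, is the usual choice for DDPG; it breaks the temporal correlation between consecutive transitions before they reach the critic.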