simple_pendulum.reinforcement_learning.sac

Soft Actor Critic (SAC) Training

Submodules

simple_pendulum.reinforcement_learning.sac.sac

SAC Trainer

class simple_pendulum.reinforcement_learning.sac.sac.sac_trainer(log_dir='sac_training')

Bases: object

Class to train a policy for pendulum swing-up with the soft actor critic (SAC) method.

init_agent(learning_rate=0.0003, warm_start=False, warm_start_path='', verbose=1)

Initialize the agent.

Parameters:
learning_rate : float, default=0.0003

learning rate of the agent

warm_start : bool, default=False

whether to use a pretrained model as the initial model for training

warm_start_path : string, default=""

path to the model to load for a warm start; only used if warm_start is True

verbose : int, default=1

enable/disable printing of training progress to the terminal

init_environment(dt=0.01, integrator='runge_kutta', max_steps=1000, reward_type='soft_binary_with_repellor', state_representation=2, validation_limit=-150, target=[3.141592653589793, 0.0], state_target_epsilon=[0.01, 0.01], random_init='everywhere')

Initialize the training environment. This includes the simulation parameters of the pendulum.

init_pendulum(mass=0.57288, length=0.5, inertia=None, damping=0.15, coulomb_friction=0.0, gravity=9.81, torque_limit=2.0)

Initialize the pendulum parameters.

Parameters:
mass : float, default=0.57288

mass of the pendulum [kg]

length : float, default=0.5

length of the pendulum [m]

inertia : float, default=None

inertia of the pendulum [kg m^2]; if None, defaults to the point mass inertia (mass*length^2)

damping : float, default=0.15

damping factor of the pendulum [kg m^2/s]

coulomb_friction : float, default=0.0

coulomb friction of the pendulum [Nm]

gravity : float, default=9.81

gravity (positive direction points down) [m/s^2]

torque_limit : float, default=2.0

torque limit of the pendulum actuator [Nm]
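
To make the default values concrete, the following sketch evaluates the point mass inertia that is used when inertia is None, and compares the largest gravity torque with the default torque_limit. The helper names point_mass_inertia and max_gravity_torque are hypothetical and not part of the library; the calculation assumes an ideal point mass at the end of a massless rod.

```python
def point_mass_inertia(mass, length):
    """Inertia of a point mass at distance `length` from the pivot [kg m^2]."""
    return mass * length ** 2


def max_gravity_torque(mass, length, gravity=9.81):
    """Largest gravity torque about the pivot (pendulum horizontal) [Nm]."""
    return mass * gravity * length


# Default values taken from the init_pendulum signature.
mass, length, torque_limit = 0.57288, 0.5, 2.0

inertia = point_mass_inertia(mass, length)
gravity_torque = max_gravity_torque(mass, length)

# True if the actuator alone cannot hold the pendulum horizontally.
torque_limited = gravity_torque > torque_limit

print(f"inertia = {inertia:.5f} kg m^2")
print(f"max gravity torque = {gravity_torque:.3f} Nm")
print(f"torque limited: {torque_limited}")
```

Under these assumptions the default inertia is about 0.143 kg m^2 and the largest gravity torque is about 2.81 Nm, which exceeds the default torque_limit of 2.0 Nm. The actuator therefore cannot lift the pendulum directly, and a swing-up policy has to build up energy over several swings.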

train(training_timesteps=1000000.0, reward_threshold=1000.0, eval_frequency=10000, n_eval_episodes=20, verbose=1)

Train the agent and save the model. The model will be saved to os.path.join(self.logdir, "best_model").
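
The methods above could be combined into a training run roughly as follows. This is a sketch, not the library's reference workflow: the wrapper train_swingup is hypothetical, the call order (pendulum, environment, agent, train) is an assumption based on the descriptions above, and the returned path assumes the model is stored under the log_dir passed to the constructor. All keyword arguments are taken from the documented signatures.

```python
import os


def train_swingup(trainer_cls, log_dir="sac_training", training_timesteps=1e6):
    """Run one SAC training with the documented default parameters.

    `trainer_cls` is expected to be the sac_trainer class; it is passed in
    as an argument so this sketch does not import the library itself.
    """
    trainer = trainer_cls(log_dir=log_dir)

    # Physical parameters of the pendulum.
    trainer.init_pendulum(mass=0.57288, length=0.5, inertia=None,
                          damping=0.15, coulomb_friction=0.0,
                          gravity=9.81, torque_limit=2.0)

    # Simulation and reward settings (assumed to come after init_pendulum).
    trainer.init_environment(dt=0.01, integrator="runge_kutta",
                             max_steps=1000)

    # Fresh agent without warm start.
    trainer.init_agent(learning_rate=0.0003, warm_start=False, verbose=1)

    trainer.train(training_timesteps=training_timesteps,
                  reward_threshold=1000.0, eval_frequency=10000,
                  n_eval_episodes=20, verbose=1)

    # Assumed location of the saved model, following the description above.
    return os.path.join(log_dir, "best_model")


# Usage, assuming the package is installed:
# from simple_pendulum.reinforcement_learning.sac.sac import sac_trainer
# model_path = train_swingup(sac_trainer)
```

Passing the trainer class as an argument keeps the sketch independent of the installed package and makes the call sequence easy to check with a stand-in class.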