simple_pendulum.reinforcement_learning.sac
Soft Actor Critic (SAC) Training
Submodules
simple_pendulum.reinforcement_learning.sac.sac
SAC Trainer
- class simple_pendulum.reinforcement_learning.sac.sac.sac_trainer(log_dir='sac_training')
Bases: object
Class to train a policy for pendulum swing-up with the soft actor critic (SAC) method.
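SAC trains its critics toward an entropy-regularized ("soft") Bellman target. The sketch below shows that target for a single transition in the standard formulation with two target critics (clipped double-Q). It is an illustration of the general method, not code taken from this package. The function name and the discount `gamma` and temperature `alpha` defaults are assumptions, not values used by sac_trainer.

```python
def soft_q_target(reward, done, next_q1, next_q2, next_log_prob,
                  gamma=0.99, alpha=0.2):
    """Soft Bellman target used to regress both critics in SAC (illustrative).

    next_q1, next_q2: target-critic values Q'(s', a') for an action a'
    sampled from the current policy at the next state s'.
    next_log_prob: log pi(a' | s') of that sampled action.
    gamma and alpha are assumed example values, not sac_trainer settings.
    """
    # Clipped double-Q: use the smaller target critic to limit overestimation.
    min_q = min(next_q1, next_q2)
    # Entropy bonus: subtracting alpha * log pi favors stochastic policies.
    soft_value = min_q - alpha * next_log_prob
    # Do not bootstrap past a terminal transition.
    return reward + gamma * (1.0 - float(done)) * soft_value
```

The entropy term is what distinguishes SAC from a plain actor critic: a higher `alpha` keeps the policy exploring longer, which matters for a swing-up task where the reward near the upright target is sparse.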
- init_agent(learning_rate=0.0003, warm_start=False, warm_start_path='', verbose=1)
Initialize the agent.
- Parameters:
- learning_rate : float, default=0.0003
learning rate of the agent
- warm_start : bool, default=False
whether to use a pretrained model as the initial model for training
- warm_start_path : string, default=""
path to the model to load if warm_start is True
- verbose : int, default=1
enable/disable printing of training progress to the terminal
- init_environment(dt=0.01, integrator='runge_kutta', max_steps=1000, reward_type='soft_binary_with_repellor', state_representation=2, validation_limit=-150, target=[3.141592653589793, 0.0], state_target_epsilon=[0.01, 0.01], random_init='everywhere')
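Initialize the training environment. This sets the simulation parameters (dt, integrator, max_steps) as well as the reward type, state representation, target state, and initial-state sampling. The physical parameters of the pendulum are set with init_pendulum.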
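The integrator and dt arguments control how the pendulum state is advanced between actions. The sketch below compares an explicit Euler step with a classic fourth-order Runge-Kutta step on a generic system dx/dt = f(x), and adds a per-component target check in the spirit of target and state_target_epsilon. All function names are hypothetical, and the state layout [theta, theta_dot] with theta = pi as upright is an assumption inferred from the default target, not confirmed by this page.

```python
import math


def euler_step(f, x, dt):
    """One explicit Euler step for dx/dt = f(x)."""
    return [xi + dt * di for xi, di in zip(x, f(x))]


def runge_kutta_step(f, x, dt):
    """One classic fourth-order Runge-Kutta step for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def reached_target(x, target=(math.pi, 0.0), eps=(0.01, 0.01)):
    """True if every state component lies within eps of target.

    Assumed semantics for state_target_epsilon; no angle wrapping is done.
    """
    return all(abs(xi - ti) <= ei for xi, ti, ei in zip(x, target, eps))


def swing(x):
    """Illustrative undamped pendulum, x = [theta, theta_dot], g/l = 9.81/0.5."""
    return [x[1], -(9.81 / 0.5) * math.sin(x[0])]
```

For example, `runge_kutta_step(swing, [0.5, 0.0], 0.01)` advances the pendulum by one time step of the default dt. Runge-Kutta costs four evaluations of f per step, but over many steps it conserves the energy of an undamped pendulum far better than Euler, whose energy steadily drifts upward.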
- init_pendulum(mass=0.57288, length=0.5, inertia=None, damping=0.15, coulomb_friction=0.0, gravity=9.81, torque_limit=2.0)
Initialize the pendulum parameters.
- Parameters:
- mass : float, default=0.57288
mass of the pendulum [kg]
- length : float, default=0.5
length of the pendulum [m]
- inertia : float, default=None
inertia of the pendulum [kg m^2]; if None, the point-mass inertia mass*length^2 is used
- damping : float, default=0.15
viscous damping factor of the pendulum [kg m^2/s]
- coulomb_friction : float, default=0.0
Coulomb friction of the pendulum [Nm]
- gravity : float, default=9.81
gravity (positive direction points down) [m/s^2]
- torque_limit : float, default=2.0
torque limit of the pendulum actuator [Nm]
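To show how these parameters typically enter the dynamics, here is a sketch of the angular acceleration of a damped, torque-limited pendulum with Coulomb friction. The equation of motion, the function name, and the convention that theta = 0 hangs down (so theta = pi is upright) are assumptions for illustration; only the parameter names, defaults, and the point-mass inertia fallback come from init_pendulum.

```python
import math


def pendulum_acceleration(theta, theta_dot, tau, mass=0.57288, length=0.5,
                          inertia=None, damping=0.15, coulomb_friction=0.0,
                          gravity=9.81, torque_limit=2.0):
    """Angular acceleration of a torque-limited pendulum (assumed model).

    theta = 0 is assumed to be hanging down, so theta = pi is upright.
    """
    if inertia is None:
        # Point-mass default, as documented for init_pendulum.
        inertia = mass * length ** 2
    # The actuator cannot exceed +/- torque_limit.
    tau = max(-torque_limit, min(torque_limit, tau))
    # sign(theta_dot); at zero velocity the Coulomb term is taken as zero here.
    sign = (theta_dot > 0) - (theta_dot < 0)
    return (tau
            - mass * gravity * length * math.sin(theta)
            - damping * theta_dot
            - coulomb_friction * sign) / inertia
```

With the defaults, the largest gravitational torque is mass*gravity*length = 0.57288 * 9.81 * 0.5 ≈ 2.81 Nm, which exceeds torque_limit = 2.0 Nm. The pendulum therefore cannot be lifted straight to the top and has to swing up, which is what makes this a useful benchmark for the trained policy.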
- train(training_timesteps=1000000.0, reward_threshold=1000.0, eval_frequency=10000, n_eval_episodes=20, verbose=1)
Train the agent and save the model. The model is saved to os.path.join(self.logdir, "best_model").
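The train arguments suggest a periodic-evaluation loop: every eval_frequency timesteps the policy is evaluated over n_eval_episodes, the best model so far is kept, and training may stop early once the mean reward reaches reward_threshold. The sketch below simulates that schedule with a stand-in `evaluate` callback. It assumes semantics similar to the common evaluate-and-stop callback pattern (for example in Stable-Baselines3); the helper and its return values are hypothetical and not part of sac_trainer.

```python
def run_training(total_timesteps, eval_frequency, n_eval_episodes,
                 reward_threshold, evaluate):
    """Simulate a periodic-evaluation schedule (hypothetical helper).

    evaluate(step, n_episodes) -> mean episode reward; it stands in for
    rolling out the current policy for n_episodes episodes.
    Returns (best_step, best_reward, stop_step).
    """
    best_step, best_reward = None, float("-inf")
    for step in range(eval_frequency, int(total_timesteps) + 1, eval_frequency):
        # Training updates for the preceding eval_frequency timesteps
        # would run here before each evaluation.
        mean_reward = evaluate(step, n_eval_episodes)
        if mean_reward > best_reward:
            # A "best_model" checkpoint would be written at this point.
            best_step, best_reward = step, mean_reward
        if mean_reward >= reward_threshold:
            # Stop early once the reward threshold is reached.
            return best_step, best_reward, step
    return best_step, best_reward, int(total_timesteps)
```

Because only improvements overwrite the checkpoint, the saved "best_model" can come from an earlier evaluation than the final policy if performance degrades late in training.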