Pendubot Simulation Performance Leaderboard V2

Controller | Short Controller Description | Swingup Success | Swingup Time [s] | Energy [J] | Torque Cost [N²m²] | Torque Smoothness [Nm] | Velocity Cost [m²/s²] | RealAI Score | Username | Data
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
mcpilco | Swingup trained with the MBRL algorithm MC-PILCO + stabilization with LQR. | 1/1 | 1.14 | 8.35 | 2.42 | 0.054 | 114.26 | 0.48 | turcato-niccolo | data, plot, video
History SAC | SAC using a custom model architecture to encode system dynamics. | 1/1 | 1.15 | 7.67 | 2.36 | 0.007 | 62.07 | 0.682 | tfaust | data, plot, video
AR-EAPO | Policy trained with average-reward maximum entropy RL. | 1/1 | 1.15 | 7.72 | 2.43 | 0.01 | 64.33 | 0.659 | rnilva | data, plot, video
iLQR Riccati Gains | Stabilization of iLQR trajectory with Riccati gains. Top stabilization with LQR. | 1/1 | 4.13 | 9.53 | 1.25 | 0.005 | 211.34 | 0.536 | fwiebe | data, plot, video
evolsac | Evolutionary SAC for both swingup and stabilization. | 1/1 | 0.71 | 9.83 | 4.37 | 0.014 | 58.15 | 0.596 | AlbertoSinigaglia | data, plot, video
TVLQR | Stabilization of iLQR trajectory with time-varying LQR. | 1/1 | 4.13 | 9.53 | 1.26 | 0.007 | 211.12 | 0.526 | fwiebe | data, plot, video
iLQR MPC stabilization | Online optimization with iterative LQR. Stabilization of iLQR trajectory. Top stabilization with LQR. | 1/1 | 4.12 | 9.91 | 1.77 | 0.083 | 211.98 | 0.353 | fwiebe | data, plot, video

Rules

The simulation leaderboard compares the performance of different control methods in simulation. The task for the controller is to swing up and balance the pendubot and to keep the end-effector above the threshold line.

The model parameters of the pendubot are:

More information about the dynamic model of the double pendulum can be found here: Double Pendulum Dynamics. In the Double Pendulum Repository, the parameters above are labeled 'designC.1/model1.1'. A URDF file of this model is available here: URDF.

The pendubot is simulated with a fourth-order Runge-Kutta (RK4) integrator with a timestep of \(dt = 0.002 \, \text{s}\) over a duration of \(T = 10 \, \text{s}\) (5000 steps). The initial pendubot configuration is \(x_0 = (0, 0, 0, 0)\) (hanging down), and the goal is the unstable fixed point at the upright configuration \(x_g = (\pi, 0, 0, 0)\). The upright position is considered reached when the end-effector rises above the threshold line at \(h = 0.45 \, \text{m}\) (measured from the mounting point) and stays there until the end of the simulation.
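The simulation setup and the success condition can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the leaderboard's own implementation: the pendubot dynamics `f(x, u)` and the controller are passed in as callables because the model equations are not reproduced here, all helper names are hypothetical, and `end_effector_height` assumes that \(q_1\) is measured from the downward vertical and \(q_2\) relative to the first link, which matches \(x_0\) hanging down and \(x_g\) upright. The link lengths `l1` and `l2` have to be taken from the model parameters.

```python
import numpy as np

# Settings taken from the rules above.
DT = 0.002           # integrator timestep [s]
T_FINAL = 10.0       # simulated duration [s]
H_THRESHOLD = 0.45   # threshold height [m], origin at the mounting point


def rk4_step(f, x, u, dt):
    """One classical RK4 step for dx/dt = f(x, u), holding u constant over the step."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def simulate(f, controller, x0, dt=DT, t_final=T_FINAL):
    """Closed-loop rollout; returns time stamps ts and states xs (one row per step)."""
    n_steps = int(round(t_final / dt))  # 10 s / 0.002 s = 5000 steps
    ts = np.arange(n_steps + 1) * dt
    xs = np.empty((n_steps + 1, len(x0)))
    xs[0] = x0
    for k in range(n_steps):
        u = controller(ts[k], xs[k])  # assumption: zero-order hold, u constant during the step
        xs[k + 1] = rk4_step(f, xs[k], u, dt)
    return ts, xs


def end_effector_height(q1, q2, l1, l2):
    """End-effector height above the mounting point.

    Assumed convention: q1 from the downward vertical, q2 relative to link 1,
    so (0, 0) hangs down and (pi, 0) is upright with height l1 + l2.
    """
    return -l1 * np.cos(q1) - l2 * np.cos(q1 + q2)


def swingup_time(ts, xs, l1, l2, h=H_THRESHOLD):
    """Time from which the end-effector stays above h until the end, or None on failure."""
    above = end_effector_height(xs[:, 0], xs[:, 1], l1, l2) > h
    if not above[-1]:
        return None  # not above the line at the end: swingup failed
    below = np.flatnonzero(~above)
    first = 0 if below.size == 0 else below[-1] + 1
    return float(ts[first])
```

For example, `ts, xs = simulate(f, controller, np.zeros(4))` followed by `swingup_time(ts, xs, l1, l2)` returns `None` if the end-effector is not above the line at \(T\), and otherwise the earliest time after which it never drops below the line again.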

Scores

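For the evaluation, multiple criteria are computed and combined into an overall score (RealAI Score). The criteria are the swingup success \(c_{\text{success}}\), the swingup time \(c_{\text{time}}\), the energy \(c_{\text{energy}}\), the torque cost \(c_{\tau\text{-cost}}\), the torque smoothness \(c_{\tau\text{-smooth}}\) and the velocity cost \(c_{\text{vel-cost}}\); they correspond to the columns of the leaderboard table.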

These criteria are combined into the overall RealAI Score \(S\) with the formula

\[ S = c_{\text{success}} \left( 1 - \frac{1}{5} \sum_{i \in \{ \text{time},\, \text{energy},\, \tau\text{-cost},\, \tau\text{-smooth},\, \text{vel-cost} \}} \tanh \left( \frac{\pi \, c_{i}}{n_{i}} \right) \right) \]

The normalization coefficients \(n_i\) are:

Criterion | Normalization \(n_i\)
--- | ---
Swingup Time [s] | 20.0
Energy [J] | 60.0
Torque Cost [N²m²] | 20.0
Torque Smoothness [Nm] | 0.1
Velocity Cost [m²/s²] | 400.0
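The score can be checked against the leaderboard with a short Python sketch. It computes \(S\) as \(c_{\text{success}}\) times one minus the average of \(\tanh(\pi c_i / n_i)\) over the five criteria; with this \(\pi\) scaling and equal weights of 1/5, it reproduces every RealAI Score in the leaderboard table to within 0.002 (the displayed criteria are rounded). The scaling and weights were inferred by matching the table, so treat them as an assumption rather than the repository's implementation; the function name `realai_score` and the dictionary keys are hypothetical.

```python
import math

# Normalization coefficients n_i, as listed in the table above.
NORMALIZATION = {
    "time": 20.0,        # swingup time [s]
    "energy": 60.0,      # energy [J]
    "tau_cost": 20.0,    # torque cost [N^2 m^2]
    "tau_smooth": 0.1,   # torque smoothness [Nm]
    "vel_cost": 400.0,   # velocity cost [m^2/s^2]
}


def realai_score(success, criteria, norms=NORMALIZATION):
    """Return 0 for a failed swingup, else 1 minus the mean of tanh(pi * c_i / n_i).

    Assumption: the pi factor and the equal 1/5 weights were inferred by matching
    the published leaderboard scores, not copied from the leaderboard's source code.
    """
    if not success:
        return 0.0
    penalties = [math.tanh(math.pi * criteria[key] / n) for key, n in norms.items()]
    return 1.0 - sum(penalties) / len(penalties)


# (controller, criteria as displayed in the leaderboard, listed RealAI Score)
LEADERBOARD = [
    ("mcpilco", (1.14, 8.35, 2.42, 0.054, 114.26), 0.48),
    ("History SAC", (1.15, 7.67, 2.36, 0.007, 62.07), 0.682),
    ("AR-EAPO", (1.15, 7.72, 2.43, 0.01, 64.33), 0.659),
    ("iLQR Riccati Gains", (4.13, 9.53, 1.25, 0.005, 211.34), 0.536),
    ("evolsac", (0.71, 9.83, 4.37, 0.014, 58.15), 0.596),
    ("TVLQR", (4.13, 9.53, 1.26, 0.007, 211.12), 0.526),
    ("iLQR MPC stabilization", (4.12, 9.91, 1.77, 0.083, 211.98), 0.353),
]

for name, values, listed in LEADERBOARD:
    # Keys are zipped in the insertion order of NORMALIZATION.
    criteria = dict(zip(NORMALIZATION, values))
    score = realai_score(True, criteria)
    print(f"{name:24s} computed {score:.3f}  listed {listed:.3f}")
```

Because of the tanh, each criterion contributes at most 1/5 to the penalty, so a single very poor criterion (for example, a large torque smoothness value) cannot push the score below zero on its own.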

Participating

If you want to participate in this leaderboard with your own controller, have a look at the leaderboard explanation in the double pendulum repository. The leaderboard is updated automatically at regular intervals based on the controllers contributed to that repository.