Controller | Short Controller Description | Swingup Success | Swingup Time [s] | Energy [J] | Torque Cost [N²m²] | Torque Smoothness [Nm] | Velocity Cost [m²/s²] | RealAI Score | Username | Data |
---|---|---|---|---|---|---|---|---|---|---|
mcpilco | Swingup trained with MBRL algorithm MC-PILCO + stabilization with LQR. | 1/1 | 1.45 | 19.43 | 3.22 | 0.097 | 253.59 | 0.316 | turcato-niccolo | data plot video |
History SAC | SAC using custom model architecture to encode system dynamics. | 1/1 | 1.0 | 8.08 | 1.8 | 0.01 | 88.5 | 0.655 | tfaust | data plot video |
TVLQR | Stabilization of iLQR trajectory with time-varying LQR. | 1/1 | 4.05 | 10.43 | 1.87 | 0.016 | 105.83 | 0.504 | fwiebe | data plot video |
AR-EAPO | Policy trained with average-reward maximum-entropy RL. | 1/1 | 1.39 | 8.32 | 1.52 | 0.008 | 117.96 | 0.633 | rnilva | data plot video |
iLQR Riccati Gains | Stabilization of iLQR trajectory with Riccati gains. Top stabilization with LQR. | 1/1 | 4.04 | 10.55 | 1.98 | 0.067 | 106.49 | 0.396 | fwiebe | data plot video |
evolsac | Evolutionary SAC for both swingup and stabilization. | 1/1 | 0.96 | 9.26 | 2.71 | 0.03 | 96.56 | 0.524 | AlbertoSinigaglia | data plot video |
iLQR MPC stabilization | Online optimization with iterative LQR. Stabilization of iLQR trajectory. Top stabilization with LQR. | 1/1 | 4.86 | 11.54 | 2.68 | 0.096 | 110.4 | 0.345 | fwiebe | data plot video |
The simulation leaderboard compares the performance of different control methods on a simulated acrobot. The controller's task is to swing up and balance the acrobot and to keep the end-effector above the threshold line.
The model parameters of the acrobot are:
More information about the dynamic model of the double pendulum can be found here: Double Pendulum Dynamics. In the Double Pendulum Repository, the parameters above are labeled ‘designC.1/model1.1’. A URDF file for this model is available here: URDF.
The acrobot is simulated with a fourth-order Runge-Kutta (RK4) integrator with a timestep of \(dt = 0.002 \, \text{s}\) for \(T = 10 \, \text{s}\). The initial acrobot configuration is \(x_0 = (0, 0, 0, 0)\) (hanging down), and the goal is the unstable fixed point at the upright configuration \(x_g = (\pi, 0, 0, 0)\). The upright position is considered reached when the end-effector is above the threshold line at \(h=0.45 \, \text{m}\) (origin at the mounting point) and stays there until the end of the simulation.
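The following Python sketch illustrates the simulation and success check described above. It is not the repository's implementation: the function names, the dynamics signature `f(x, u)`, holding the torque constant during each step, the state ordering \(x = (q_1, q_2, \dot q_1, \dot q_2)\) (inferred from \(x_0\) and \(x_g\)), and the use of a relative second joint angle are all assumptions. The link lengths `l1`, `l2` are placeholders that must be taken from the model parameters.

```python
import math
from typing import Callable, List, Optional, Sequence

State = List[float]
# Assumed signature: f(x, u) returns dx/dt for state x and torque u.
Dynamics = Callable[[State, float], State]
# Assumed signature: controller(x, t) returns the torque u.
Controller = Callable[[State, float], float]


def rk4_step(f: Dynamics, x: State, u: float, dt: float) -> State:
    """One classic fourth-order Runge-Kutta step, holding u constant over dt."""
    k1 = f(x, u)
    k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], u)
    k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], u)
    k4 = f([xi + dt * ki for xi, ki in zip(x, k3)], u)
    return [
        xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


def simulate(f: Dynamics, controller: Controller, x0: Sequence[float],
             dt: float = 0.002, T: float = 10.0) -> List[State]:
    """Integrate from x0 for T seconds; returns all states including x0."""
    n_steps = int(round(T / dt))  # 5000 steps for dt = 0.002 s, T = 10 s
    x = list(x0)
    trajectory = [x]
    for k in range(n_steps):
        u = controller(x, k * dt)
        x = rk4_step(f, x, u, dt)
        trajectory.append(x)
    return trajectory


def ee_height(x: State, l1: float, l2: float) -> float:
    """End-effector height above the mounting point for x = (q1, q2, dq1, dq2).

    q1 = q2 = 0 is hanging down, so (pi, 0) gives the upright height l1 + l2.
    """
    q1, q2 = x[0], x[1]
    return -l1 * math.cos(q1) - l2 * math.cos(q1 + q2)


def swingup_time(heights: Sequence[float], dt: float,
                 h: float = 0.45) -> Optional[float]:
    """Earliest time from which the end-effector stays above h until the end.

    Returns None if the final sample is not above h (swing-up failed).
    """
    t_up = None
    # Walk backwards from the last sample while the height stays above h.
    for k in range(len(heights) - 1, -1, -1):
        if heights[k] <= h:
            break
        t_up = k * dt
    return t_up
```

Given acrobot dynamics `f` and a controller, `swingup_time([ee_height(x, l1, l2) for x in simulate(f, controller, [0.0, 0.0, 0.0, 0.0])], 0.002)` would return the swing-up time, or `None` if the end-effector does not remain above 0.45 m until \(T = 10\,\text{s}\).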
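For the evaluation, several criteria are measured and weighted to compute an overall score, the RealAI Score. Besides swing-up success, the criteria are the swing-up time, energy, torque cost, torque smoothness, and velocity cost reported in the leaderboard table.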
These criteria are combined into the RealAI Score with the formula
\[ S = c_{\text{success}} \left( 1 - \sum_{i \in \{ \text{time},\, \text{energy},\, \tau\text{-cost},\, \tau\text{-smooth},\, \text{vel-cost} \}} \tanh \left(\frac{c_{i}}{n_{i}}\right)\right) \]
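Here \(c_{\text{success}}\) is the swing-up success, \(c_i\) is the value of criterion \(i\), and \(n_i\) is its normalization coefficient. The normalization coefficients are: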
Criterion | Normalization \(n_i\) |
---|---|
Swingup Time | 20.0 |
Energy | 60.0 |
Torque Cost | 20.0 |
Torque Smoothness | 0.1 |
Velocity Cost | 400.0 |
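A minimal Python sketch of the score formula, using the normalization coefficients from the table. The dictionary keys and the function name `realai_score` are hypothetical, and treating \(c_{\text{success}}\) as 1 for a successful swing-up and 0 otherwise is an assumption. Evaluated exactly as written, the formula does not reproduce the RealAI Scores in the leaderboard table: for the History SAC row it gives about 0.409 instead of 0.655, and for the mcpilco row the tanh terms sum to more than 1, so the result would be negative. The published scores presumably rely on weights or details not stated here, so treat this only as an illustration of the formula above.

```python
import math

# Normalization coefficients n_i from the table above (keys are hypothetical).
NORMALIZATION = {
    "swingup_time": 20.0,
    "energy": 60.0,
    "tau_cost": 20.0,
    "tau_smoothness": 0.1,
    "velocity_cost": 400.0,
}


def realai_score(success: bool, criteria: dict) -> float:
    """Evaluate S = c_success * (1 - sum_i tanh(c_i / n_i)) as written above."""
    c_success = 1.0 if success else 0.0  # assumed 0/1 success indicator
    penalty = sum(math.tanh(criteria[key] / n) for key, n in NORMALIZATION.items())
    return c_success * (1.0 - penalty)


# Criterion values from the "History SAC" row of the leaderboard table.
history_sac = {
    "swingup_time": 1.0,
    "energy": 8.08,
    "tau_cost": 1.8,
    "tau_smoothness": 0.01,
    "velocity_cost": 88.5,
}
print(round(realai_score(True, history_sac), 3))  # prints 0.409
```

Because each \(\tanh(c_i/n_i)\) lies in \([0, 1)\) for non-negative criteria, a criterion near its normalization value already contributes roughly 0.76 to the penalty, which is why small smoothness values (\(n = 0.1\)) matter so much.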
To participate in this leaderboard with your own controller, see the leaderboard explanation in the double pendulum repository. The leaderboard is updated automatically and periodically, based on the controllers contributed to that repository.