Controller | Short Controller Description | Swingup Success | Swingup Time [s] | Energy [J] | Max. Torque [Nm] | Integrated Torque [Nms] | Torque Cost [N²m²] | Torque Smoothness [Nm] | Velocity Cost [m²/s²] | RealAI Score | Username | Data |
---|---|---|---|---|---|---|---|---|---|---|---|---|
iLQR MPC stabilization | Online optimization with iterative LQR. Stabilization of iLQR trajectory. Top stabilization with LQR. | 1/1 | 4.15 | 10.8 | 4.47 | 2.61 | 2.34 | 0.076 | 99.12 | 0.806 | fwiebe | data plot video |
TVLQR | Stabilization of iLQR trajectory with time-varying LQR. | 1/1 | 3.98 | 10.92 | 5.0 | 2.27 | 2.47 | 0.077 | 100.34 | 0.8 | fwiebe | data plot video |
SAC LQR | Swing-up with an RL policy learned with SAC. | 1/1 | 1.55 | 24.86 | 4.98 | 3.72 | 10.32 | 0.558 | 158.35 | 0.811 | chiniklas | data plot video |
iLQR Riccati Gains | Stabilization of iLQR trajectory with Riccati gains. Top stabilization with LQR. | 1/1 | 4.15 | 10.69 | 3.01 | 2.35 | 2.2 | 0.013 | 99.16 | 0.831 | fwiebe | data plot video |
mcpilco | Swing-up trained with the MBRL algorithm MC-PILCO, stabilization with LQR. | 1/1 | 1.1 | 9.81 | 2.82 | 1.27 | 2.27 | 0.057 | 242.44 | 0.869 | turcato-niccolo | data plot video |
Energy PFL | Partial Feedback Linearization with energy shaping control. Stabilization with LQR. | 1/1 | 4.12 | 29.01 | 5.0 | 7.12 | 18.79 | 0.251 | 280.81 | 0.728 | fwiebe | data plot video |
iLQR MPC | Online optimization with iterative LQR. Without reference trajectory. | 1/1 | 1.47 | 29.62 | 6.0 | 3.47 | 12.16 | 0.046 | 175.52 | 0.796 | fwiebe | data plot video |
The simulation leaderboard compares the performance of different control methods in simulation. The controller's task is to swing up and balance the acrobot, keeping the end-effector above the threshold line.
The model parameters of the acrobot are:
More information about the dynamic model of the double pendulum can be found here: Double Pendulum Dynamics. A URDF file of this model is available here: URDF.
The acrobot is simulated with a Runge-Kutta 4 integrator with a timestep of \(dt = 0.002 \, \text{s}\) for \(T = 10 \, \text{s}\), i.e. 5000 integration steps. The initial configuration is \(x_0 = (0, 0, 0, 0)\) (hanging down), and the goal is the unstable fixed point at the upright configuration \(x_g = (\pi, 0, 0, 0)\). The upright position is considered reached when the end-effector is above the threshold line at \(h = 0.45 \, \text{m}\) (with the origin at the mounting point).
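The simulation setup above can be sketched in Python as follows. This is a minimal illustration, not the double pendulum repository's simulator: the dynamics function `f(x, u)` is a hypothetical stand-in for the acrobot model, the state ordering \(x = (q_1, q_2, \dot q_1, \dot q_2)\) with \(q_1\) measured from the downward vertical and \(q_2\) relative to the first link is an assumption, and the link lengths `l1`, `l2` are hypothetical parameters. The function names `rk4_step`, `simulate`, `ee_height` and `above_threshold` are illustrative only.

```python
import math
from typing import Callable, List

State = List[float]
# Hypothetical stand-in for the acrobot dynamics: returns dx/dt for state x and torque u.
Dynamics = Callable[[State, float], State]
# Hypothetical controller interface: maps time t and state x to a torque u.
Controller = Callable[[float, State], float]


def rk4_step(f: Dynamics, x: State, u: float, dt: float) -> State:
    """Advance dx/dt = f(x, u) by one classical Runge-Kutta 4 step, holding u constant."""
    k1 = f(x, u)
    k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], u)
    k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], u)
    k4 = f([xi + dt * ki for xi, ki in zip(x, k3)], u)
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def simulate(f: Dynamics, controller: Controller, x0: State,
             dt: float = 0.002, t_final: float = 10.0) -> List[State]:
    """Closed-loop rollout; returns all states from t = 0 to t = t_final."""
    n_steps = round(t_final / dt)  # 10 s / 0.002 s = 5000 steps
    x = list(x0)
    trajectory = [x]
    for k in range(n_steps):
        u = controller(k * dt, x)  # torque is held constant within each step
        x = rk4_step(f, x, u, dt)
        trajectory.append(x)
    return trajectory


def ee_height(x: State, l1: float, l2: float) -> float:
    """End-effector height relative to the mounting point.

    Assumes q1 is measured from the downward vertical and q2 relative to link 1,
    so x = (0, 0, 0, 0) hangs down and x = (pi, 0, 0, 0) points straight up.
    """
    q1, q2 = x[0], x[1]
    return -l1 * math.cos(q1) - l2 * math.cos(q1 + q2)


def above_threshold(x: State, l1: float, l2: float, h: float = 0.45) -> bool:
    """True if the end-effector is above the threshold line at height h (in m)."""
    return ee_height(x, l1, l2) > h
```

For example, with the hypothetical lengths `l1 = 0.2` and `l2 = 0.3`, `ee_height([math.pi, 0, 0, 0], 0.2, 0.3)` is 0.5 m, above the 0.45 m line, while the hanging configuration gives -0.5 m. Keeping the dynamics as a plain function argument lets the same RK4 loop be reused with any model.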
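For the evaluation, several criteria are computed and weighted to calculate an overall score, the RealAI Score. The criteria are the swing-up success, swing-up time, energy, maximum torque, integrated torque, torque cost, torque smoothness and velocity cost.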
These criteria are used to calculate the overall RealAI Score with the formula
\[ S = c_{success} \left( 1 - \left( w_{time}\frac{c_{time}}{n_{time}} + w_{energy}\frac{c_{energy}}{n_{energy}} + w_{\tau, max}\frac{c_{\tau, max}}{n_{\tau, max}} + w_{\tau, integ}\frac{c_{\tau, integ}}{n_{\tau, integ}} + w_{\tau, cost}\frac{c_{\tau, cost}}{n_{\tau, cost}} + w_{\tau, smooth}\frac{c_{\tau, smooth}}{n_{\tau, smooth}} + w_{vel, cost}\frac{c_{vel, cost}}{n_{vel, cost}} \right) \right) \]
where each \(c\) is the value of a criterion, \(n\) its normalization and \(w\) its weight.
The weights and normalizations are:
Criterion | Normalization \(n\) | Weight \(w\) |
---|---|---|
Swingup Time | 10.0 | 0.2 |
Energy | 100.0 | 0.1 |
Max Torque | 6.0 | 0.1 |
Integrated Torque | 60.0 | 0.1 |
Torque Cost | 360.0 | 0.1 |
Torque Smoothness | 12.0 | 0.2 |
Velocity Cost | 1000.0 | 0.2 |
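The score formula with these weights and normalizations can be sketched in Python as below. It assumes \(c_{success}\) is 1 for a successful swing-up and 0 otherwise; the dictionary keys and the function name `realai_score` are hypothetical and not taken from the repository. Recomputing from the rounded criteria shown in the leaderboard agrees with the listed RealAI Scores to within about 0.001, presumably because the displayed criteria are themselves rounded.

```python
# Normalization n and weight w for each criterion, taken from the table above.
# The dictionary keys are illustrative names, not an official schema.
NORM_AND_WEIGHT = {
    "swingup_time": (10.0, 0.2),
    "energy": (100.0, 0.1),
    "max_torque": (6.0, 0.1),
    "integrated_torque": (60.0, 0.1),
    "torque_cost": (360.0, 0.1),
    "torque_smoothness": (12.0, 0.2),
    "velocity_cost": (1000.0, 0.2),
}


def realai_score(criteria: dict, success: bool) -> float:
    """Return S = c_success * (1 - sum_i w_i * c_i / n_i).

    Assumes c_success is 1.0 for a successful swing-up and 0.0 otherwise.
    """
    c_success = 1.0 if success else 0.0
    penalty = sum(w * criteria[name] / n for name, (n, w) in NORM_AND_WEIGHT.items())
    return c_success * (1.0 - penalty)


# Criteria of the "iLQR MPC stabilization" row of the leaderboard.
ilqr_mpc_stab = {
    "swingup_time": 4.15,
    "energy": 10.8,
    "max_torque": 4.47,
    "integrated_torque": 2.61,
    "torque_cost": 2.34,
    "torque_smoothness": 0.076,
    "velocity_cost": 99.12,
}
print(round(realai_score(ilqr_mpc_stab, success=True), 3))  # 0.806
```

Since the weights sum to 1, a successful controller whose every criterion equals its normalization would score 0, and lower criterion values push the score toward 1.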
If you want to participate in this leaderboard with your own controller, have a look at the leaderboard explanation in the double pendulum repository. The leaderboard is updated automatically at regular intervals based on the controllers that have been contributed to that repository.