Controller | Short Controller Description | Swingup Success | Swingup Time [s] | Energy [J] | Max. Torque [Nm] | Integrated Torque [Nms] | Torque Cost [N²m²] | Torque Smoothness [Nm] | Velocity Cost [m²/s²] | RealAI Score | Username | Data |
---|---|---|---|---|---|---|---|---|---|---|---|---|
iLQR MPC stabilization | Online optimization with iterative LQR. Stabilization of iLQR trajectory. Top stabilization with LQR. | 1/1 | 4.19 | 9.14 | 1.75 | 2.8 | 2.04 | 0.019 | 136.95 | 0.845 | fwiebe | data plot video |
SAC LQR | Swing-up with an RL policy learned with SAC. | 1/1 | 0.66 | 9.29 | 5.0 | 2.14 | 8.51 | 0.174 | 44.99 | 0.876 | chiniklas | data plot video |
iLQR Riccati Gains | Stabilization of iLQR trajectory with Riccati gains. Top stabilization with LQR. | 1/1 | 4.2 | 9.06 | 1.66 | 2.64 | 1.97 | 0.008 | 137.2 | 0.847 | fwiebe | data plot video |
mcpilco | Swing-up trained with the MBRL algorithm MC-PILCO. Stabilization with LQR. | 1/1 | 1.43 | 12.67 | 2.4 | 3.48 | 7.77 | 0.07 | 94.68 | 0.891 | turcato-niccolo | data plot video |
Energy PFL | Partial Feedback Linearization with energy shaping control. Stabilization with LQR. | 1/1 | 4.8 | 53.75 | 3.0 | 12.28 | 33.06 | 0.136 | 870.16 | 0.594 | fwiebe | data plot video |
DQN LQR | Swing-up with an RL policy learned with DQN. | 1/1 | 1.95 | 14.44 | 5.0 | 5.92 | 25.44 | 1.452 | 34.79 | 0.815 | theovincent | data plot video |
iLQR MPC | Online optimization with iterative LQR. Without reference trajectory. | 1/1 | 0.65 | 9.98 | 6.0 | 2.39 | 10.56 | 0.026 | 44.16 | 0.861 | fwiebe | data plot video |
TVLQR | Stabilization of iLQR trajectory with time-varying LQR. | 1/1 | 4.2 | 9.06 | 2.82 | 2.57 | 2.0 | 0.031 | 137.31 | 0.827 | fwiebe | data plot video |
The simulation leaderboard compares the performance of different control methods in simulation. The task for the controller is to swing up and balance the pendubot and to keep the end-effector above the threshold line.
The model parameters of the pendubot are:
More information about the dynamic model of the double pendulum can be found here: Double Pendulum Dynamics. For a URDF file with this model, see here: URDF.
The pendubot is simulated with a Runge-Kutta 4 integrator with a timestep of \(dt = 0.002 \, \text{s}\) for \(T = 10 \, \text{s}\). The initial configuration is \(x_0 = (0, 0, 0, 0)\) (hanging down) and the goal is the unstable fixed point at the upright configuration \(x_g = (\pi, 0, 0, 0)\). The upright position is considered reached when the end-effector is above the threshold line at \(h = 0.45 \, \text{m}\) (origin at the mounting point).
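For illustration, the rollout described above can be sketched in a few lines of Python. The dynamics function, the controller interface, and the link lengths `L1` and `L2` below are placeholders rather than the repository's API; only the integrator, timestep, horizon, and threshold follow the description in this section.

```python
import numpy as np

# Placeholder link lengths; the actual pendubot model parameters are
# documented with the dynamic model in the repository.
L1, L2 = 0.3, 0.2

DT = 0.002           # integration timestep [s]
T = 10.0             # simulation duration [s]
H_THRESHOLD = 0.45   # end-effector height threshold [m], origin at mounting point


def end_effector_height(x):
    """Height of the end-effector above the mounting point.

    Assumes x = [q1, q2, dq1, dq2] with q = 0 at the hanging-down position
    and q2 measured relative to the first link.
    """
    q1, q2 = x[0], x[1]
    return -L1 * np.cos(q1) - L2 * np.cos(q1 + q2)


def rk4_step(f, x, u, dt):
    """One Runge-Kutta 4 step of the dynamics x_dot = f(x, u)."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def rollout(dynamics, controller, x0=np.zeros(4)):
    """Roll out the closed loop for T seconds.

    Returns the state trajectory and the first time the end-effector
    rose above the threshold line (None if it never did).
    """
    x = np.asarray(x0, dtype=float)
    first_above = None
    trajectory = [x.copy()]
    for step in range(int(T / DT)):
        t = step * DT
        u = controller(x, t)              # torque command [tau1, tau2]
        x = rk4_step(dynamics, x, u, DT)
        trajectory.append(x.copy())
        if first_above is None and end_effector_height(x) > H_THRESHOLD:
            first_above = t + DT
    return np.array(trajectory), first_above
```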
For the evaluation, multiple criteria are computed and weighted to calculate an overall score (Real AI Score). The criteria are the swingup time, the energy, the maximum torque, the integrated torque, the torque cost, the torque smoothness, and the velocity cost.
These criteria are used to calculate the overall Real AI Score with the formula
\[ S = c_{success} \left( 1 - \left( w_{time}\frac{c_{time}}{n_{time}} + w_{energy}\frac{c_{energy}}{n_{energy}} + w_{\tau, max}\frac{c_{\tau, max}}{n_{\tau, max}} + w_{\tau, integ}\frac{c_{\tau, integ}}{n_{\tau, integ}} + w_{\tau, cost}\frac{c_{\tau, cost}}{n_{\tau, cost}} + w_{\tau, smooth}\frac{c_{\tau, smooth}}{n_{\tau, smooth}} + w_{vel, cost}\frac{c_{vel, cost}}{n_{vel, cost}} \right) \right) \]
The weights and normalizations are:
Criterion | normalization \(n\) | weight \(w\) |
---|---|---|
Swingup Time | 10.0 | 0.2 |
Energy | 100.0 | 0.1 |
Max Torque | 6.0 | 0.1 |
Integrated Torque | 60.0 | 0.1 |
Torque Cost | 360.0 | 0.1 |
Torque Smoothness | 12.0 | 0.2 |
Velocity Cost | 1000.0 | 0.2 |
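As a sanity check, a row's Real AI Score can be recomputed from the table values above. The following Python sketch implements the formula with these weights and normalizations; the function and dictionary keys are illustrative names, not the repository's API.

```python
# Weights and normalization constants from the table above.
WEIGHTS = {
    "swingup_time": 0.2,
    "energy": 0.1,
    "max_tau": 0.1,
    "integ_tau": 0.1,
    "tau_cost": 0.1,
    "tau_smoothness": 0.2,
    "velocity_cost": 0.2,
}
NORMALIZATIONS = {
    "swingup_time": 10.0,
    "energy": 100.0,
    "max_tau": 6.0,
    "integ_tau": 60.0,
    "tau_cost": 360.0,
    "tau_smoothness": 12.0,
    "velocity_cost": 1000.0,
}


def real_ai_score(criteria, successful):
    """Combine the normalized, weighted criteria into the Real AI Score.

    criteria:   dict mapping criterion name to its measured value
    successful: bool, whether the swingup succeeded (c_success in the formula)
    """
    weighted_sum = sum(
        WEIGHTS[name] * criteria[name] / NORMALIZATIONS[name] for name in WEIGHTS
    )
    return float(successful) * (1.0 - weighted_sum)


# Example: the "iLQR MPC stabilization" row of the leaderboard table.
score = real_ai_score(
    {
        "swingup_time": 4.19,
        "energy": 9.14,
        "max_tau": 1.75,
        "integ_tau": 2.8,
        "tau_cost": 2.04,
        "tau_smoothness": 0.019,
        "velocity_cost": 136.95,
    },
    successful=True,
)
print(round(score, 3))  # 0.845, matching the RealAI Score column
```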
If you want to participate in this leaderboard with your own controller, have a look at the leaderboard explanation in the double pendulum repository. The leaderboard is updated automatically and periodically based on the controllers that have been contributed to that repository.