Controller | Short Controller Description | Swingup Success | Swingup Time [s] | Energy [J] | Torque Cost [N²m²] | Torque Smoothness [Nm] | Velocity Cost [m²/s²] | RealAI Score | Username | Data |
---|---|---|---|---|---|---|---|---|---|---|
mcpilco | Swingup trained with MBRL algorithm MC-PILCO + stabilization with LQR. | 1/1 | 1.14 | 8.35 | 2.42 | 0.054 | 114.26 | 0.48 | turcato-niccolo | data plot video |
History SAC | SAC using custom model architecture to encode system dynamics. | 1/1 | 1.15 | 7.67 | 2.36 | 0.007 | 62.07 | 0.682 | tfaust | data plot video |
AR-EAPO | Policy trained with average reward maximum entropy RL | 1/1 | 1.15 | 7.72 | 2.43 | 0.01 | 64.33 | 0.659 | rnilva | data plot video |
iLQR Riccati Gains | Stabilization of iLQR trajectory with Riccati gains. Top stabilization with LQR. | 1/1 | 4.13 | 9.53 | 1.25 | 0.005 | 211.34 | 0.536 | fwiebe | data plot video |
evolsac | Evolutionary SAC for both swingup and stabilisation | 1/1 | 0.71 | 9.83 | 4.37 | 0.014 | 58.15 | 0.596 | AlbertoSinigaglia | data plot video |
TVLQR | Stabilization of iLQR trajectory with time-varying LQR. | 1/1 | 4.13 | 9.53 | 1.26 | 0.007 | 211.12 | 0.526 | fwiebe | data plot video |
iLQR MPC stabilization | Online optimization with iterative LQR. Stabilization of iLQR trajectory. Top stabilization with LQR. | 1/1 | 4.12 | 9.91 | 1.77 | 0.083 | 211.98 | 0.353 | fwiebe | data plot video |
The simulation leaderboard compares different control methods on a simulated pendubot. Each controller has to swing up and balance the pendubot, keeping the end-effector above the threshold line.
The model parameters of the pendubot are:
More information about the dynamic model of the double pendulum can be found here: Double Pendulum Dynamics. In the Double Pendulum Repository the parameters above are labeled as ‘designC.1/model1.1’. For a URDF file with this model, see here: URDF.
The pendubot is simulated with a fourth-order Runge-Kutta (RK4) integrator with a timestep of \(dt = 0.002 \, \text{s}\) for \(T = 10 \, \text{s}\). The initial configuration is \(x_0 = (0, 0, 0, 0)\) (hanging down) and the goal is the unstable fixed point at the upright configuration \(x_g = (\pi, 0, 0, 0)\). The upright position counts as reached when the end-effector rises above the threshold line at \(h = 0.45 \, \text{m}\) (origin at the mounting point) and stays above it until the end of the simulation.
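The rollout and the success check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's simulator: the dynamics `f(x, u)` and the `controller(t, x)` callback are hypothetical placeholders, and the end-effector heights passed to `swingup_time` are assumed to come from the model's forward kinematics, which is not reproduced here.

```python
import numpy as np

DT = 0.002          # integration timestep [s], as stated for the leaderboard
T_FINAL = 10.0      # simulated duration [s]
H_THRESHOLD = 0.45  # end-effector height threshold [m], origin at the mounting point


def rk4_step(f, x, u, dt):
    """Advance x' = f(x, u) by one classical RK4 step, holding u constant."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def simulate(f, controller, x0, dt=DT, t_final=T_FINAL):
    """Roll out a controller; returns time stamps, states and applied inputs.

    f(x, u) and controller(t, x) are hypothetical callables standing in for
    the pendubot dynamics and the controller under test.
    """
    n_steps = int(round(t_final / dt))
    x = np.asarray(x0, dtype=float)
    ts, xs, us = [0.0], [x], []
    for k in range(n_steps):
        u = np.asarray(controller(k * dt, x), dtype=float)
        x = rk4_step(f, x, u, dt)
        ts.append((k + 1) * dt)
        xs.append(x)
        us.append(u)
    return np.array(ts), np.array(xs), np.array(us)


def swingup_time(ts, heights, h=H_THRESHOLD):
    """First time after which the end-effector stays above h until the end.

    Returns None if the trajectory does not end above the threshold.
    """
    above = np.asarray(heights) > h
    if not above[-1]:
        return None
    below = np.flatnonzero(~above)  # samples where the end-effector is not above h
    first = 0 if below.size == 0 else below[-1] + 1
    return float(ts[first])
```

With \(dt = 0.002 \, \text{s}\) and \(T = 10 \, \text{s}\) the rollout takes 5000 integration steps. `swingup_time` returns `None` when the end-effector does not finish above the line, i.e. when the upright position is not considered reached.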
For the evaluation, multiple criteria are computed: swing-up success, swing-up time, energy, torque cost, torque smoothness and velocity cost (the columns of the leaderboard table above). They are normalized and combined into the overall RealAI Score with the formula
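\[ \begin{equation} S = c_{\text{success}} \left( 1 - \frac{1}{5} \sum_{i \in \{ \text{time},\, \text{energy},\, \tau\text{-cost},\, \tau\text{-smooth},\, \text{vel-cost} \}} \tanh \left(\frac{\pi \, c_{i}}{n_{i}}\right)\right) \end{equation} \]
where \(c_{\text{success}}\) is 1 if the swing-up succeeded and 0 otherwise, \(c_i\) is the value of criterion \(i\) and \(n_i\) its normalization coefficient.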
The normalization coefficients are:
Criterion | Normalization \(n_i\) |
---|---|
Swingup Time [s] | 20.0 |
Energy [J] | 60.0 |
Torque Cost [N²m²] | 20.0 |
Torque Smoothness [Nm] | 0.1 |
Velocity Cost [m²/s²] | 400.0 |
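A minimal sketch of the score computation, using the normalization coefficients from the table. It scales each tanh argument by \(\pi\) and averages the five penalties, i.e. \(S = c_{\text{success}} \left(1 - \tfrac{1}{5} \sum_i \tanh(\pi c_i / n_i)\right)\); in this form it reproduces the RealAI Scores listed in the leaderboard table up to the rounding of the displayed criteria. The function name and dictionary keys are hypothetical and not the repository's API.

```python
import numpy as np

# Normalization coefficients n_i from the table above (hypothetical key names).
NORMALIZATION = {
    "swingup_time": 20.0,       # [s]
    "energy": 60.0,             # [J]
    "torque_cost": 20.0,        # [N^2 m^2]
    "torque_smoothness": 0.1,   # [Nm]
    "velocity_cost": 400.0,     # [m^2/s^2]
}


def realai_score(success, criteria, normalization=NORMALIZATION):
    """c_success * (1 - mean_i tanh(pi * c_i / n_i)); 0 for a failed swing-up."""
    if not success:
        return 0.0
    penalties = [np.tanh(np.pi * criteria[name] / n) for name, n in normalization.items()]
    return float(1.0 - np.mean(penalties))


# Criteria of the "History SAC" entry as displayed in the leaderboard table.
history_sac = {
    "swingup_time": 1.15,
    "energy": 7.67,
    "torque_cost": 2.36,
    "torque_smoothness": 0.007,
    "velocity_cost": 62.07,
}
print(round(realai_score(True, history_sac), 3))  # prints 0.683
```

The table lists 0.682 for this entry; the small difference comes from the rounded criteria values shown in the table. Because tanh saturates, a single large criterion (for example a torque smoothness well above 0.1 Nm) can cost at most 0.2 of the score.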
If you want to enter this leaderboard with your own controller, have a look at the leaderboard explanation in the double pendulum repository. The leaderboard is updated automatically at regular intervals based on the controllers that have been contributed to that repository.