Controller | Short Controller Description | Model [%] | Velocity Noise [%] | Torque Noise [%] | Torque Step Response [%] | Time Delay [%] | Perturbations [%] | Overall Robustness Score | Username | Data |
---|---|---|---|---|---|---|---|---|---|---|
mcpilco | Swingup trained with MBRL algorithm MC-PILCO + stabilization with LQR. | 9.0 | 9.5 | 19.0 | 52.4 | 4.8 | 50.0 | 0.241 | turcato-niccolo | Data and Plots |
History SAC | SAC using custom model architecture to encode system dynamics. | 58.1 | 23.8 | 100.0 | 100.0 | 90.5 | 78.0 | 0.751 | tfaust | Data and Plots |
TVLQR | Stabilization of iLQR trajectory with time-varying LQR. | 48.1 | 19.0 | 100.0 | 95.2 | 23.8 | 78.0 | 0.607 | fwiebe | Data and Plots |
AR-EAPO | Policy trained with average-reward maximum-entropy RL. | 67.6 | 28.6 | 100.0 | 100.0 | 76.2 | 64.0 | 0.727 | rnilva | Data and Plots |
iLQR Riccati Gains | Stabilization of iLQR trajectory with Riccati gains. Top stabilization with LQR. | 4.8 | 9.5 | 9.5 | 52.4 | 4.8 | 2.0 | 0.138 | fwiebe | Data and Plots |
evolsac | Evolutionary SAC for both swingup and stabilisation | 64.8 | 23.8 | 100.0 | 100.0 | 47.6 | 76.0 | 0.687 | AlbertoSinigaglia | Data and Plots |
iLQR MPC stabilization | Online optimization with iterative LQR. Stabilization of iLQR trajectory. Top stabilization with LQR. | 6.7 | 9.5 | 90.5 | 52.4 | 4.8 | 42.0 | 0.343 | fwiebe | Data and Plots |
The robustness leaderboard compares the performance of different control methods when the simulation is perturbed, e.g. with noise or delay. The controller's task is to swing up and balance the acrobot despite these perturbations.
The model parameters of the acrobot are:
More information about the dynamic model of the double pendulum can be found here: Double Pendulum Dynamics. In the Double Pendulum Repository, the parameters above are labeled ‘designC.1/model1.1’. For a URDF file with this model, see here: URDF.
The acrobot is simulated with a fourth-order Runge-Kutta integrator with a timestep of \(dt = 0.002 \, \text{s}\) for \(T = 10 \, \text{s}\). The initial acrobot configuration is \(x_0 = (0.0, 0.0, 0.0, 0.0)\) (hanging down) and the goal is the unstable fixed point at the upright configuration \(x_g = (\pi, 0.0, 0.0, 0.0)\). The upright position is considered reached when the end-effector rises above the threshold line at \(h=0.45 \, \text{m}\) (origin at the mounting point) and stays there until the end of the simulation.
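The success criterion above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the link lengths `L1_ASSUMED` and `L2_ASSUMED` are hypothetical placeholders, the second angle is assumed to be measured relative to the first link, and the function names are invented for this example.

```python
import math

# Hypothetical link lengths in metres (assumption, not taken from this page).
L1_ASSUMED = 0.2
L2_ASSUMED = 0.3
HEIGHT_THRESHOLD = 0.45  # m, threshold line from the text


def end_effector_height(q1, q2, l1=L1_ASSUMED, l2=L2_ASSUMED):
    """Height of the end-effector above the mounting point.

    Angles are measured from the hanging-down configuration, so
    q1 = q2 = 0 gives -(l1 + l2) and q1 = pi, q2 = 0 gives l1 + l2.
    q2 is assumed to be relative to the first link.
    """
    return -l1 * math.cos(q1) - l2 * math.cos(q1 + q2)


def stable_since(heights, threshold=HEIGHT_THRESHOLD):
    """Index from which the height stays above the threshold until the
    end of the trajectory, or None if the final sample is not above it."""
    start = None
    for i, h in enumerate(heights):
        if h > threshold:
            if start is None:
                start = i  # a new uninterrupted run above the line begins
        else:
            start = None  # dropped below the line, so the run is discarded
    return start


# Toy trajectory: hanging down, horizontal, upright, upright.
heights = [end_effector_height(q1, 0.0)
           for q1 in (0.0, math.pi / 2, math.pi, math.pi)]
print(stable_since(heights))  # 2 under the assumed link lengths
```

A trial counts as successful when `stable_since` returns an index rather than `None`. A trajectory that crosses the line and later falls below it fails, which matches the "stays there until the end" requirement.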
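For the evaluation, multiple criteria are assessed and weighted to calculate an overall score (Real AI Score). The criteria are: model inaccuracies, velocity noise, torque noise, torque step response, time delay, and random perturbations.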
For each criterion, the relevant quantity is varied in \(N=21\) steps (for the model inaccuracies, separately for each independent model parameter), and the score is the percentage of successful swingups. For the perturbation criterion, 50 random perturbation profiles are generated and evaluated.
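As a small illustration of how such a percentage score arises, the sketch below counts successes over a list of trial outcomes. The function name `criterion_score` is hypothetical and not part of the repository. Several table entries appear consistent with this granularity, e.g. 9.5 % corresponds to 2 of 21 successful trials.

```python
def criterion_score(outcomes):
    """Percentage of successful swingups among the evaluated variations."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("at least one evaluation is required")
    return 100.0 * sum(bool(o) for o in outcomes) / len(outcomes)


# 2 successes out of N = 21 variations
print(round(criterion_score([True] * 2 + [False] * 19), 1))   # 9.5
# 25 successes out of 50 perturbation profiles
print(round(criterion_score([True] * 25 + [False] * 25), 1))  # 50.0
```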
These criteria are used to calculate the overall Real AI Score with the formula
\[ S = \frac{1}{6} \left( c_{model} + c_{vel, noise} + c_{\tau, noise} + c_{\tau, response} + c_{delay} + c_{pert} \right) \]
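The formula can be checked against the table with a short Python sketch. It assumes that each \(c\) is the tabulated percentage rescaled to the range 0 to 1, which is inferred from the table values rather than stated explicitly. The function name `overall_score` is hypothetical.

```python
def overall_score(percentages):
    """Mean of the six criterion scores, rescaled from percent to [0, 1].

    Order: model, velocity noise, torque noise, torque step response,
    time delay, perturbations.
    """
    if len(percentages) != 6:
        raise ValueError("expected exactly six criterion scores")
    return sum(percentages) / (6 * 100.0)


# Rows taken from the leaderboard table above.
mcpilco = [9.0, 9.5, 19.0, 52.4, 4.8, 50.0]
history_sac = [58.1, 23.8, 100.0, 100.0, 90.5, 78.0]

print(round(overall_score(mcpilco), 3))      # 0.241
print(round(overall_score(history_sac), 3))  # 0.751
```

Both results match the Overall Robustness Score column for the corresponding rows.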
If you want to participate in this leaderboard with your own controller, have a look at the leaderboard explanation in the double pendulum repository. The leaderboard is updated automatically at regular intervals based on the controllers that have been contributed to that repository.