Pendubot Simulation Robustness Leaderboard V1

Controller Short Controller Description Model [%] Velocity Noise [%] Torque Noise [%] Torque Step Response [%] Time delay [%] Overall Robustness Score Username Data
iLQR MPC stabilization Online optimization with iterative LQR. Stabilization of iLQR trajectory. Top stabilization with LQR. 44.3 100.0 100.0 100.0 85.7 0.860 fwiebe Data and Plots
SAC LQR Swing-up with an RL Policy learned with SAC. 71.9 100.0 100.0 100.0 76.2 0.896 chiniklas Data and Plots
iLQR Riccati Gains Stabilization of iLQR trajectorry with Riccati gains. Top stabilizaion with LQR. 10.5 100.0 61.9 90.5 33.3 0.592 fwiebe Data and Plots
mcpilco Swingup trained with MBRL algorithm MC-PILCO + stabilization with LQR. 45.2 90.5 100.0 100.0 90.5 0.852 turcato-niccolo Data and Plots
Energy PFL Partial Feedback Linearization with energy shaping control. Stabilization with LQR. 11.0 9.5 9.5 9.5 19.0 0.117 fwiebe Data and Plots
DQN LQR Swing-up with an RL Policy learned with DQN. 17.6 4.8 9.5 52.4 28.6 0.226 theovincent Data and Plots
iLQR MPC Online optimization with iterative LQR. Without reference trajectory. 19.0 4.8 4.8 61.9 9.5 0.200 fwiebe Data and Plots
TVLQR Stabilization of iLQR trajectory with time-varying LQR. 75.2 100.0 100.0 100.0 100.0 0.950 fwiebe Data and Plots

Rules

The robustness leaderboard compares the performance of different control methods by perturbing the simulation e.g. with noise or delay. The task for the controller is to swingup and balance the pendubot even with these perturbations.

The model parameters of the pendubot are:

More information about the dynamic model of the double pendulum can be found here: Double Pendulum Dynamics. For a urdf file with this model see here: URDF.

The pendubot is simulated with a Runge-Kutta 4 integrator with a timestep of \(dt = 0.002 \, \text{s}\) for \(T = 10 \, \text{s}\). The initial pendubot configuration is \(x_0 = (0.0, 0.0, 0.0, 0.0)\) (hanging down) and the goal is the unstable fixpoint at the upright configuration \(x_g = (\pi, 0.0, 0.0, 0.0)\). The upright position is considered to be reached when the distance in the state coordinates is below \(\epsilon = (0.1, 0.1, 0.5, 0.5)\).

Scores

For the evaluation multiple criteria are evaluated and weighted to calculate an overall score (Real AI Score). The criteria are:

For each crierion the quantities are varied in \(N=21\) steps (for the model inaccuracies for each independent model parameter) and the score is the percentage of successful swingups.

These criteria are used to calculate the overall Real AI Score with the formula

\[ S = w_{model} c_{model} + w_{vel, noise} c_{vel, noise} + w_{\tau, noise} c_{\tau, noise} + w_{\tau, response} c_{\tau, response} + w_{delay} c_{delay} \]

The weights are

\[ w_{model} = w_{vel, noise} = w_{\tau, noise} = w_{\tau, response} = w_{delay} = 0.2 \]

Participating

If you want to participate in this leaderboard with your own controller have a look at the leaderboard explanation in the double pendulum repository. The leaderboard is automatically periodically updated based on the controllers that have been contributed to that repository.