Pendubot Real System Leaderboard V1

| Controller | Short Controller Description | Swingup Success | Swingup Time [s] | Energy [J] | Max. Torque [Nm] | Integrated Torque [Nms] | Torque Cost [N²m²] | Torque Smoothness [Nm] | Velocity Cost [m²/s²] | Best RealAI Score | Average RealAI Score | Username | Data |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ilqr_tvlqr_lqr | Stabilization of iLQR trajectory with time-varying LQR. | 8/10 | 4.12 | 34.02 | 5.0 | 19.06 | 51.88 | 0.643 | 242.34 | 0.695 | 0.547 | fwiebe | data plot video |
| sac_lqr | Swing-up with an RL Policy learned with SAC. Stabilization with LQR. | 4/10 | 0.67 | 37.12 | 5.0 | 24.87 | 78.7 | 0.774 | 114.04 | 0.767 | 0.298 | chiniklas | data plot video |
| MC-PILCO | MC-PILCO for swingup and stabilization | 10/10 | 1.37 | 11.66 | 4.99 | 3.72 | 8.93 | 0.54 | 84.61 | 0.843 | 0.839 | turcato-niccolo | data plot video |

Rules

The real system leaderboard compares the performance of different control methods on the real hardware. The task for the controller is to swing up and balance the pendubot, keeping the end-effector above the threshold line.

Videos from left to right: TVLQR, MC-PILCO, SAC

The model parameters of the pendubot, which we identified with a least-squares optimization, are:

More information about the dynamic model of the double pendulum can be found here: Double Pendulum Dynamics. For a URDF file with this model see here: URDF.

The \(0.5\,\text{Nm}\) torque limit on the passive joint can be used to compensate for the friction of the motor.

The actuators can be controlled with an arbitrary control frequency of up to \(500\,\text{Hz}\), and the experiment takes \(10\,\text{s}\). The initial pendubot configuration is \(x_0 = (0, 0, 0, 0)\) (hanging down) and the goal is the unstable fixed point at the upright configuration \(x_g = (\pi, 0, 0, 0)\). The upright position is considered to be reached when the end-effector is above the threshold line at \(h = 0.45\,\text{m}\) (origin at the mounting point).
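The threshold condition can be illustrated with a small forward-kinematics sketch. It assumes that the second state entry is the elbow angle measured relative to the first link, which is consistent with \(x_g = (\pi, 0, 0, 0)\) being the fully upright pose. The link lengths `L1_M` and `L2_M` are hypothetical placeholder values chosen only for illustration and are not the identified model parameters. The function names are also illustrative.

```python
import math

# Hypothetical link lengths in meters (placeholders, NOT the identified parameters).
L1_M = 0.2
L2_M = 0.3

# Threshold height from the rules, measured from the mounting point.
THRESHOLD_M = 0.45


def end_effector_height(q1, q2, l1=L1_M, l2=L2_M):
    """Height of the end-effector above the mounting point.

    Assumes q1 = 0 means hanging down and q2 is relative to the first link.
    """
    return -l1 * math.cos(q1) - l2 * math.cos(q1 + q2)


def is_above_threshold(q1, q2, threshold=THRESHOLD_M):
    """True if the end-effector is above the threshold line."""
    return end_effector_height(q1, q2) > threshold


print(is_above_threshold(0.0, 0.0))      # hanging down
print(is_above_threshold(math.pi, 0.0))  # upright goal configuration
```

With these placeholder lengths the hanging configuration lies below the line and the upright configuration lies above it; the actual margin depends on the real link lengths.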

Scores

For the evaluation, multiple criteria are measured and weighted to calculate an overall score (Real AI Score). The criteria are:

These criteria are used to calculate the overall Real AI Score with the formula

\[ S = c_{success} \left(1 - \left( w_{time}\frac{c_{time}}{n_{time}} + w_{energy}\frac{c_{energy}}{n_{energy}} + w_{\tau, max}\frac{c_{\tau, max}}{n_{\tau, max}} + w_{\tau, integ}\frac{c_{\tau, integ}}{n_{\tau, integ}} + w_{\tau, cost}\frac{c_{\tau, cost}}{n_{\tau, cost}} + w_{\tau, smooth}\frac{c_{\tau, smooth}}{n_{\tau, smooth}} + w_{vel, cost}\frac{c_{vel, cost}}{n_{vel, cost}} \right) \right) \]

The weights and normalizations are:

| Criterion | normalization \(n\) | weight \(w\) |
|---|---|---|
| Swingup Time | 10.0 | 0.2 |
| Energy | 100.0 | 0.1 |
| Max Torque | 6.0 | 0.1 |
| Integrated Torque | 60.0 | 0.1 |
| Torque Cost | 360 | 0.1 |
| Torque Smoothness | 12.0 | 0.2 |
| Velocity Cost | 1000.0 | 0.2 |
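As a rough check of the formula, the score can be recomputed from the values listed in the leaderboard. The sketch below is not the official evaluation code; the function and dictionary names are made up for illustration, and it assumes that \(c_{success}\) is 1 for a successful swingup and 0 otherwise. The input values are the MC-PILCO entries from the leaderboard table.

```python
# Normalizations n and weights w, copied from the table above.
CRITERIA = {
    "swingup_time": (10.0, 0.2),
    "energy": (100.0, 0.1),
    "max_torque": (6.0, 0.1),
    "integrated_torque": (60.0, 0.1),
    "torque_cost": (360.0, 0.1),
    "torque_smoothness": (12.0, 0.2),
    "velocity_cost": (1000.0, 0.2),
}


def real_ai_score(success, costs):
    """Weighted score S for one attempt.

    Assumption: c_success is 1.0 for a successful swingup, 0.0 otherwise.
    """
    c_success = 1.0 if success else 0.0
    penalty = sum(w * costs[name] / n for name, (n, w) in CRITERIA.items())
    return c_success * (1.0 - penalty)


# Best-attempt values of the MC-PILCO entry in the leaderboard.
mc_pilco = {
    "swingup_time": 1.37,
    "energy": 11.66,
    "max_torque": 4.99,
    "integrated_torque": 3.72,
    "torque_cost": 8.93,
    "torque_smoothness": 0.54,
    "velocity_cost": 84.61,
}

print(round(real_ai_score(True, mc_pilco), 3))  # 0.843
```

The result agrees with the Best RealAI Score listed for MC-PILCO. The weights sum to 1, so a successful attempt whose costs all stay below their normalizations scores between 0 and 1.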

The values for swingup time, energy, etc. listed in the leaderboard are those of the best attempt. The ‘Best RealAI Score’ is the score of that attempt. The ‘Average RealAI Score’ is the average score over 10 attempts. Unsuccessful swingups, where the end-effector is not above the threshold line at the end of the experiment (i.e. after \(10\,\text{s}\)), have a score of 0.
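The aggregation over attempts can be sketched as follows. The function names and the per-attempt numbers are made up for illustration and are not recorded data. The sketch also assumes that the best attempt is the one with the highest score.

```python
def attempt_scores(raw_scores, successes):
    """Set the score of every unsuccessful attempt to 0."""
    return [s if ok else 0.0 for s, ok in zip(raw_scores, successes)]


def best_and_average(raw_scores, successes):
    """Return (best score, average score) over all attempts."""
    scores = attempt_scores(raw_scores, successes)
    return max(scores), sum(scores) / len(scores)


# Made-up example with four attempts, one of them unsuccessful.
raw = [0.8, 0.6, 0.7, 0.9]
ok = [True, False, True, True]
best, average = best_and_average(raw, ok)
print(best, round(average, 3))
```

Because failed attempts count as 0 in the average, a controller with a high best score but a low success rate ends up with a much lower average score, as the sac_lqr entry shows.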

Participating

If you want to participate in this leaderboard with your own controller, have a look at the leaderboard explanation in the double pendulum repository. We recommend submitting the controller to the pendubot simulation leaderboard and the robustness leaderboard first. Experiments with the real hardware can be conducted remotely. Please contact shivesh.kumar@dfki.de, felix.wiebe@dfki.de or shubham.vyas@dfki.de for details and scheduling. The leaderboard is updated automatically at regular intervals from the recorded data uploaded to this leaderboard repository.