IROS 2024 Pendubot Results

| Controller | Short Controller Description | Swingup Success | Swingup Time [s] | Energy [J] | Torque Cost [N²m²] | Torque Smoothness [Nm] | Velocity Cost [m²/s²] | Best RealAI Score | Average RealAI Score | Username | Data |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ar_eapo | Policy trained with average reward maximum entropy RL | 10/10 | 0.77 | 11.7 | 7.33 | 0.237 | 49.53 | 0.711 | 0.646 | rnilva | data, plot, video |
| MC-PILCO | Controller trained with the Model-Based Reinforcement Learning algorithm MC-PILCO for swing-up and stabilization around the unstable equilibrium of the Pendubot. | 10/10 | 0.73 | 10.22 | 9.92 | 0.475 | 36.84 | 0.691 | 0.644 | turcato-niccolo | data, plot, video |
| history_sac | SAC using custom model architecture to encode system dynamics. | 7/10 | 5.73 | 13.98 | 5.08 | 0.197 | 141.48 | 0.509 | 0.343 | tfaust | data, plot, video |
| evolsac | Evolutionary SAC for both swingup and stabilisation | 2/10 | 5.02 | 23.16 | 18.59 | 0.346 | 243.33 | 0.351 | 0.068 | AlbertoSinigaglia | data, plot, video |
Videos from left to right: AR_EAPO, MC-PILCO, History_SAC, EvolSAC

Rules

This leaderboard shows the final results from the RealAIGym competition at IROS 2024.

The real system leaderboard compares the performance of different control methods on the real hardware. The task for the controller is to swing up and balance the pendubot, keeping the end-effector above the threshold line.

During the execution, external perturbations in the form of Gaussian torque peaks were applied by the motors at both links. In total there were 4 perturbations, 2 at each joint, each with a standard deviation between 0.05 s and 0.1 s and an amplitude between 0.2 Nm and 0.5 Nm. The perturbations were randomly generated and differ between the 10 trials, but each trial uses the same perturbations for all controllers.
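The perturbation scheme described above can be sketched as follows. This is an illustration, not the organizers' generator: the uniform sampling of the parameters, the placement of the peak centres, the random sign, and the use of one seed per trial are all assumptions. Only the counts and the ranges for standard deviation and amplitude come from the rules.

```python
import numpy as np


def gaussian_perturbations(seed, duration=10.0, freq=500.0,
                           n_joints=2, peaks_per_joint=2):
    """Return (t, tau); tau[:, j] is the perturbation torque on joint j in Nm."""
    rng = np.random.default_rng(seed)
    # 500 Hz over 10 s gives 5000 samples
    t = np.linspace(0.0, duration, int(duration * freq), endpoint=False)
    tau = np.zeros((t.size, n_joints))
    for j in range(n_joints):
        for _ in range(peaks_per_joint):
            mu = rng.uniform(1.0, duration - 1.0)  # assumed: peak centre away from the edges
            sigma = rng.uniform(0.05, 0.1)         # standard deviation [s], range from the rules
            amp = rng.uniform(0.2, 0.5)            # amplitude [Nm], range from the rules
            sign = rng.choice([-1.0, 1.0])         # assumed: random direction
            tau[:, j] += sign * amp * np.exp(-0.5 * ((t - mu) / sigma) ** 2)
    return t, tau


# Assumed scheme: one seed per trial, so the profile differs between trials
# but is identical for every controller evaluated on that trial.
trials = [gaussian_perturbations(seed) for seed in range(10)]
```

Fixing the seed per trial is one simple way to give every controller the same disturbance sequence while still varying it across trials.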

The model parameters of the pendubot, which we identified with a least-squares optimization, are:

More information about the dynamic model of the double pendulum can be found here: Double Pendulum Dynamics. For a URDF file with this model, see here: URDF.

The \(0.5\,\text{Nm}\) torque limit on the passive joint can be used to compensate for the friction of the motor.

The actuators can be controlled with an arbitrary control frequency of up to \(500\, \text{Hz}\) and the experiment lasts \(10\,\text{s}\). The initial pendubot configuration is \(x_0 = (0, 0, 0, 0)\) (hanging down) and the goal is the unstable fixed point at the upright configuration \(x_g = (\pi, 0, 0, 0)\). The upright position is considered to be reached when the end-effector is above the threshold line at \(h=0.45 \, \text{m}\) (origin at the mounting point).
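The threshold criterion can be illustrated with a small forward-kinematics check. This is a sketch under assumptions: the link lengths `L1` and `L2` are hypothetical placeholders (the identified parameters are not reproduced here), and the second angle is taken relative to the first link, which is consistent with \(x_0\) hanging down and \(x_g = (\pi, 0, 0, 0)\) pointing up.

```python
import math

# Hypothetical link lengths [m]; placeholders, not the identified model parameters.
L1 = 0.2
L2 = 0.3
THRESHOLD = 0.45  # threshold height [m] from the rules, origin at the mounting point


def end_effector_height(q1, q2, l1=L1, l2=L2):
    """Height of the end-effector above the mounting point.

    q1 is measured from the hanging-down position, q2 relative to the first link.
    """
    return -l1 * math.cos(q1) - l2 * math.cos(q1 + q2)


def is_upright(q1, q2, l1=L1, l2=L2, threshold=THRESHOLD):
    """True if the end-effector is above the threshold line."""
    return end_effector_height(q1, q2, l1, l2) > threshold


print(is_upright(0.0, 0.0))      # hanging down
print(is_upright(math.pi, 0.0))  # upright goal configuration
```

With these placeholder lengths the fully stretched upright pose reaches \(0.5\,\text{m}\), so the threshold leaves only a narrow band around the goal.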

Scores

For the evaluation, multiple criteria are computed and weighted to calculate an overall score (RealAI Score). The criteria are:

These criteria are used to calculate the overall RealAI Score with the formula

\[ \begin{equation} S = c_{\text{success}} \left( 1 - \frac{1}{5} \sum_{i \in \{ \text{time},\, \text{energy},\, \tau\,\text{cost},\, \tau\,\text{smooth},\, \text{vel cost} \}} \tanh \left(\pi \frac{c_{i}}{n_{i}}\right)\right) \end{equation} \]

The weights and normalizations are:

| Criterion | Normalization \(n\) |
|---|---|
| Swingup Time | 20.0 |
| Energy | 60.0 |
| Torque Cost | 100.0 |
| Torque Smoothness | 4.0 |
| Velocity Cost | 400.0 |
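As a worked example, the score can be computed from the leaderboard values and the normalizations above. The sketch below is not the official evaluation code. It assumes that the five \(\tanh\) terms are averaged (weight \(1/5\) each) and that each ratio \(c_i / n_i\) is scaled by \(\pi\); this form is chosen because it reproduces the Best RealAI Scores listed in the leaderboard. The dictionary keys are illustrative names.

```python
import math

# Normalization constants n_i from the table above.
NORMALIZATION = {
    "time": 20.0,
    "energy": 60.0,
    "torque_cost": 100.0,
    "torque_smoothness": 4.0,
    "velocity_cost": 400.0,
}


def real_ai_score(costs, success=True):
    """Score of a single attempt; an unsuccessful swingup scores 0."""
    if not success:
        return 0.0
    # Assumed form: average of tanh(pi * c_i / n_i) over the five criteria.
    penalty = sum(math.tanh(math.pi * costs[name] / n)
                  for name, n in NORMALIZATION.items())
    return 1.0 - penalty / len(NORMALIZATION)


# Best attempt of ar_eapo, values taken from the leaderboard.
ar_eapo = {
    "time": 0.77,
    "energy": 11.7,
    "torque_cost": 7.33,
    "torque_smoothness": 0.237,
    "velocity_cost": 49.53,
}
print(round(real_ai_score(ar_eapo), 3))  # 0.711, matching the leaderboard
```

Because every term passes through \(\tanh\), a single very poor criterion can cost at most one fifth of the total score.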

The values listed for swingup time, energy, etc. in the leaderboard are those from the best attempt. The ‘Best RealAI Score’ is the score of that attempt. The ‘Average RealAI Score’ is the average score over 10 attempts. Unsuccessful swingups (where the end-effector is not above the threshold line at the end of the experiment, i.e. after \(10\,\text{s}\)) have a score of 0.
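The averaging rule can be sketched in a few lines. The attempt scores used here are made-up illustration values, not competition data, and the function name is hypothetical.

```python
def average_real_ai_score(attempt_scores, successes):
    """Average over all attempts, counting unsuccessful swingups as 0."""
    counted = [score if ok else 0.0
               for score, ok in zip(attempt_scores, successes)]
    return sum(counted) / len(counted)


# Hypothetical example: 10 attempts, only the first two succeed.
scores = [0.5, 0.25] + [0.3] * 8
successes = [True, True] + [False] * 8
average = average_real_ai_score(scores, successes)
```

A controller that succeeds rarely is therefore penalized heavily in the average even if its best attempt scores well, which is visible in the gap between the two score columns for evolsac.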

Participating

This leaderboard only covers the results from the competition at IROS 2024. To participate, check out the ongoing leaderboard.