ICRA 2025 Acrobot Simulation Results

| Controller | Short Controller Description | #swingups | Uptime [s] | RealAI Score | Username | Data |
| --- | --- | --- | --- | --- | --- | --- |
| mcpilco | Swing-up trained with the MBRL algorithm MC-PILCO, stabilization with LQR | 34 | 18.408 | 0.307 | turcato-niccolo | data, plot, video |
| VIMPPI | Variational Integrator Model Predictive Path Integral for direct torque planning | 16 | 38.882 | 0.648 | adk | data, plot, video |
| AcadosMpc | Real-time nonlinear Model Predictive Control implemented with the Acados framework | 21 | 37.798 | 0.63 | maranderine | data, plot, video |
| AR-EAPO | Policy trained with average-reward maximum-entropy RL | 21 | 39.484 | 0.658 | rnilva | data, plot, video |
Videos from left to right: AR-EAPO, VIMPPI, AcadosMpc, MC-PILCO

Rules

This leaderboard shows the simulation results from the RealAIGym competition at ICRA 2025. The simulation leaderboard tests how well different control methods cover the double pendulum's state space. The task for the controller is to swing up and balance the acrobot, keeping the end-effector above the threshold line. At random times during the execution, the pendulum is reset to a new initial state.

The model parameters of the acrobot are:

More information about the dynamic model of the double pendulum can be found here: Double Pendulum Dynamics. In the Double Pendulum Repository, the parameters above are labeled ‘designC.1/model1.1’. For a URDF file with this model, see here: URDF.

The acrobot is simulated with a Runge-Kutta 4 integrator with a timestep of \(dt = 0.002 \, \text{s}\) for \(T = 60 \, \text{s}\). The initial acrobot configuration is \(x_0 = (0, 0, 0, 0)\) (hanging down) and the goal is the unstable fixed point at the upright configuration \(x_g = (\pi, 0, 0, 0)\). The upright position is considered to be reached when the end-effector is above the threshold line at \(h = 0.45 \, \text{m}\) (origin at the mounting point) and stays there until the end. At 15 random times during the execution, the controller is switched off for \(0.2 \, \text{s}\) and the pendulum is reset to a new initial state. After the reset, the controller is switched on again and is expected to swing the pendulum up from the new initial state. The leaderboard was evaluated with the numpy random seed 777.
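To make the integration setup concrete, here is a minimal sketch of a fixed-step Runge-Kutta 4 loop with the \(dt\) and \(T\) stated above. It is not the competition's simulator: the `toy_pendulum` dynamics are a hypothetical stand-in (a single frictionless pendulum with unit mass and length), and the names `rk4_step`, `DT`, and `n_steps` are chosen here for illustration. The acrobot's own equations of motion would replace `toy_pendulum`.

```python
import math

DT = 0.002  # integrator timestep [s], from the rules
T = 60.0    # total simulated time [s], from the rules


def rk4_step(f, t, x, u, dt):
    """One classical RK4 step for x' = f(t, x, u), with u held constant over the step."""
    k1 = f(t, x, u)
    k2 = f(t + 0.5 * dt, [xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], u)
    k3 = f(t + 0.5 * dt, [xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], u)
    k4 = f(t + dt, [xi + dt * ki for xi, ki in zip(x, k3)], u)
    return [
        xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


def toy_pendulum(t, x, u):
    """Hypothetical stand-in dynamics, NOT the acrobot model: state is (angle, velocity)."""
    q, qd = x
    return [qd, -9.81 * math.sin(q) + u]


n_steps = round(T / DT)  # number of integrator steps in one 60 s run
x = [0.1, 0.0]           # small initial deflection, zero velocity
for k in range(n_steps):
    x = rk4_step(toy_pendulum, k * DT, x, 0.0, DT)  # zero torque
```

With these values one run consists of 30000 integrator steps. In this sketch the control input is held constant within a step (zero-order hold), which is an assumption about how a controller would be coupled to the integrator.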

Scores

The RealAI Score is the uptime, i.e. the time the pendulum spends in the goal region, divided by the total runtime \(T\).
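As a quick check of this definition, the following sketch recomputes the score column from the uptime column of the results table, using \(T = 60 \, \text{s}\) from the rules. The function name `realai_score` is hypothetical and chosen here for illustration; rounding to three decimals is an assumption based on how the table values are printed.

```python
T = 60.0  # total runtime [s], from the rules


def realai_score(uptime_s, runtime_s=T):
    """Fraction of the run spent in the goal region."""
    return uptime_s / runtime_s


# Uptime values [s] copied from the results table.
uptimes = {
    "mcpilco": 18.408,
    "VIMPPI": 38.882,
    "AcadosMpc": 37.798,
    "AR-EAPO": 39.484,
}

scores = {name: round(realai_score(u), 3) for name, u in uptimes.items()}
print(scores)
# {'mcpilco': 0.307, 'VIMPPI': 0.648, 'AcadosMpc': 0.63, 'AR-EAPO': 0.658}
```

The recomputed values match the RealAI Score column, so a score of 0.658 corresponds to roughly 39.5 s of the 60 s run spent above the threshold line.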

Participating

This leaderboard only covers the results from the competition at ICRA 2025. To participate, check out the ongoing leaderboard.