Validation
The repository includes example scripts for training and evaluating SB3 models with compressed buffers. They are intended to verify that the buffer classes can be used with minimal changes to standard SB3 training code. The scripts are in the examples directory.
Training setup
The runs below used the same Atari environments and hyperparameters as the Hugging Face presets sb3/ppo-PongNoFrameskip-v4, sb3/ppo-MsPacmanNoFrameskip-v4, and sb3/dqn-MsPacmanNoFrameskip-v4.
Hardware: M4 MacBook Air
Software: Stable-Baselines3 2.7.0, sb3-extra-buffers 0.4.3
Device: mps
Training length: 10M environment steps
For an end-to-end wall-clock comparison between the default SB3 buffers and CompressedRolloutBuffer / CompressedReplayBuffer with zstd-3 compression, see Training speed.
Evaluation results for example training scripts
The example scripts have been run and evaluated to confirm they train correctly.
Each run below used rle-jit compression.
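The name rle-jit suggests a JIT-compiled run-length encoding (RLE) codec. As a rough illustration of why RLE suits Atari observations, which contain large uniform background regions, here is a minimal pure-Python run-length encoder and decoder. It is a sketch under that assumption only: the names rle_encode and rle_decode are hypothetical, and the code does not reproduce the library's actual rle-jit codec, its storage format, or any JIT compilation.

```python
from typing import List, Tuple


# Hypothetical helper names; this is NOT the sb3-extra-buffers API.
def rle_encode(data: bytes) -> List[Tuple[int, int]]:
    """Collapse consecutive equal bytes into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for value in data:  # iterating over bytes yields ints 0..255
        if runs and runs[-1][0] == value:
            # Same byte as the current run: extend it by one.
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            # Different byte value: start a new run of length 1.
            runs.append((value, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> bytes:
    """Expand (value, run_length) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in runs)


# Toy 84x84 grayscale frame: black background with one bright block
# spanning 8 rows and 4 columns.
WIDTH, HEIGHT = 84, 84
frame = bytearray(WIDTH * HEIGHT)
for row in range(40, 48):
    for col in range(10, 14):
        frame[row * WIDTH + col] = 200

runs = rle_encode(bytes(frame))
assert rle_decode(runs) == bytes(frame)  # lossless round trip
print(f"{len(frame)} bytes -> {len(runs)} runs")  # prints "7056 bytes -> 17 runs"
```

Each run costs one (value, length) pair regardless of its length, so the mostly black frame collapses to 17 pairs. A frame with little repetition between neighboring pixels would compress poorly under this scheme.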
PPO on PongNoFrameskip-v4, no frame stack:
(Best ) Evaluated 10000 episodes, mean reward: 21.0 +/- 0.00
Q1: 21 | Q2: 21 | Q3: 21 | Relative IQR: 0.00 | Min: 21 | Max: 21
(Final) Evaluated 10000 episodes, mean reward: 21.0 +/- 0.02
Q1: 21 | Q2: 21 | Q3: 21 | Relative IQR: 0.00 | Min: 20 | Max: 21
PPO on MsPacmanNoFrameskip-v4, with frame stack 4:
(Best ) Evaluated 10000 episodes, mean reward: 2667.0 +/- 290.00
Q1: 2300 | Q2: 2490 | Q3: 3000 | Relative IQR: 0.28 | Min: 2300 | Max: 3000
(Final) Evaluated 10000 episodes, mean reward: 2500.9 +/- 221.03
Q1: 2300 | Q2: 2390 | Q3: 2490 | Relative IQR: 0.08 | Min: 1420 | Max: 3000
DQN on MsPacmanNoFrameskip-v4, with frame stack 4:
(Best ) Evaluated 10000 episodes, mean reward: 3300.0 +/- 770.79
Q1: 2490 | Q2: 4020 | Q3: 4020 | Relative IQR: 0.38 | Min: 2460 | Max: 4020
(Final) Evaluated 10000 episodes, mean reward: 3379.2 +/- 453.78
Q1: 2690 | Q2: 3400 | Q3: 3880 | Relative IQR: 0.35 | Min: 1230 | Max: 4090
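The Relative IQR column appears to be the interquartile range divided by the median: (Q3 - Q1) / Q2 reproduces every reported value, for example (3000 - 2300) / 2490 ≈ 0.28 for the best PPO MsPacman model and (3880 - 2690) / 3400 = 0.35 for the final DQN model. The sketch below recomputes a summary in the same layout from a list of per-episode rewards. The function names are hypothetical, and the quartile interpolation method and the use of the population standard deviation for the +/- figure are assumptions; the example scripts may compute them differently.

```python
import statistics
from typing import Dict, Sequence


def summarize_rewards(rewards: Sequence[float]) -> Dict[str, float]:
    """Summary statistics for per-episode rewards (hypothetical helper)."""
    # Quartiles by linear interpolation between order statistics
    # (method="inclusive", matching NumPy's default percentile). Assumption.
    q1, q2, q3 = statistics.quantiles(rewards, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(rewards),
        # Population standard deviation (ddof=0). Assumption.
        "std": statistics.pstdev(rewards),
        "q1": q1,
        "q2": q2,
        "q3": q3,
        # Interquartile range divided by the median; undefined if the median is 0.
        "relative_iqr": (q3 - q1) / q2 if q2 else float("nan"),
        "min": min(rewards),
        "max": max(rewards),
    }


def format_summary(s: Dict[str, float], n_episodes: int) -> str:
    """Render the statistics as two lines mimicking the evaluation output above."""
    return (
        f"Evaluated {n_episodes} episodes, mean reward: {s['mean']:.1f} +/- {s['std']:.2f}\n"
        f"Q1: {s['q1']:g} | Q2: {s['q2']:g} | Q3: {s['q3']:g} | "
        f"Relative IQR: {s['relative_iqr']:.2f} | Min: {s['min']:g} | Max: {s['max']:g}"
    )


# Synthetic rewards for illustration only; not data from the runs above.
rewards = [10, 20, 20, 30, 40]
stats = summarize_rewards(rewards)
print(format_summary(stats, len(rewards)))
# Evaluated 5 episodes, mean reward: 24.0 +/- 10.20
# Q1: 20 | Q2: 20 | Q3: 30 | Relative IQR: 0.50 | Min: 10 | Max: 40
```

Normalizing by the median makes the spread comparable across games with very different score scales, such as Pong (maximum 21) and MsPacman (thousands of points).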