Training speed

This page compares end-to-end training wall-clock time with the default Stable-Baselines3 buffers against the compressed buffer classes from this package. The runs mirror the example scripts example_train_rollout.py and example_train_replay.py; only the buffer class and compression method differ between the runs being compared.

Benchmark setup

Hardware

Mac mini (Apple M4, 16 GB RAM)
device="mps"

Libraries

Stable-Baselines3 2.8.0
PyTorch 2.12.0
sb3-extra-buffers 0.5.1

Each run trained for 10M environment steps. Compressed buffers used zstd-3 with dtypes from find_buffer_dtypes().
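To make concrete where the extra time can come from, here is a minimal sketch of the compress-on-write, decompress-on-read pattern that a compressed buffer implies. It is not this package's implementation: `zlib` at level 3 stands in for zstd-3 so the example runs with only the standard library, and the `TinyCompressedStore` class and the 84x84 single-channel frame are hypothetical.

```python
import zlib


class TinyCompressedStore:
    """Hypothetical store: compresses each frame on write, decompresses on read."""

    def __init__(self, level: int = 3) -> None:
        self.level = level
        self._frames: list[bytes] = []

    def add(self, frame: bytes) -> None:
        # Compression cost is paid once per stored frame.
        self._frames.append(zlib.compress(frame, self.level))

    def get(self, index: int) -> bytes:
        # Decompression cost is paid every time a frame is read back.
        return zlib.decompress(self._frames[index])

    def stored_bytes(self) -> int:
        # Total size of the compressed payloads currently held.
        return sum(len(f) for f in self._frames)


# Assumed 84x84 uint8 frame: mostly background with a small bright patch.
raw = bytearray(84 * 84)
raw[1000:1010] = b"\xff" * 10
frame = bytes(raw)

store = TinyCompressedStore()
store.add(frame)
restored = store.get(0)
```

The round trip is lossless, and the stored payload is much smaller than the raw frame because the frame is mostly constant. The price is extra CPU work on every write and every read, which is what the timings below measure for the real buffers.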

PPO on `PongNoFrameskip-v4`

Hyperparameters follow the Hugging Face preset sb3/ppo-PongNoFrameskip-v4; see example_train_rollout.py for the code:

Frame stack: 1 (no stacking)
n_envs: 8 (train and eval)
n_steps: 128
batch_size: 256
n_epochs: 4
learning_rate: linear schedule from 2.5e-4 to 0
clip_range: linear schedule from 0.1 to 0
ent_coef: 0.01
vf_coef: 0.5
gae_lambda: 0.9
gamma: 0.99
max_grad_norm: 0.5
Policy: CnnPolicy with normalize_images=False
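The settings above can be written down as keyword arguments for SB3's `PPO` constructor. The sketch below only builds the arguments and does not import SB3 or start training; `linear_schedule` and `ppo_kwargs` are illustrative names, not taken from example_train_rollout.py. It relies on SB3 accepting a callable of `progress_remaining` (which runs from 1.0 at the start of training to 0.0 at the end) for `learning_rate` and `clip_range`.

```python
from typing import Callable


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Return a schedule that decays linearly from initial_value to 0."""

    def schedule(progress_remaining: float) -> float:
        # progress_remaining goes from 1.0 (start of training) to 0.0 (end).
        return progress_remaining * initial_value

    return schedule


# Keyword arguments mirroring the list above; ppo_kwargs is a hypothetical name.
ppo_kwargs = dict(
    n_steps=128,
    batch_size=256,
    n_epochs=4,
    learning_rate=linear_schedule(2.5e-4),
    clip_range=linear_schedule(0.1),
    ent_coef=0.01,
    vf_coef=0.5,
    gae_lambda=0.9,
    gamma=0.99,
    max_grad_norm=0.5,
    policy_kwargs=dict(normalize_images=False),
)

n_envs = 8
rollout_size = n_envs * ppo_kwargs["n_steps"]           # transitions per rollout
minibatches = rollout_size // ppo_kwargs["batch_size"]  # minibatches per epoch
```

With these values each rollout holds 1024 transitions, split into 4 minibatches per epoch and revisited for 4 epochs, so every stored observation is read back several times before the buffer is refilled.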

Buffer	Wall-clock time (h:mm:ss)
SB3 `RolloutBuffer`	2:56:16
`CompressedRolloutBuffer` (`zstd-3`)	4:48:41

On this setup, zstd-3 rollout compression added about 64% to training time (roughly 1 h 52 min on top of 2 h 56 min) while keeping the same SB3 training loop.

DQN on `MsPacmanNoFrameskip-v4`

Hyperparameters follow the Hugging Face preset sb3/dqn-MsPacmanNoFrameskip-v4; see example_train_replay.py for the code:

Frame stack: 4
n_envs: 1 (train), 8 (eval)
buffer_size: 100_000
batch_size: 32
learning_starts: 100_000
train_freq: 4
gradient_steps: 1
target_update_interval: 1000
learning_rate: 1e-4
exploration_fraction: 0.1
exploration_final_eps: 0.01
Policy: CnnPolicy
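A few derived quantities help when reading these settings. The sketch below assumes the 10M-step budget from the benchmark setup, SB3's default `exploration_initial_eps` of 1.0 (not listed above), and a linear epsilon decay over the first `exploration_fraction` of training. The update count is an approximation for a single training environment, and `epsilon_at` is a hypothetical helper.

```python
TOTAL_STEPS = 10_000_000        # from the benchmark setup
EXPLORATION_FRACTION = 0.1
INITIAL_EPS = 1.0               # assumed: SB3's default exploration_initial_eps
FINAL_EPS = 0.01
LEARNING_STARTS = 100_000
TRAIN_FREQ = 4
GRADIENT_STEPS = 1


def epsilon_at(step: int) -> float:
    """Linear decay from INITIAL_EPS to FINAL_EPS, then constant."""
    progress = step / TOTAL_STEPS
    if progress >= EXPLORATION_FRACTION:
        return FINAL_EPS
    return INITIAL_EPS + progress * (FINAL_EPS - INITIAL_EPS) / EXPLORATION_FRACTION


# Number of steps over which epsilon is annealed.
exploration_steps = round(TOTAL_STEPS * EXPLORATION_FRACTION)

# Approximate number of gradient updates: one update every TRAIN_FREQ steps
# once LEARNING_STARTS transitions have been collected.
approx_updates = (TOTAL_STEPS - LEARNING_STARTS) // TRAIN_FREQ * GRADIENT_STEPS
```

Under these assumptions epsilon reaches 0.01 after 1M steps, and the run performs roughly 2.5M gradient updates, each sampling a batch of 32 transitions from the replay buffer.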

Buffer	Wall-clock time (h:mm:ss)
SB3 `ReplayBuffer`	12:33:26
`CompressedReplayBuffer` (`zstd-3`)	12:44:16

With DQN, zstd-3 replay compression added only about 11 minutes (~1.4%) to a run of roughly 12.5 hours. Off-policy training here spends far more time in gradient updates than in buffer I/O, so the compression overhead is much smaller than for PPO.
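The overhead figures can be reproduced from the two tables. The short script below converts the reported h:mm:ss times to seconds and derives the extra time, the relative overhead, and the added cost per environment step, using the 10M-step budget from the benchmark setup. The helper name `to_seconds` and the `runs` dictionary are illustrative.

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


TOTAL_STEPS = 10_000_000  # environment steps per run

# (default buffer, compressed buffer) wall-clock times from the tables above.
runs = {
    "PPO": ("2:56:16", "4:48:41"),
    "DQN": ("12:33:26", "12:44:16"),
}

results = {}
for name, (baseline, compressed) in runs.items():
    base_s, comp_s = to_seconds(baseline), to_seconds(compressed)
    extra_s = comp_s - base_s
    results[name] = {
        "extra_seconds": extra_s,
        "relative_overhead": extra_s / base_s,
        "extra_us_per_step": extra_s / TOTAL_STEPS * 1e6,
    }
    print(
        f"{name}: +{extra_s} s, +{extra_s / base_s:.1%}, "
        f"{extra_s / TOTAL_STEPS * 1e6:.0f} us per environment step"
    )
```

Expressed per environment step, the added cost is about 0.67 ms for PPO and about 0.065 ms for DQN, so the gap between the two algorithms holds in absolute terms as well as relative to total run time.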
