Training speed
This page reports end-to-end training wall-clock time when using the default
Stable-Baselines3 buffers versus the compressed buffer classes from this package.
The runs mirror the example scripts
example_train_rollout.py and
example_train_replay.py, with
only the buffer class and compression method changed for the comparison.
Benchmark setup
Hardware
Mac mini (Apple M4, 16 GB RAM)
device="mps"
Libraries
Stable-Baselines3 2.8.0
PyTorch 2.12.0
sb3-extra-buffers 0.5.1
Each run trained for 10M environment steps. Compressed buffers used zstd-3
with dtypes from find_buffer_dtypes().
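The wall-clock figures on this page are written as H:MM:SS. As a minimal sketch of how such a figure could be captured, the helper below times an arbitrary block with `time.perf_counter()` and formats the result. The names `wall_clock` and `format_hms` are hypothetical and are not part of this package or of Stable-Baselines3. The timed statement is a stand-in for a real training call.

```python
import time
from contextlib import contextmanager


def format_hms(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS, the style used in the tables."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


@contextmanager
def wall_clock(label: str, results: dict):
    """Store the elapsed wall-clock time of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with wall_clock("demo", timings):
    sum(range(100_000))  # stand-in for the real training call

print(format_hms(timings["demo"]))
print(format_hms(10_576))  # 2:56:16
```

Measuring around the whole training call, rather than inside the buffer, keeps the comparison end-to-end, which is what this page reports.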
PPO on PongNoFrameskip-v4
Hyperparameters follow the Hugging Face preset
sb3/ppo-PongNoFrameskip-v4;
see example_train_rollout.py for the code:
Frame stack: 1 (no stacking)
n_envs: 8 (train and eval)
n_steps: 128
batch_size: 256
n_epochs: 4
learning_rate: linear schedule from 2.5e-4 to 0
clip_range: linear schedule from 0.1 to 0
ent_coef: 0.01
vf_coef: 0.5
gae_lambda: 0.9
gamma: 0.99
max_grad_norm: 0.5
Policy: CnnPolicy with normalize_images=False
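The linear schedules for learning_rate and clip_range listed above can be sketched as follows. Stable-Baselines3 accepts a callable for these parameters and calls it with the remaining training progress, which runs from 1 at the start to 0 at the end. The helper name `linear_schedule` is an assumption made for illustration. The example scripts may define their schedules differently.

```python
from typing import Callable


def linear_schedule(initial_value: float, final_value: float = 0.0) -> Callable[[float], float]:
    """Return a schedule that interpolates linearly from initial to final value."""

    def schedule(progress_remaining: float) -> float:
        # progress_remaining is 1.0 at the first step and 0.0 at the last one
        return final_value + progress_remaining * (initial_value - final_value)

    return schedule


lr_schedule = linear_schedule(2.5e-4)   # learning_rate: 2.5e-4 -> 0
clip_schedule = linear_schedule(0.1)    # clip_range: 0.1 -> 0

for progress in (1.0, 0.5, 0.0):
    print(progress, lr_schedule(progress), clip_schedule(progress))
```

Such callables would be passed as `learning_rate=lr_schedule` and `clip_range=clip_schedule` when constructing the PPO model.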
| Buffer | Wall-clock time |
|---|---|
| SB3 | 2:56:16 |
| Compressed (zstd-3) | 4:48:41 |
On this setup, zstd-3 rollout compression added roughly 64% to wall-clock
training time (2:56:16 to 4:48:41) while keeping the same SB3 training loop.
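The relative overhead can be checked directly from the two wall-clock times in the table. The snippet below is a small worked calculation. The function names `parse_hms` and `overhead` are illustrative only.

```python
def parse_hms(text: str) -> int:
    """Convert an H:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def overhead(baseline: str, compressed: str) -> tuple[int, float]:
    """Return (extra seconds, extra time as a fraction of the baseline)."""
    base = parse_hms(baseline)
    extra = parse_hms(compressed) - base
    return extra, extra / base


extra_s, frac = overhead("2:56:16", "4:48:41")
print(f"PPO: +{extra_s} s ({frac:.1%})")  # PPO: +6745 s (63.8%)
```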
DQN on MsPacmanNoFrameskip-v4
Hyperparameters follow the Hugging Face preset
sb3/dqn-MsPacmanNoFrameskip-v4;
see example_train_replay.py for the code:
Frame stack: 4
n_envs: 1 (train), 8 (eval)
buffer_size: 100_000
batch_size: 32
learning_starts: 100_000
train_freq: 4
gradient_steps: 1
target_update_interval: 1000
learning_rate: 1e-4
exploration_fraction: 0.1
exploration_final_eps: 0.01
Policy: CnnPolicy
| Buffer | Wall-clock time |
|---|---|
| SB3 | 12:33:26 |
| Compressed (zstd-3) | 12:44:16 |
For DQN replay compression, zstd-3 added only about 11 minutes (~1.4%) to a
roughly 12.5-hour run. In this setup the off-policy run spends far more of its
time in gradient updates than in buffer I/O, so the relative compression
overhead is much smaller than for PPO.
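One way to make this comparison concrete is to count gradient updates implied by the two hyperparameter sets over 10M environment steps. The sketch below is a rough estimate only: it ignores rollout boundaries and the exact step at which DQN starts learning, and it says nothing about how long each update takes. The function names are hypothetical.

```python
TOTAL_STEPS = 10_000_000


def ppo_minibatch_updates(total_steps: int, n_epochs: int, batch_size: int) -> int:
    """Each collected transition is revisited n_epochs times, in minibatches."""
    return total_steps * n_epochs // batch_size


def dqn_updates(total_steps: int, learning_starts: int,
                train_freq: int, gradient_steps: int) -> int:
    """With one training env: gradient_steps updates every train_freq steps."""
    return (total_steps - learning_starts) // train_freq * gradient_steps


ppo = ppo_minibatch_updates(TOTAL_STEPS, n_epochs=4, batch_size=256)
dqn = dqn_updates(TOTAL_STEPS, learning_starts=100_000, train_freq=4, gradient_steps=1)

print(ppo)        # 156250
print(dqn)        # 2475000
print(dqn / ppo)  # 15.84
```

Under these assumptions the DQN run performs roughly 16 times as many gradient updates as the PPO run, which is consistent with update cost dominating its wall-clock time.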
See also
Benchmarks for per-transition buffer memory and sampling latency
Validation for reward results from the example training scripts