Training speed ============== This page reports end-to-end training wall-clock time when using the default Stable-Baselines3 buffers versus the compressed buffer classes from this package. The runs mirror the example scripts :download:`example_train_rollout.py <../examples/example_train_rollout.py>` and :download:`example_train_replay.py <../examples/example_train_replay.py>`, with only the buffer class and compression method changed for the comparison. Benchmark setup --------------- Hardware ~~~~~~~~ - Mac mini (Apple M4, 16 GB RAM) - ``device="mps"`` Libraries ~~~~~~~~~ - Stable-Baselines3 2.8.0 - PyTorch 2.12.0 - sb3-extra-buffers 0.5.1 Each run trained for 10M environment steps. Compressed buffers used ``zstd-3`` with dtypes from :func:`~sb3_extra_buffers.compressed.find_buffer_dtypes`. PPO on ``PongNoFrameskip-v4`` ----------------------------- Hyperparameters follow the preset on Huggingface: `sb3/ppo-PongNoFrameskip-v4 `__ and check :download:`example_train_rollout.py <../examples/example_train_rollout.py>` for code: - Frame stack: ``1`` (no stacking) - ``n_envs``: ``8`` (train and eval) - ``n_steps``: ``128`` - ``batch_size``: ``256`` - ``n_epochs``: ``4`` - ``learning_rate``: linear schedule from ``2.5e-4`` to ``0`` - ``clip_range``: linear schedule from ``0.1`` to ``0`` - ``ent_coef``: ``0.01`` - ``vf_coef``: ``0.5`` - ``gae_lambda``: ``0.9`` - ``gamma``: ``0.99`` - ``max_grad_norm``: ``0.5`` - Policy: ``CnnPolicy`` with ``normalize_images=False`` .. list-table:: :header-rows: 1 * - Buffer - Wall-clock time * - SB3 ``RolloutBuffer`` - 2:56:16 * - ``CompressedRolloutBuffer`` (``zstd-3``) - 4:48:41 On this setup, ``zstd-3`` rollout compression added roughly 65% training time over the default buffer while keeping the same SB3 training loop. DQN on ``MsPacmanNoFrameskip-v4`` --------------------------------- Hyperparameters follow the preset on Huggingface: `sb3/dqn-MsPacmanNoFrameskip-v4 `__ and check :download:`example_train_replay.py <../examples/example_train_replay.py>` for code: - Frame stack: ``4`` - ``n_envs``: ``1`` (train), ``8`` (eval) - ``buffer_size``: ``100_000`` - ``batch_size``: ``32`` - ``learning_starts``: ``100_000`` - ``train_freq``: ``4`` - ``gradient_steps``: ``1`` - ``target_update_interval``: ``1000`` - ``learning_rate``: ``1e-4`` - ``exploration_fraction``: ``0.1`` - ``exploration_final_eps``: ``0.01`` - Policy: ``CnnPolicy`` .. list-table:: :header-rows: 1 * - Buffer - Wall-clock time * - SB3 ``ReplayBuffer`` - 12:33:26 * - ``CompressedReplayBuffer`` (``zstd-3``) - 12:44:16 For DQN replay compression, ``zstd-3`` added only about 11 minutes (~1.4%) on top of a 12.5-hour run. Off-policy algorithms spend more time in gradient updates than in buffer I/O, so compression overhead is smaller than for PPO. See also -------- - :doc:`benchmarks` for per-transition buffer memory and sampling latency - :doc:`validation` for reward results from the example training scripts