Environments¶

ApexRL provides flexible interfaces for integrating various types of reinforcement learning environments.

Overview¶

ApexRL is designed to work with:

Vectorized Environments - GPU-accelerated parallel environments
Gymnasium Environments - Standard single-threaded environments
Custom Environments - User-defined simulation backends
Structured Observation Environments - TensorDict / nested dict observation trees

VecEnv (Vectorized Environment)¶

The VecEnv class is the base interface for vectorized environments optimized for GPU execution.

Key Characteristics¶

All environments run synchronously (same step function)
Tensors are pre-allocated on GPU
Supports partial resets for efficiency
Designed for high-throughput training

Interface¶

class VecEnv(ABC):
    # Required attributes
    num_envs: int           # Number of parallel environments
    num_obs: int            # Observation dimension
    num_actions: int        # Action dimension
    device: torch.device    # Execution device

    # Required methods
    def reset(self) -> obs
    def step(self, actions) -> (obs, rewards, dones, extras)
    def reset_idx(self, env_ids) -> None

API Reference¶

DummyVecEnv¶

A simple test environment:

from apexrl.envs.vecenv import DummyVecEnv

env = DummyVecEnv(
    num_envs=4096,
    num_obs=48,
    num_actions=12,
    device="cuda",
    max_episode_length=1000,
)

Gymnasium Integration¶

ApexRL provides wrappers for standard Gymnasium environments. The wrappers support plain tensor observations, structured observations, and explicit timeout metadata.

GymVecEnv¶

Wraps multiple Gymnasium environments:

import gymnasium as gym
from apexrl.envs.gym_wrapper import GymVecEnv

def make_env():
    return gym.make("Pendulum-v1")

env = GymVecEnv([make_env for _ in range(8)], device="cpu")

GymVecEnvContinuous¶

For continuous action spaces with automatic clipping and action-space scaling:

from apexrl.envs.gym_wrapper import GymVecEnvContinuous

env = GymVecEnvContinuous(
    [make_env for _ in range(8)],
    device="cpu",
    clip_actions=True,  # Clip to action space bounds
)

This is the recommended wrapper for PPO on Gymnasium Box action spaces. The default continuous PPO policy is an unsquashed Gaussian (use_tanh_squash=False), and the wrapper handles action bounds.

Recommended structured observation format:

{
    "obs": {
        "image": image,
        "vector": vector,
    },
    "privileged_obs": {
        "state": state,
    },
}

API Reference¶

Environment Wrappers¶

VecEnvWrapper¶

Base class for creating environment wrappers:

from apexrl.envs.vecenv import VecEnvWrapper

class NormalizeReward(VecEnvWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.return_rms = RunningMeanStd()
        self.returns = torch.zeros(env.num_envs)

    def step(self, actions):
        obs, rewards, dones, extras = self.env.step(actions)

        # Update running statistics
        self.returns = self.returns * gamma + rewards
        self.return_rms.update(self.returns)

        # Normalize rewards
        rewards = rewards / torch.sqrt(self.return_rms.var + 1e-8)

        return obs, rewards, dones, extras

API Reference¶

Third-Party Integrations¶

Best Practices¶

Pre-allocate Buffers: Allocate observation/reward buffers in __init__
Use ``reset_idx``: Implement partial reset for efficiency
Handle Episode End Semantics: Return terminated and truncated
Provide Final State: Set extras["final_observation"] for truncated episodes
Device Consistency: Ensure all tensors on same device
Logging: Add useful metrics to extras["log"]

Environments¶

Overview¶

VecEnv (Vectorized Environment)¶

Key Characteristics¶

Interface¶

API Reference¶

DummyVecEnv¶

Gymnasium Integration¶

GymVecEnv¶

GymVecEnvContinuous¶

API Reference¶

Environment Wrappers¶

VecEnvWrapper¶

API Reference¶

Third-Party Integrations¶

Best Practices¶

See Also¶