Environments
ApexRL provides flexible interfaces for integrating various types of reinforcement learning environments.
Overview
ApexRL is designed to work with:
Vectorized Environments - GPU-accelerated parallel environments
Gymnasium Environments - Standard single-threaded environments
Custom Environments - User-defined simulation backends
Structured Observation Environments - TensorDict / nested dict observation trees
VecEnv (Vectorized Environment)
The VecEnv class is the base interface for vectorized environments optimized for GPU execution.
Key Characteristics
All environments advance in lockstep through a single batched step call
Tensors are pre-allocated on the GPU
Supports partial resets: only environments whose episodes ended are reset
Designed for high-throughput training
Interface
from abc import ABC

import torch


class VecEnv(ABC):
    # Required attributes
    num_envs: int  # Number of parallel environments
    num_obs: int  # Observation dimension
    num_actions: int  # Action dimension
    device: torch.device  # Execution device

    # Required methods (bodies omitted; return values noted in comments)
    def reset(self): ...  # -> obs
    def step(self, actions): ...  # -> (obs, rewards, dones, extras)
    def reset_idx(self, env_ids) -> None: ...  # reset only the listed envs
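The sketch below makes the interface concrete with a toy environment: a batch of 1-D point masses rewarded for staying near the origin. Everything in it is hypothetical, including the class name `PointMassVecEnv`, its dynamics, its reward, and the `"time_outs"` extras key. It does not subclass ApexRL's `VecEnv`, so it runs on its own. It resets finished environments inside `step` via `reset_idx`, which is one common convention and an assumption here.

```python
import torch


class PointMassVecEnv:
    """Hypothetical toy env laid out like the VecEnv interface (not a subclass)."""

    def __init__(self, num_envs=4, max_episode_length=50, device="cpu"):
        self.num_envs = num_envs
        self.num_obs = 2  # position, velocity
        self.num_actions = 1  # scalar force
        self.device = torch.device(device)
        self.max_episode_length = max_episode_length
        # Pre-allocate state buffers once, on the target device
        self.pos = torch.zeros(num_envs, device=self.device)
        self.vel = torch.zeros(num_envs, device=self.device)
        self.episode_length = torch.zeros(num_envs, dtype=torch.long, device=self.device)

    def _obs(self):
        # Observation of shape (num_envs, num_obs)
        return torch.stack([self.pos, self.vel], dim=-1)

    def reset_idx(self, env_ids):
        # Partial reset: re-initialize only the selected environments
        n = len(env_ids)
        self.pos[env_ids] = torch.rand(n, device=self.device) * 2.0 - 1.0
        self.vel[env_ids] = 0.0
        self.episode_length[env_ids] = 0

    def reset(self):
        self.reset_idx(torch.arange(self.num_envs, device=self.device))
        return self._obs()

    def step(self, actions):
        # actions: (num_envs, num_actions); clamp the force to [-1, 1]
        force = actions.clamp(-1.0, 1.0).squeeze(-1)
        self.vel += 0.1 * force
        self.pos += 0.1 * self.vel
        self.episode_length += 1
        rewards = -self.pos.abs()  # reward staying near the origin
        terminated = self.pos.abs() > 2.0  # left the valid region
        truncated = self.episode_length >= self.max_episode_length  # time limit
        dones = terminated | truncated
        # Hypothetical extras: pre-reset observations and a timeout flag
        extras = {"final_observation": self._obs(), "time_outs": truncated & ~terminated}
        env_ids = dones.nonzero(as_tuple=False).flatten()
        if env_ids.numel() > 0:
            self.reset_idx(env_ids)  # auto-reset finished envs inside step
        return self._obs(), rewards, dones, extras
```

Only the rows in `env_ids` are touched on reset, so the cost of a reset scales with the number of finished episodes rather than with `num_envs`.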
API Reference
DummyVecEnv
A simple test environment:
from apexrl.envs.vecenv import DummyVecEnv

env = DummyVecEnv(
    num_envs=4096,
    num_obs=48,
    num_actions=12,
    device="cuda",
    max_episode_length=1000,
)
Gymnasium Integration
ApexRL provides wrappers for standard Gymnasium environments. The wrappers support plain tensor observations, structured observations, and explicit timeout metadata.
GymVecEnv
Wraps multiple Gymnasium environments, given as a list of environment factory functions, into a single vectorized environment:
import gymnasium as gym
from apexrl.envs.gym_wrapper import GymVecEnv

def make_env():
    return gym.make("Pendulum-v1")

env = GymVecEnv([make_env for _ in range(8)], device="cpu")
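A wrapper like GymVecEnv conceptually steps each Gymnasium environment in turn and stacks the results into batched tensors. The sketch below shows that idea; it is not ApexRL's implementation. The class name `SimpleGymBatch` and its `extras` keys are hypothetical, and observations are assumed to be flat arrays. Finished environments are reset immediately, and the pre-reset observation is kept. The only calls it makes on the wrapped environments are Gymnasium's `reset(seed=...)`, which returns `(obs, info)`, and `step(action)`, which returns `(obs, reward, terminated, truncated, info)`.

```python
import numpy as np
import torch


class SimpleGymBatch:
    """Hypothetical sketch: step several Gymnasium envs and batch the results."""

    def __init__(self, env_fns, device="cpu"):
        self.envs = [fn() for fn in env_fns]
        self.num_envs = len(self.envs)
        self.device = torch.device(device)

    def _stack(self, values):
        # Stack per-env numpy values into one float32 tensor on the target device
        return torch.as_tensor(np.stack(values), dtype=torch.float32, device=self.device)

    def reset(self, seed=None):
        # Gymnasium's reset returns (observation, info)
        obs = [env.reset(seed=None if seed is None else seed + i)[0]
               for i, env in enumerate(self.envs)]
        return self._stack(obs)

    def step(self, actions):
        actions = actions.detach().cpu().numpy()
        obs, final_obs, rewards, terminated, truncated = [], [], [], [], []
        for env, action in zip(self.envs, actions):
            # Gymnasium's step returns (obs, reward, terminated, truncated, info)
            o, r, term, trunc, _ = env.step(action)
            final_obs.append(o)  # observation before any reset
            if term or trunc:
                o, _ = env.reset()  # auto-reset finished environments
            obs.append(o)
            rewards.append(float(r))
            terminated.append(bool(term))
            truncated.append(bool(trunc))
        term_t = torch.tensor(terminated, dtype=torch.bool, device=self.device)
        trunc_t = torch.tensor(truncated, dtype=torch.bool, device=self.device)
        extras = {
            "terminated": term_t,
            "truncated": trunc_t,
            "final_observation": self._stack(final_obs),
        }
        rewards_t = torch.tensor(rewards, dtype=torch.float32, device=self.device)
        return self._stack(obs), rewards_t, term_t | trunc_t, extras
```

With the `make_env` factory above, `SimpleGymBatch([make_env for _ in range(8)])` would batch eight Pendulum environments. Stepping them one by one in Python is why this style of wrapper is slower than a GPU-native VecEnv.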
GymVecEnvContinuous
For continuous action spaces with automatic clipping and action-space scaling:
from apexrl.envs.gym_wrapper import GymVecEnvContinuous

env = GymVecEnvContinuous(
    [make_env for _ in range(8)],
    device="cpu",
    clip_actions=True,  # Clip to action space bounds
)
This is the recommended wrapper for PPO on Gymnasium Box action spaces. The default continuous PPO policy is an unsquashed Gaussian (use_tanh_squash=False), so its sampled actions are not confined to the Box bounds; the wrapper is responsible for enforcing them.
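Because the Gaussian policy's samples are unbounded, something has to map them into the Box bounds. The sketch below shows two common options: plain clipping, and affine rescaling from [-1, 1] to [low, high]. The function names are hypothetical. The exact formula and order GymVecEnvContinuous uses for clipping and scaling are not specified here, so treat this as an assumption.

```python
import torch


def clip_to_box(actions, low, high):
    # Clamp each action dimension to [low, high] (low/high broadcast over the batch)
    return torch.maximum(torch.minimum(actions, high), low)


def rescale_from_unit(actions, low, high):
    # Map actions from [-1, 1] to [low, high]; out-of-range values are clipped first
    actions = actions.clamp(-1.0, 1.0)
    return low + 0.5 * (actions + 1.0) * (high - low)
```

Clipping leaves in-range actions untouched, which suits a Gaussian whose mean is learned directly in environment units. Rescaling suits policies that emit normalized actions. For Pendulum-v1 (bounds [-2, 2]), rescaling maps 0.0 to 0.0 and 1.0 to 2.0.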
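For structured observations, the recommended format is a nested dictionary that separates the policy's observations ("obs") from privileged observations ("privileged_obs"), which are typically fed only to the critic: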
{
    "obs": {
        "image": image,
        "vector": vector,
    },
    "privileged_obs": {
        "state": state,
    },
}
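The sketch below builds the nested format above from plain dictionaries of batched tensors, with every leaf keeping `num_envs` as its leading dimension. The shapes (a 3×64×64 image, an 8-D vector, a 16-D privileged state) and the helper names `tree_map` and `make_structured_obs` are hypothetical. The recursive `tree_map` shows how such a tree can be moved, cast, or inspected leaf by leaf without flattening it.

```python
import torch


def tree_map(fn, tree):
    # Apply fn to every tensor leaf of a nested dict, preserving the structure
    if isinstance(tree, dict):
        return {key: tree_map(fn, value) for key, value in tree.items()}
    return fn(tree)


def make_structured_obs(num_envs, device="cpu"):
    # Hypothetical shapes: 3x64x64 image, 8-D vector, 16-D privileged state
    return {
        "obs": {
            "image": torch.zeros(num_envs, 3, 64, 64, device=device),
            "vector": torch.zeros(num_envs, 8, device=device),
        },
        "privileged_obs": {
            "state": torch.zeros(num_envs, 16, device=device),
        },
    }


obs = make_structured_obs(num_envs=4)
# Inspect the shape of every leaf
shapes = tree_map(lambda t: tuple(t.shape), obs)
# Cast (or move with t.to(device)) the whole tree leaf by leaf
obs_half = tree_map(lambda t: t.to(dtype=torch.float16), obs)
```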
API Reference
Environment Wrappers
VecEnvWrapper
Base class for creating environment wrappers:
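import torch
from apexrl.envs.vecenv import VecEnvWrapper

class NormalizeReward(VecEnvWrapper):
    def __init__(self, env, gamma=0.99, epsilon=1e-8):
        super().__init__(env)
        self.gamma = gamma
        self.epsilon = epsilon
        # RunningMeanStd is assumed to be defined elsewhere; it must expose
        # update(x) and a .var attribute (it is not part of this snippet)
        self.return_rms = RunningMeanStd()
        # Keep the per-env discounted returns on the environment's device
        self.returns = torch.zeros(env.num_envs, device=env.device)

    def step(self, actions):
        obs, rewards, dones, extras = self.env.step(actions)
        # Update the discounted return and its running statistics
        self.returns = self.returns * self.gamma + rewards
        self.return_rms.update(self.returns)
        # Scale rewards by the standard deviation of the discounted return
        rewards = rewards / torch.sqrt(self.return_rms.var + self.epsilon)
        # Start a fresh discounted return for environments that just finished
        self.returns[dones.bool()] = 0.0
        return obs, rewards, dones, extras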
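The NormalizeReward example uses RunningMeanStd without defining it. The sketch below is one way to maintain such an estimate: the standard parallel mean/variance merge, which folds each batch's statistics into the running totals. It assumes only what the wrapper needs, namely an `update(x)` method and `mean`/`var` attributes. It is not necessarily the class ApexRL ships.

```python
import torch


class RunningMeanStd:
    """Running mean/variance, merging each batch's statistics into the totals."""

    def __init__(self, shape=(), epsilon=1e-4, device="cpu"):
        self.mean = torch.zeros(shape, device=device)
        self.var = torch.ones(shape, device=device)
        self.count = epsilon  # small pseudo-count avoids dividing by zero

    def update(self, x):
        # x: (batch, *shape); the leading dimension is treated as samples
        batch_mean = x.mean(dim=0)
        batch_var = ((x - batch_mean) ** 2).mean(dim=0)  # population variance
        batch_count = x.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        # Combine the squared-deviation sums of both parts (parallel variance formula)
        m2 = (self.var * self.count + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.mean = self.mean + delta * batch_count / total
        self.var = m2 / total
        self.count = total
```

With the default `shape=()`, passing a `(num_envs,)` tensor of returns treats the environments as the sample dimension, which matches how the wrapper calls `update(self.returns)`.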
API Reference
Third-Party Integrations
Best Practices
Pre-allocate Buffers: Allocate observation/reward buffers once in __init__ rather than on every step
Use reset_idx: Implement partial resets so only finished environments are reset
Handle Episode End Semantics: Report terminated and truncated separately, so time-limit truncations are not mistaken for true terminations
Provide Final State: Set extras["final_observation"] for truncated episodes
Device Consistency: Keep all tensors on the same device
Logging: Add useful metrics to extras["log"]
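Separating truncation from termination matters because a truncated episode did not really end. Its return should still include the value of the state it was cut off in, which is why the final observation is needed. The sketch below shows one common fix: adding a discounted bootstrap value to the reward of truncated steps. The function `bootstrap_truncated` and its arguments are hypothetical, and this document does not show how ApexRL's trainer consumes these signals.

```python
import torch


def bootstrap_truncated(rewards, terminated, truncated, final_values, gamma=0.99):
    """Fold gamma * V(s_final) into the reward of time-limit-truncated steps.

    All arguments are (num_envs,) tensors; final_values are critic estimates
    for the pre-reset observations (e.g. extras["final_observation"]).
    """
    # Bootstrap only episodes cut off by the time limit, never true terminations
    mask = (truncated & ~terminated).to(rewards.dtype)
    return rewards + gamma * final_values * mask
```

Under this convention, the step still counts as done when computing advantages, so the bootstrap enters only through the adjusted reward.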
See Also
Custom Environment Integration - Detailed integration tutorial
apexrl.envs package - Full API reference