Creating a Manager-Based RL Environment#
Having learnt how to create a base environment in Creating a Manager-Based Base Environment, we will now look at how to create a manager-based task environment for reinforcement learning.
The base environment is designed as a sense-act environment where the agent can send commands to the environment and receive observations from it. This minimal interface is sufficient for many applications such as traditional motion planning and controls. However, many applications require a task specification, which often serves as the learning objective for the agent. For instance, in a navigation task, the agent may be required to reach a goal location. To this end, we use the envs.ManagerBasedRLEnv class, which extends the base environment to include a task specification.
Similar to other components in Isaac Lab, instead of directly modifying the base class envs.ManagerBasedRLEnv, we encourage users to simply implement a configuration envs.ManagerBasedRLEnvCfg for their task environment. This practice allows us to separate the task specification from the environment implementation, making it easier to reuse components of the same environment for different tasks.
In this tutorial, we will configure the cartpole environment using the envs.ManagerBasedRLEnvCfg to create a manager-based task for balancing the pole upright. We will learn how to specify the task using reward terms, termination criteria, curriculum and commands.
The Code#
For this tutorial, we use the cartpole environment defined in the isaaclab_tasks.manager_based.classic.cartpole module.
Code for cartpole_env_cfg.py
# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

import math

import isaaclab.sim as sim_utils
from isaaclab.assets import ArticulationCfg, AssetBaseCfg
from isaaclab.envs import ManagerBasedRLEnvCfg
from isaaclab.managers import EventTermCfg as EventTerm
from isaaclab.managers import ObservationGroupCfg as ObsGroup
from isaaclab.managers import ObservationTermCfg as ObsTerm
from isaaclab.managers import RewardTermCfg as RewTerm
from isaaclab.managers import SceneEntityCfg
from isaaclab.managers import TerminationTermCfg as DoneTerm
from isaaclab.scene import InteractiveSceneCfg
from isaaclab.utils import configclass

import isaaclab_tasks.manager_based.classic.cartpole.mdp as mdp

##
# Pre-defined configs
##
from isaaclab_assets.robots.cartpole import CARTPOLE_CFG  # isort:skip


##
# Scene definition
##


@configclass
class CartpoleSceneCfg(InteractiveSceneCfg):
    """Configuration for a cart-pole scene."""

    # ground plane
    ground = AssetBaseCfg(
        prim_path="/World/ground",
        spawn=sim_utils.GroundPlaneCfg(size=(100.0, 100.0)),
    )

    # cartpole
    robot: ArticulationCfg = CARTPOLE_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")

    # lights
    dome_light = AssetBaseCfg(
        prim_path="/World/DomeLight",
        spawn=sim_utils.DomeLightCfg(color=(0.9, 0.9, 0.9), intensity=500.0),
    )


##
# MDP settings
##


@configclass
class ActionsCfg:
    """Action specifications for the MDP."""

    joint_effort = mdp.JointEffortActionCfg(asset_name="robot", joint_names=["slider_to_cart"], scale=100.0)


@configclass
class ObservationsCfg:
    """Observation specifications for the MDP."""

    @configclass
    class PolicyCfg(ObsGroup):
        """Observations for policy group."""

        # observation terms (order preserved)
        joint_pos_rel = ObsTerm(func=mdp.joint_pos_rel)
        joint_vel_rel = ObsTerm(func=mdp.joint_vel_rel)

        def __post_init__(self) -> None:
            self.enable_corruption = False
            self.concatenate_terms = True

    # observation groups
    policy: PolicyCfg = PolicyCfg()


@configclass
class EventCfg:
    """Configuration for events."""

    # reset
    reset_cart_position = EventTerm(
        func=mdp.reset_joints_by_offset,
        mode="reset",
        params={
            "asset_cfg": SceneEntityCfg("robot", joint_names=["slider_to_cart"]),
            "position_range": (-1.0, 1.0),
            "velocity_range": (-0.5, 0.5),
        },
    )

    reset_pole_position = EventTerm(
        func=mdp.reset_joints_by_offset,
        mode="reset",
        params={
            "asset_cfg": SceneEntityCfg("robot", joint_names=["cart_to_pole"]),
            "position_range": (-0.25 * math.pi, 0.25 * math.pi),
            "velocity_range": (-0.25 * math.pi, 0.25 * math.pi),
        },
    )


@configclass
class RewardsCfg:
    """Reward terms for the MDP."""

    # (1) Constant running reward
    alive = RewTerm(func=mdp.is_alive, weight=1.0)
    # (2) Failure penalty
    terminating = RewTerm(func=mdp.is_terminated, weight=-2.0)
    # (3) Primary task: keep pole upright
    pole_pos = RewTerm(
        func=mdp.joint_pos_target_l2,
        weight=-1.0,
        params={"asset_cfg": SceneEntityCfg("robot", joint_names=["cart_to_pole"]), "target": 0.0},
    )
    # (4) Shaping tasks: lower cart velocity
    cart_vel = RewTerm(
        func=mdp.joint_vel_l1,
        weight=-0.01,
        params={"asset_cfg": SceneEntityCfg("robot", joint_names=["slider_to_cart"])},
    )
    # (5) Shaping tasks: lower pole angular velocity
    pole_vel = RewTerm(
        func=mdp.joint_vel_l1,
        weight=-0.005,
        params={"asset_cfg": SceneEntityCfg("robot", joint_names=["cart_to_pole"])},
    )


@configclass
class TerminationsCfg:
    """Termination terms for the MDP."""

    # (1) Time out
    time_out = DoneTerm(func=mdp.time_out, time_out=True)
    # (2) Cart out of bounds
    cart_out_of_bounds = DoneTerm(
        func=mdp.joint_pos_out_of_manual_limit,
        params={"asset_cfg": SceneEntityCfg("robot", joint_names=["slider_to_cart"]), "bounds": (-3.0, 3.0)},
    )


##
# Environment configuration
##


@configclass
class CartpoleEnvCfg(ManagerBasedRLEnvCfg):
    """Configuration for the cartpole environment."""

    # Scene settings
    scene: CartpoleSceneCfg = CartpoleSceneCfg(num_envs=4096, env_spacing=4.0)
    # Basic settings
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    events: EventCfg = EventCfg()
    # MDP settings
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()

    # Post initialization
    def __post_init__(self) -> None:
        """Post initialization."""
        # general settings
        self.decimation = 2
        self.episode_length_s = 5
        # viewer settings
        self.viewer.eye = (8.0, 0.0, 5.0)
        # simulation settings
        self.sim.dt = 1 / 120
        self.sim.render_interval = self.decimation
The script for running the environment, run_cartpole_rl_env.py, is present in the isaaclab/scripts/tutorials/03_envs directory. The script is similar to the cartpole_base_env.py script in the previous tutorial, except that it uses the envs.ManagerBasedRLEnv instead of the envs.ManagerBasedEnv.
Code for run_cartpole_rl_env.py
# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

"""
This script demonstrates how to run the RL environment for the cartpole balancing task.

.. code-block:: bash

    ./isaaclab.sh -p scripts/tutorials/03_envs/run_cartpole_rl_env.py --num_envs 32

"""

"""Launch Isaac Sim Simulator first."""

import argparse

from isaaclab.app import AppLauncher

# add argparse arguments
parser = argparse.ArgumentParser(description="Tutorial on running the cartpole RL environment.")
parser.add_argument("--num_envs", type=int, default=16, help="Number of environments to spawn.")

# append AppLauncher cli args
AppLauncher.add_app_launcher_args(parser)
# parse the arguments
args_cli = parser.parse_args()

# launch omniverse app
app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app

"""Rest everything follows."""

import torch

from isaaclab.envs import ManagerBasedRLEnv

from isaaclab_tasks.manager_based.classic.cartpole.cartpole_env_cfg import CartpoleEnvCfg


def main():
    """Main function."""
    # create environment configuration
    env_cfg = CartpoleEnvCfg()
    env_cfg.scene.num_envs = args_cli.num_envs
    env_cfg.sim.device = args_cli.device
    # setup RL environment
    env = ManagerBasedRLEnv(cfg=env_cfg)

    # simulate physics
    count = 0
    while simulation_app.is_running():
        with torch.inference_mode():
            # reset
            if count % 300 == 0:
                count = 0
                env.reset()
                print("-" * 80)
                print("[INFO]: Resetting environment...")
            # sample random actions
            joint_efforts = torch.randn_like(env.action_manager.action)
            # step the environment
            obs, rew, terminated, truncated, info = env.step(joint_efforts)
            # print current orientation of pole
            print("[Env 0]: Pole joint: ", obs["policy"][0][1].item())
            # update counter
            count += 1

    # close the environment
    env.close()


if __name__ == "__main__":
    # run the main function
    main()
    # close sim app
    simulation_app.close()
The Code Explained#
We already went through parts of the above in the Creating a Manager-Based Base Environment tutorial to learn how to specify the scene, observations, actions and events. Thus, in this tutorial, we will focus only on the RL components of the environment.
In Isaac Lab, we provide various implementations of different terms in the envs.mdp module. We will use some of these terms in this tutorial, but users are free to define their own terms as well. These are usually placed in their task-specific sub-package (for instance, in isaaclab_tasks.manager_based.classic.cartpole.mdp).
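As a rough illustration of such a user-defined term, the sketch below shows a hypothetical reward function that penalizes the cart for drifting away from the origin. The function name cart_pos_l2 and its use are our own inventions for this example; the general pattern, a function that receives the environment (plus any parameters declared in the term's params) and returns one value per environment, follows the built-in terms in envs.mdp.

import torch

from isaaclab.assets import Articulation
from isaaclab.envs import ManagerBasedRLEnv
from isaaclab.managers import SceneEntityCfg


def cart_pos_l2(env: ManagerBasedRLEnv, asset_cfg: SceneEntityCfg = SceneEntityCfg("robot")) -> torch.Tensor:
    """Penalize the selected joints for drifting away from the zero position (sum of squares)."""
    # retrieve the articulation from the scene using the standard SceneEntityCfg pattern
    asset: Articulation = env.scene[asset_cfg.name]
    # sum of squared joint positions over the joints selected in asset_cfg
    return torch.sum(torch.square(asset.data.joint_pos[:, asset_cfg.joint_ids]), dim=1)

Such a function could then be used in a reward term as RewTerm(func=cart_pos_l2, weight=-0.05, params={"asset_cfg": SceneEntityCfg("robot", joint_names=["slider_to_cart"])}), where the weight is purely illustrative.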
Defining rewards#
The managers.RewardManager is used to compute the reward terms for the agent. Similar to the other managers, its terms are configured using the managers.RewardTermCfg class. The managers.RewardTermCfg class specifies the function or callable class that computes the reward, as well as the weight associated with it. It also takes in a dictionary of arguments, "params", that is passed to the reward function when it is called.
For the cartpole task, we will use the following reward terms:
Alive Reward: Encourage the agent to stay alive for as long as possible.
Terminating Reward: Conversely, penalize the agent for terminating the episode early (i.e., failing).
Pole Angle Reward: Encourage the agent to keep the pole at the desired upright position.
Cart Velocity Reward: Encourage the agent to keep the cart velocity as small as possible.
Pole Velocity Reward: Encourage the agent to keep the pole velocity as small as possible.
@configclass
class RewardsCfg:
    """Reward terms for the MDP."""

    # (1) Constant running reward
    alive = RewTerm(func=mdp.is_alive, weight=1.0)
    # (2) Failure penalty
    terminating = RewTerm(func=mdp.is_terminated, weight=-2.0)
    # (3) Primary task: keep pole upright
    pole_pos = RewTerm(
        func=mdp.joint_pos_target_l2,
        weight=-1.0,
        params={"asset_cfg": SceneEntityCfg("robot", joint_names=["cart_to_pole"]), "target": 0.0},
    )
    # (4) Shaping tasks: lower cart velocity
    cart_vel = RewTerm(
        func=mdp.joint_vel_l1,
        weight=-0.01,
        params={"asset_cfg": SceneEntityCfg("robot", joint_names=["slider_to_cart"])},
    )
    # (5) Shaping tasks: lower pole angular velocity
    pole_vel = RewTerm(
        func=mdp.joint_vel_l1,
        weight=-0.005,
        params={"asset_cfg": SceneEntityCfg("robot", joint_names=["cart_to_pole"])},
    )
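At each step, the reward manager evaluates every term, scales it by its weight (and, in Isaac Lab, by the environment step time), and sums the results into a single reward per environment. The snippet below is a simplified, illustrative sketch of that combination, not the actual managers.RewardManager implementation; the terms dictionary and its layout are assumptions made only for this example.

import torch


def compute_total_reward(env, terms: dict, step_dt: float) -> torch.Tensor:
    """Illustrative weighted sum of reward terms (one value per environment)."""
    total = torch.zeros(env.num_envs, device=env.device)
    for name, (func, weight, params) in terms.items():
        # each term function returns a tensor of shape (num_envs,)
        total += weight * func(env, **params) * step_dt
    return total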
Defining termination criteria#
Most learning tasks happen over a finite number of steps that we call an episode. For instance, in the cartpole task, we want the agent to balance the pole for as long as possible. However, if the agent reaches an unstable or unsafe state, we want to terminate the episode. On the other hand, if the agent is able to balance the pole for a long time, we want to terminate the episode and start a new one so that the agent can learn to balance the pole from a different starting configuration.
The managers.TerminationsCfg configures what constitutes an episode termination. In this example, we want the task to terminate when either of the following conditions is met:
Episode Length: the episode length is greater than the defined max_episode_length.
Cart out of bounds: the cart goes outside of the bounds [-3, 3].
The flag managers.TerminationTermCfg.time_out specifies whether the term is a time-out (truncation) term or a termination term. These are used to indicate the two types of terminations as described in Gymnasium’s documentation.
@configclass
class TerminationsCfg:
    """Termination terms for the MDP."""

    # (1) Time out
    time_out = DoneTerm(func=mdp.time_out, time_out=True)
    # (2) Cart out of bounds
    cart_out_of_bounds = DoneTerm(
        func=mdp.joint_pos_out_of_manual_limit,
        params={"asset_cfg": SceneEntityCfg("robot", joint_names=["slider_to_cart"]), "bounds": (-3.0, 3.0)},
    )
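Termination functions follow the same pattern as reward functions, but return a boolean tensor with one flag per environment. The sketch below is a simplified illustration of what a bounds check such as mdp.joint_pos_out_of_manual_limit conceptually does; it is not the library implementation, and the function name is our own.

import torch

from isaaclab.assets import Articulation
from isaaclab.envs import ManagerBasedRLEnv
from isaaclab.managers import SceneEntityCfg


def joint_out_of_bounds(
    env: ManagerBasedRLEnv, asset_cfg: SceneEntityCfg, bounds: tuple[float, float]
) -> torch.Tensor:
    """Return True for every environment whose selected joints leave the given position bounds."""
    asset: Articulation = env.scene[asset_cfg.name]
    joint_pos = asset.data.joint_pos[:, asset_cfg.joint_ids]
    # flag the environment for termination if any selected joint is outside [bounds[0], bounds[1]]
    return torch.any((joint_pos < bounds[0]) | (joint_pos > bounds[1]), dim=1)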
Defining commands#
For various goal-conditioned tasks, it is useful to specify the goals or commands for the agent. These are handled through the managers.CommandManager. The command manager handles resampling and updating the commands at each step. It can also be used to provide the commands as an observation to the agent.
For this simple task, we do not use any commands. Hence, we leave this attribute as its default value, which is None. You can see an example of how to define a command manager in the other locomotion or manipulation tasks.
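To give a concrete flavor, the snippet below sketches what a command specification could look like; it is loosely adapted from the velocity-tracking locomotion tasks and is not used in the cartpole environment. The specific field values are illustrative, and the exact fields of mdp.UniformVelocityCommandCfg may differ across Isaac Lab versions.

@configclass
class CommandsCfg:
    """Command specifications for the MDP (illustrative, not part of the cartpole task)."""

    # resample a random base-velocity target for each environment every 10 seconds
    base_velocity = mdp.UniformVelocityCommandCfg(
        asset_name="robot",
        resampling_time_range=(10.0, 10.0),
        rel_standing_envs=0.02,
        rel_heading_envs=1.0,
        heading_command=True,
        heading_control_stiffness=0.5,
        debug_vis=True,
        ranges=mdp.UniformVelocityCommandCfg.Ranges(
            lin_vel_x=(-1.0, 1.0), lin_vel_y=(-1.0, 1.0), ang_vel_z=(-1.0, 1.0), heading=(-math.pi, math.pi)
        ),
    )

If a task did use commands, the environment configuration would set commands: CommandsCfg = CommandsCfg() instead of leaving the attribute at None.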
Defining curriculum#
Oftentimes when training a learning agent, it helps to start with a simple task and gradually increase the task's difficulty as training progresses. This is the idea behind curriculum learning. In Isaac Lab, we provide a managers.CurriculumManager class that can be used to define a curriculum for your environment.
In this tutorial we don't implement a curriculum for simplicity, but you can see an example of a curriculum definition in the other locomotion or manipulation tasks.
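To give a flavor of what this looks like, here is a hypothetical curriculum term for the cartpole task. It uses mdp.modify_reward_weight, a curriculum utility shipped with Isaac Lab, to strengthen the cart-velocity penalty after a number of training steps; the specific numbers and the choice to schedule this particular term are assumptions made for illustration only.

@configclass
class CurriculumCfg:
    """Curriculum terms for the MDP (illustrative, not part of the cartpole task)."""

    # after 10k environment steps, change the "cart_vel" reward weight from -0.01 to -0.1
    cart_vel_penalty = CurrTerm(
        func=mdp.modify_reward_weight,
        params={"term_name": "cart_vel", "weight": -0.1, "num_steps": 10000},
    )

Here CurrTerm would be imported as from isaaclab.managers import CurriculumTermCfg as CurrTerm, and the environment configuration would additionally declare curriculum: CurriculumCfg = CurriculumCfg().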
Tying it all together#
With all the above components defined, we can now create the ManagerBasedRLEnvCfg configuration for the cartpole environment. This is similar to the ManagerBasedEnvCfg defined in Creating a Manager-Based Base Environment, only with the added RL components explained in the above sections.
@configclass
class CartpoleEnvCfg(ManagerBasedRLEnvCfg):
    """Configuration for the cartpole environment."""

    # Scene settings
    scene: CartpoleSceneCfg = CartpoleSceneCfg(num_envs=4096, env_spacing=4.0)
    # Basic settings
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    events: EventCfg = EventCfg()
    # MDP settings
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()

    # Post initialization
    def __post_init__(self) -> None:
        """Post initialization."""
        # general settings
        self.decimation = 2
        self.episode_length_s = 5
        # viewer settings
        self.viewer.eye = (8.0, 0.0, 5.0)
        # simulation settings
        self.sim.dt = 1 / 120
        self.sim.render_interval = self.decimation
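A quick sanity check on these numbers (our own arithmetic, not part of the tutorial code): the policy acts once every decimation physics steps, so the control rate and the number of policy steps per episode follow directly from sim.dt, decimation, and episode_length_s.

sim_dt = 1 / 120          # physics runs at 120 Hz
decimation = 2            # the policy acts every 2 physics steps
episode_length_s = 5

control_dt = sim_dt * decimation                          # 1/60 s, i.e. the policy runs at 60 Hz
max_episode_length = int(episode_length_s / control_dt)   # 300 policy steps per episode
print(control_dt, max_episode_length)                     # ~0.0167, 300

This is also why the run script below resets the environments every 300 steps.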
Running the simulation loop#
Coming back to the run_cartpole_rl_env.py script, the simulation loop is similar to the previous tutorial. The only difference is that we create an instance of envs.ManagerBasedRLEnv instead of the envs.ManagerBasedEnv. Consequently, the envs.ManagerBasedRLEnv.step() method now returns additional signals such as the reward and termination status. The information dictionary also maintains logging of quantities such as the reward contribution from individual terms, the termination status of each term, the episode length, etc.
def main():
    """Main function."""
    # create environment configuration
    env_cfg = CartpoleEnvCfg()
    env_cfg.scene.num_envs = args_cli.num_envs
    env_cfg.sim.device = args_cli.device
    # setup RL environment
    env = ManagerBasedRLEnv(cfg=env_cfg)

    # simulate physics
    count = 0
    while simulation_app.is_running():
        with torch.inference_mode():
            # reset
            if count % 300 == 0:
                count = 0
                env.reset()
                print("-" * 80)
                print("[INFO]: Resetting environment...")
            # sample random actions
            joint_efforts = torch.randn_like(env.action_manager.action)
            # step the environment
            obs, rew, terminated, truncated, info = env.step(joint_efforts)
            # print current orientation of pole
            print("[Env 0]: Pole joint: ", obs["policy"][0][1].item())
            # update counter
            count += 1

    # close the environment
    env.close()
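For example, a rough way to inspect the logged quantities mentioned above is to read them out of the info dictionary inside the loop. The exact keys (per-term episode rewards, termination counts, etc.) depend on the Isaac Lab version, so treat this as an assumption rather than a guaranteed API.

# inside the simulation loop, right after env.step(...)
log = info.get("log") or {}
for key, value in log.items():
    # keys typically look like "Episode_Reward/<term_name>" or "Episode_Termination/<term_name>"
    print(f"{key}: {value}")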
The Code Execution#
Similar to the previous tutorial, we can run the environment by executing the run_cartpole_rl_env.py script.
./isaaclab.sh -p scripts/tutorials/03_envs/run_cartpole_rl_env.py --num_envs 32
This should open a similar simulation as in the previous tutorial. However, this time, the environment returns more signals that specify the reward and termination status. Additionally, the individual environments reset themselves when they terminate based on the termination criteria specified in the configuration.

To stop the simulation, you can either close the window or press Ctrl+C in the terminal where you started the simulation.
In this tutorial, we learnt how to create a task environment for reinforcement learning. We did this by extending the base environment to include the rewards, terminations, commands and curriculum terms. We also learnt how to use the envs.ManagerBasedRLEnv class to run the environment and receive various signals from it.
While it is possible to manually create an instance of the envs.ManagerBasedRLEnv class for a desired task, this is not scalable, as it requires a specialized script for each task. Thus, we exploit the gymnasium.make() function to create the environment with the gym interface. We will learn how to do this in the next tutorial.
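As a brief preview of that workflow (the task name Isaac-Cartpole-v0 is registered by the isaaclab_tasks package; the details follow in the next tutorial), creating the same environment through the gym registry looks roughly like this:

import gymnasium as gym

import isaaclab_tasks  # noqa: F401  (importing this package registers the Isaac Lab tasks with gymnasium)
from isaaclab_tasks.utils import parse_env_cfg

# load the registered configuration for the task and create the environment
env_cfg = parse_env_cfg("Isaac-Cartpole-v0", num_envs=32)
env = gym.make("Isaac-Cartpole-v0", cfg=env_cfg)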