Policy Inference in USD Environment

Having learnt how to modify a task in Modifying an existing Direct RL Environment, we will now look at how to run a trained policy in a prebuilt USD scene.

In this tutorial, we will use the RSL RL library and the trained policy from the Humanoid Rough Terrain Isaac-Velocity-Rough-H1-v0 task in a simple warehouse USD.

The Tutorial Code

For this tutorial, we use the trained policy’s checkpoint exported as jit, a self-contained TorchScript version of the policy that can be loaded and run without the original training code.
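
As a minimal sketch of what this means in practice (the checkpoint path and observation size below are placeholders, not values from this tutorial), a jit-exported policy is loaded with torch.jit.load and called like a regular function mapping observations to actions:

import torch

# Placeholder path to a policy exported as TorchScript by the RSL RL play script.
policy_path = "logs/rsl_rl/h1_rough/EXPERIMENT_NAME/exported/policy.pt"

# Load the scripted module; no Python definition of the network architecture is needed.
policy = torch.jit.load(policy_path, map_location="cpu")

# The module maps a batch of observations to a batch of actions.
dummy_obs = torch.zeros(1, 256)  # placeholder size; the real size depends on the task's observation space
with torch.inference_mode():
    actions = policy(dummy_obs)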

The H1RoughEnvCfg_PLAY cfg encapsulates the configuration values of the inference environment, including the assets to be instantiated.

In order to use a prebuilt USD environment instead of the terrain generator specified in the task config, we make the following changes to the config before passing it to the ManagerBasedRLEnv.

Code for policy_inference_in_usd.py
# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

11"""
12This script demonstrates policy inference in a prebuilt USD environment.
13
14In this example, we use a locomotion policy to control the H1 robot. The robot was trained
15using Isaac-Velocity-Rough-H1-v0. The robot is commanded to move forward at a constant velocity.
16
17.. code-block:: bash
18
19    # Run the script
20    ./isaaclab.sh -p scripts/tutorials/03_envs/policy_inference_in_usd.py --checkpoint /path/to/jit/checkpoint.pt
21
22"""
23
24"""Launch Isaac Sim Simulator first."""
25
26
27import argparse
28
29from isaaclab.app import AppLauncher
30
31# add argparse arguments
32parser = argparse.ArgumentParser(description="Tutorial on inferencing a policy on an H1 robot in a warehouse.")
33parser.add_argument("--checkpoint", type=str, help="Path to model checkpoint exported as jit.", required=True)
34
35# append AppLauncher cli args
36AppLauncher.add_app_launcher_args(parser)
37# parse the arguments
38args_cli = parser.parse_args()
39
40# launch omniverse app
41app_launcher = AppLauncher(args_cli)
42simulation_app = app_launcher.app
43
44"""Rest everything follows."""
45import io
46import os
47import torch
48
49import omni
50
51from isaaclab.envs import ManagerBasedRLEnv
52from isaaclab.terrains import TerrainImporterCfg
53from isaaclab.utils.assets import ISAAC_NUCLEUS_DIR
54
55from isaaclab_tasks.manager_based.locomotion.velocity.config.h1.rough_env_cfg import H1RoughEnvCfg_PLAY
56
57
58def main():
59    """Main function."""
60    # load the trained jit policy
61    policy_path = os.path.abspath(args_cli.checkpoint)
62    file_content = omni.client.read_file(policy_path)[2]
63    file = io.BytesIO(memoryview(file_content).tobytes())
64    policy = torch.jit.load(file, map_location=args_cli.device)
65
66    # setup environment
67    env_cfg = H1RoughEnvCfg_PLAY()
68    env_cfg.scene.num_envs = 1
69    env_cfg.curriculum = None
70    env_cfg.scene.terrain = TerrainImporterCfg(
71        prim_path="/World/ground",
72        terrain_type="usd",
73        usd_path=f"{ISAAC_NUCLEUS_DIR}/Environments/Simple_Warehouse/warehouse.usd",
74    )
75    env_cfg.sim.device = args_cli.device
76    if args_cli.device == "cpu":
77        env_cfg.sim.use_fabric = False
78
79    # create environment
80    env = ManagerBasedRLEnv(cfg=env_cfg)
81
82    # run inference with the policy
83    obs, _ = env.reset()
84    with torch.inference_mode():
85        while simulation_app.is_running():
86            action = policy(obs["policy"])
87            obs, _, _, _, _ = env.step(action)
88
89
90if __name__ == "__main__":
91    main()
92    simulation_app.close()

Note that when the device is set to CPU, we also disable the use of Fabric for inferencing. This is because when simulating a small number of environments, CPU simulation can often run faster than GPU simulation.
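
For contrast, here is a minimal sketch of the opposite setup, keeping GPU simulation with Fabric left at its default (enabled) when stepping many environments in parallel; the environment count below is illustrative:

from isaaclab_tasks.manager_based.locomotion.velocity.config.h1.rough_env_cfg import H1RoughEnvCfg_PLAY

# Sketch: GPU simulation for a larger number of parallel environments.
env_cfg = H1RoughEnvCfg_PLAY()
env_cfg.scene.num_envs = 64    # illustrative count; GPU simulation pays off as this grows
env_cfg.sim.device = "cuda:0"  # simulate on the GPU
env_cfg.sim.use_fabric = True  # Fabric stays enabled (its default) for GPU simulation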

The Code Execution

First, we need to train the Isaac-Velocity-Rough-H1-v0 task by running the following:

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Velocity-Rough-H1-v0 --headless

When the training is finished, we can visualize the result with the following command. To stop the simulation, you can either close the window or press Ctrl+C in the terminal where you started the simulation.

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py --task Isaac-Velocity-Rough-H1-v0 --num_envs 64 --checkpoint logs/rsl_rl/h1_rough/EXPERIMENT_NAME/POLICY_FILE.pt

After running the play script, the policy will be exported to jit and onnx files under the experiment logs directory. Note that not all learning libraries support exporting the policy to a jit or onnx file. For libraries that don’t currently support this functionality, please refer to the library’s corresponding play.py script to learn how to initialize the policy.
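
For a library without built-in export, one workable approach (a hedged sketch; the network below is a stand-in, not the actual H1 actor) is to trace the trained actor network to TorchScript yourself so the tutorial script can still consume it via --checkpoint:

import torch
import torch.nn as nn

# Stand-in for the trained actor network obtained from your RL library's checkpoint;
# the layer sizes here are illustrative only.
policy_nn = nn.Sequential(nn.Linear(256, 128), nn.ELU(), nn.Linear(128, 19))
policy_nn.eval()

# Trace with a dummy observation of the task-dependent size, then save as TorchScript.
dummy_obs = torch.zeros(1, 256)
traced_policy = torch.jit.trace(policy_nn, dummy_obs)
traced_policy.save("policy.pt")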

We can then load the warehouse asset and run inference on the H1 robot using the exported jit policy.

./isaaclab.sh -p scripts/tutorials/03_envs/policy_inference_in_usd.py --checkpoint logs/rsl_rl/h1_rough/EXPERIMENT_NAME/exported/policy.pt
[Figure: the H1 robot walking in the warehouse environment under the trained policy]

In this tutorial, we learnt how to make minor modifications to an existing environment config to run policy inference in a prebuilt USD environment.