Policy Inference in USD Environment

Having learnt how to modify a task in Modifying an existing Direct RL Environment, we will now look at how to run a trained policy in a prebuilt USD scene.

In this tutorial, we will use the RSL RL library and the trained policy from the Humanoid Rough Terrain Isaac-Velocity-Rough-H1-v0 task in a simple warehouse USD.

The Tutorial Code

For this tutorial, we use the trained policy’s checkpoint exported as jit, a self-contained TorchScript version of the policy that can be loaded and run without the original training code.
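
As a minimal sketch of what this means in practice (the checkpoint path and observation size below are placeholders, not values from this tutorial), a jit-exported policy is loaded with torch.jit.load and called like a regular function mapping observations to actions:

import torch

# Placeholder path to a policy exported as TorchScript by the RSL RL play script.
policy_path = "logs/rsl_rl/h1_rough/EXPERIMENT_NAME/exported/policy.pt"

# Load the scripted module; no Python definition of the network architecture is needed.
policy = torch.jit.load(policy_path, map_location="cpu")

# The module maps a batch of observations to a batch of actions.
dummy_obs = torch.zeros(1, 256)  # placeholder size; the real size depends on the task's observation space
with torch.inference_mode():
    actions = policy(dummy_obs)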

The H1RoughEnvCfg_PLAY cfg encapsulates the configuration values of the inference environment, including the assets to be instantiated.

In order to use a prebuilt USD environment instead of the terrain generator specified in the task config, we make the following changes to the config before passing it to the ManagerBasedRLEnv.

Code for policy_inference_in_usd.py
# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

11"""
12This script demonstrates policy inference in a prebuilt USD environment.
13
14In this example, we use a locomotion policy to control the H1 robot. The robot was trained
15using Isaac-Velocity-Rough-H1-v0. The robot is commanded to move forward at a constant velocity.
16
17.. code-block:: bash
18
19    # Run the script
20    ./isaaclab.sh -p scripts/tutorials/03_envs/policy_inference_in_usd.py --checkpoint /path/to/jit/checkpoint.pt
21
22"""
23
24"""Launch Isaac Sim Simulator first."""
25
26
27import argparse
28
29from isaaclab.app import AppLauncher
30
31# add argparse arguments
32parser = argparse.ArgumentParser(description="Tutorial on inferencing a policy on an H1 robot in a warehouse.")
33parser.add_argument("--checkpoint", type=str, help="Path to model checkpoint exported as jit.", required=True)
34
35# append AppLauncher cli args
36AppLauncher.add_app_launcher_args(parser)
37# parse the arguments
38args_cli = parser.parse_args()
39
40# launch omniverse app
41app_launcher = AppLauncher(args_cli)
42simulation_app = app_launcher.app
43
44"""Rest everything follows."""
45import io
46import os
47import torch
48
49import omni
50
51from isaaclab.envs import ManagerBasedRLEnv
52from isaaclab.terrains import TerrainImporterCfg
53from isaaclab.utils.assets import ISAAC_NUCLEUS_DIR
54
55from isaaclab_tasks.manager_based.locomotion.velocity.config.h1.rough_env_cfg import H1RoughEnvCfg_PLAY
56
57
58def main():
59    """Main function."""
60    # load the trained jit policy
61    policy_path = os.path.abspath(args_cli.checkpoint)
62    file_content = omni.client.read_file(policy_path)[2]
63    file = io.BytesIO(memoryview(file_content).tobytes())
64    policy = torch.jit.load(file, map_location=args_cli.device)
65
66    # setup environment
67    env_cfg = H1RoughEnvCfg_PLAY()
68    env_cfg.scene.num_envs = 1
69    env_cfg.curriculum = None
70    env_cfg.scene.terrain = TerrainImporterCfg(
71        prim_path="/World/ground",
72        terrain_type="usd",
73        usd_path=f"{ISAAC_NUCLEUS_DIR}/Environments/Simple_Warehouse/warehouse.usd",
74    )
75    env_cfg.sim.device = args_cli.device
76    if args_cli.device == "cpu":
77        env_cfg.sim.use_fabric = False
78
79    # create environment
80    env = ManagerBasedRLEnv(cfg=env_cfg)
81
82    # run inference with the policy
83    obs, _ = env.reset()
84    with torch.inference_mode():
85        while simulation_app.is_running():
86            action = policy(obs["policy"])
87            obs, _, _, _, _ = env.step(action)
88
89
90if __name__ == "__main__":
91    main()
92    simulation_app.close()

Note that when the device is set to CPU, we also disable the use of Fabric for inferencing. This is because when simulating a small number of environments, CPU simulation can often run faster than GPU simulation.
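
For contrast, here is a minimal sketch of the opposite setup, keeping GPU simulation with Fabric left at its default (enabled) when stepping many environments in parallel; the environment count below is illustrative:

from isaaclab_tasks.manager_based.locomotion.velocity.config.h1.rough_env_cfg import H1RoughEnvCfg_PLAY

# Sketch: GPU simulation for a larger number of parallel environments.
env_cfg = H1RoughEnvCfg_PLAY()
env_cfg.scene.num_envs = 64    # illustrative count; GPU simulation pays off as this grows
env_cfg.sim.device = "cuda:0"  # simulate on the GPU
env_cfg.sim.use_fabric = True  # Fabric stays enabled (its default) for GPU simulation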

The Code Execution

First, we need to train the Isaac-Velocity-Rough-H1-v0 task by running the following:

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Velocity-Rough-H1-v0 --headless

When the training is finished, we can visualize the result with the following command. To stop the simulation, you can either close the window or press Ctrl+C in the terminal where you started the simulation.

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py --task Isaac-Velocity-Rough-H1-v0 --num_envs 64 --checkpoint logs/rsl_rl/h1_rough/EXPERIMENT_NAME/POLICY_FILE.pt

After running the play script, the policy will be exported to jit and onnx files under the experiment logs directory. Note that not all learning libraries support exporting the policy to a jit or onnx file. For libraries that don’t currently support this functionality, please refer to the library’s corresponding play.py script to learn how to initialize the policy.
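
For a library without built-in export, one workable approach (a hedged sketch; the network below is a stand-in, not the actual H1 actor) is to trace the trained actor network to TorchScript yourself so the tutorial script can still consume it via --checkpoint:

import torch
import torch.nn as nn

# Stand-in for the trained actor network obtained from your RL library's checkpoint;
# the layer sizes here are illustrative only.
policy_nn = nn.Sequential(nn.Linear(256, 128), nn.ELU(), nn.Linear(128, 19))
policy_nn.eval()

# Trace with a dummy observation of the task-dependent size, then save as TorchScript.
dummy_obs = torch.zeros(1, 256)
traced_policy = torch.jit.trace(policy_nn, dummy_obs)
traced_policy.save("policy.pt")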

We can then load the warehouse asset and run inference on the H1 robot using the exported jit policy.

./isaaclab.sh -p scripts/tutorials/03_envs/policy_inference_in_usd.py --checkpoint logs/rsl_rl/h1_rough/EXPERIMENT_NAME/exported/policy.pt
[Figure: the H1 robot walking in the warehouse environment under the trained policy]

In this tutorial, we learnt how to make minor modifications to an existing environment config to run policy inference in a prebuilt USD environment.