Policy Inference in USD Environment

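Having learned how to modify a task in the Modifying an Existing Direct RL Environment tutorial, we will now look at how to run a trained policy in a prebuilt USD scene.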

In this tutorial, we use the RSL RL library and a policy trained on the Humanoid Rough Terrain task Isaac-Velocity-Rough-H1-v0, and run it in a simple warehouse USD scene.

Tutorial Code

For this tutorial, we use a checkpoint of the trained policy exported as jit (an offline version of the policy).
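The tutorial script does not hand a file path straight to PyTorch: it reads the checkpoint bytes through omni.client.read_file, wraps them in io.BytesIO, and passes that buffer to torch.jit.load, which accepts file-like objects. The following stdlib-only sketch reproduces the read-into-memory step for a local file. read_checkpoint_buffer is a hypothetical helper, and the sanity check assumes that files written by torch.jit.save are zip archives, as they are in current PyTorch releases.

```python
import io
import os
import zipfile


def read_checkpoint_buffer(path: str) -> io.BytesIO:
    """Hypothetical helper: load an exported jit checkpoint into an in-memory buffer."""
    policy_path = os.path.abspath(path)
    with open(policy_path, "rb") as f:
        buffer = io.BytesIO(f.read())
    # Assumption: TorchScript files produced by torch.jit.save are zip archives,
    # so a non-zip file is almost certainly not an exported jit policy.
    if not zipfile.is_zipfile(buffer):
        raise ValueError(f"{policy_path} does not look like a TorchScript (zip) archive")
    buffer.seek(0)  # is_zipfile moves the read position; rewind for the loader
    return buffer
```

In the script this could stand in for the two omni.client lines, e.g. `policy = torch.jit.load(read_checkpoint_buffer(args_cli.checkpoint), map_location=args_cli.device)`, at the cost of only handling local paths.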

The H1RoughEnvCfg_PLAY configuration encapsulates the configuration values of the inference environment, including the assets to instantiate.

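To use a prebuilt USD environment instead of the terrain generator specified in that configuration, we make the following changes to the configuration before passing it to ManagerBasedRLEnv.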
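The configuration edits in main() follow a plain override pattern: build the PLAY configuration, then overwrite a few fields in place. The sketch below reproduces that pattern with hypothetical, heavily simplified dataclasses (TerrainCfg, SceneCfg, EnvCfg and use_prebuilt_usd are illustrative names, not Isaac Lab classes) so it runs without Isaac Sim. Clearing the curriculum presumably matters because the rough-terrain curriculum is tied to the generated terrain, which a static USD scene does not provide.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-ins for the Isaac Lab config classes, reduced to the fields edited in main()
@dataclass
class TerrainCfg:
    prim_path: str = "/World/ground"
    terrain_type: str = "generator"  # procedurally generated terrain by default
    usd_path: Optional[str] = None


@dataclass
class SceneCfg:
    num_envs: int = 16  # placeholder value, not taken from the real PLAY config
    terrain: TerrainCfg = field(default_factory=TerrainCfg)


@dataclass
class EnvCfg:
    scene: SceneCfg = field(default_factory=SceneCfg)
    curriculum: Optional[dict] = field(default_factory=dict)  # stands in for the curriculum terms


def use_prebuilt_usd(cfg: EnvCfg, usd_path: str) -> EnvCfg:
    """Apply the same scene edits as main(): one env, no curriculum, terrain loaded from a USD file."""
    cfg.scene.num_envs = 1
    cfg.curriculum = None
    cfg.scene.terrain = TerrainCfg(prim_path="/World/ground", terrain_type="usd", usd_path=usd_path)
    return cfg
```

Using default_factory keeps each EnvCfg instance independent, so overriding one configuration never leaks into a freshly constructed one.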

Code for policy_inference_in_usd.py

# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

"""
This script demonstrates policy inference in a prebuilt USD environment.

In this example, we use a locomotion policy to control the H1 robot. The robot was trained
using Isaac-Velocity-Rough-H1-v0. The robot is commanded to move forward at a constant velocity.

.. code-block:: bash

    # Run the script
    ./isaaclab.sh -p scripts/tutorials/03_envs/policy_inference_in_usd.py --checkpoint /path/to/jit/checkpoint.pt

"""

"""Launch Isaac Sim Simulator first."""


import argparse

from isaaclab.app import AppLauncher

# add argparse arguments
parser = argparse.ArgumentParser(description="Tutorial on inferencing a policy on an H1 robot in a warehouse.")
parser.add_argument("--checkpoint", type=str, help="Path to model checkpoint exported as jit.", required=True)

# append AppLauncher cli args
AppLauncher.add_app_launcher_args(parser)
# parse the arguments
args_cli = parser.parse_args()

# launch omniverse app
app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app

"""Rest everything follows."""
import io
import os
import torch

import omni

from isaaclab.envs import ManagerBasedRLEnv
from isaaclab.terrains import TerrainImporterCfg
from isaaclab.utils.assets import ISAAC_NUCLEUS_DIR

from isaaclab_tasks.manager_based.locomotion.velocity.config.h1.rough_env_cfg import H1RoughEnvCfg_PLAY


def main():
    """Main function."""
    # load the trained jit policy
    policy_path = os.path.abspath(args_cli.checkpoint)
    file_content = omni.client.read_file(policy_path)[2]
    file = io.BytesIO(memoryview(file_content).tobytes())
    policy = torch.jit.load(file, map_location=args_cli.device)

    # setup environment
    env_cfg = H1RoughEnvCfg_PLAY()
    env_cfg.scene.num_envs = 1
    env_cfg.curriculum = None
    env_cfg.scene.terrain = TerrainImporterCfg(
        prim_path="/World/ground",
        terrain_type="usd",
        usd_path=f"{ISAAC_NUCLEUS_DIR}/Environments/Simple_Warehouse/warehouse.usd",
    )
    env_cfg.sim.device = args_cli.device
    if args_cli.device == "cpu":
        env_cfg.sim.use_fabric = False

    # create environment
    env = ManagerBasedRLEnv(cfg=env_cfg)

    # run inference with the policy
    obs, _ = env.reset()
    with torch.inference_mode():
        while simulation_app.is_running():
            action = policy(obs["policy"])
            obs, _, _, _, _ = env.step(action)


if __name__ == "__main__":
    main()
    simulation_app.close()
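The inference loop relies on the environment's gymnasium-style API: reset() returns the observations plus an extras dict, step() returns a five-tuple (observations, rewards, terminated, truncated, extras), and the observations are a dict keyed by group name, of which only the "policy" group is fed to the network. The stdlib-only sketch below mimics that contract with a hypothetical DummyEnv and a toy policy so the control flow can be run without Isaac Sim or PyTorch. It replaces `while simulation_app.is_running()` with a fixed number of steps and omits torch.inference_mode(), which in the real script disables gradient tracking during inference.

```python
from typing import Callable, Dict, List, Tuple

Obs = Dict[str, List[float]]


class DummyEnv:
    """Hypothetical stand-in for ManagerBasedRLEnv with the same reset/step tuple shapes."""

    def __init__(self, obs_dim: int = 3) -> None:
        self.obs_dim = obs_dim
        self.t = 0

    def reset(self) -> Tuple[Obs, dict]:
        self.t = 0
        return self._obs(), {}

    def step(self, action: List[float]) -> Tuple[Obs, float, bool, bool, dict]:
        self.t += 1
        reward = -sum(a * a for a in action)  # arbitrary reward; the inference loop ignores it
        return self._obs(), reward, False, False, {}

    def _obs(self) -> Obs:
        # "policy" is the group the actor consumes; "extra" is a hypothetical group it must not see
        return {"policy": [float(self.t)] * self.obs_dim, "extra": [0.0]}


def run_inference(env: DummyEnv, policy: Callable[[List[float]], List[float]], num_steps: int) -> List[List[float]]:
    obs, _ = env.reset()
    actions: List[List[float]] = []
    for _ in range(num_steps):  # bounded stand-in for `while simulation_app.is_running()`
        action = policy(obs["policy"])
        obs, _, _, _, _ = env.step(action)  # obs, reward, terminated, truncated, extras
        actions.append(action)
    return actions


def half_policy(x: List[float]) -> List[float]:
    """Toy deterministic policy: scales every observation by 0.5."""
    return [0.5 * v for v in x]


print(run_inference(DummyEnv(obs_dim=2), half_policy, num_steps=3))  # → [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```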

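Note that the simulation device is taken from the --device argument, and that Fabric is disabled when that device is the CPU. Running this example on the CPU is worth considering because, when simulating only a small number of environments, CPU simulation can often run faster than GPU simulation.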
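The device handling can be isolated from Isaac Sim as a small argparse sketch. In the real script, --device is added by AppLauncher.add_app_launcher_args; here parse_args and sim_overrides are hypothetical helpers, and the "cuda:0" default is an assumption for illustration. The override logic mirrors the script: the simulation device always follows --device, while use_fabric is only touched for "cpu", so any other device keeps the configuration's own default.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical stand-in for the script's parser plus the AppLauncher-provided --device flag
    parser = argparse.ArgumentParser(description="Policy inference sketch (no Isaac Sim).")
    parser.add_argument("--checkpoint", type=str, required=True, help="Path to model checkpoint exported as jit.")
    parser.add_argument("--device", type=str, default="cuda:0", help="Simulation device (default is an assumption).")
    return parser.parse_args(argv)


def sim_overrides(device: str) -> dict:
    """Return only the sim settings the script changes for a given device."""
    overrides = {"device": device}
    if device == "cpu":
        overrides["use_fabric"] = False  # Fabric is disabled only for CPU simulation
    return overrides
```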

Code Execution

First, we need to train the Isaac-Velocity-Rough-H1-v0 task by running the following command:

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Velocity-Rough-H1-v0 --headless

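Once training has finished, we can visualize the result with the following command. To stop the simulation, either close the window or press Ctrl+C in the terminal where you launched it.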

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py --task Isaac-Velocity-Rough-H1-v0 --num_envs 64 --checkpoint logs/rsl_rl/h1_rough/EXPERIMENT_NAME/POLICY_FILE.pt

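Running the play script exports the policy as jit and onnx files in the experiment's log directory. Note that not all learning libraries support exporting the policy to a jit or onnx file. For libraries that do not currently support this, refer to the corresponding play.py script to see how the policy is initialized.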
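Because the export lands in a fixed place inside each run directory, the checkpoint path for the inference script can be found programmatically. The sketch below assumes the layout used by the commands in this section, logs/rsl_rl/h1_rough/<run>/exported/policy.pt, and additionally assumes that run directory names sort chronologically (for example, timestamp-prefixed names). find_exported_policy is a hypothetical helper, not part of Isaac Lab.

```python
from pathlib import Path
from typing import Optional


def find_exported_policy(log_root: str, filename: str = "policy.pt") -> Optional[Path]:
    """Return <log_root>/<run>/exported/<filename> for the last run (by name) that has one, else None."""
    root = Path(log_root)
    if not root.is_dir():
        return None
    # Runs without an exported/ directory (e.g. never played) are skipped automatically
    candidates = sorted(root.glob(f"*/exported/{filename}"))
    return candidates[-1] if candidates else None
```

For example, `find_exported_policy("logs/rsl_rl/h1_rough")` would yield a path suitable for the --checkpoint argument of the inference script.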

We can then load the warehouse asset and run inference on the H1 robot with the exported jit policy (the policy.pt file is located in the exported/ directory):

./isaaclab.sh -p scripts/tutorials/03_envs/policy_inference_in_usd.py --checkpoint logs/rsl_rl/h1_rough/EXPERIMENT_NAME/exported/policy.pt

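In this tutorial, we learned how to make a few small modifications to an existing environment configuration in order to run policy inference in a prebuilt USD environment.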