Find How Many/What Cameras You Should Train With

Isaac Lab currently provides several camera types: USD Cameras (standard), Tiled Cameras, and Ray Caster Cameras. These camera types differ in functionality and performance. The benchmark_cameras.py script can be used to understand the differences between the camera types, as well as to characterize their relative performance under different parameters such as camera quantity, image dimensions, and data types.
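
As a rough illustration of how camera quantity, image dimensions, and data types interact, the sketch below estimates the raw size of one frame of camera output. The helper `raw_image_bytes` and the per-pixel sizes in `BYTES_PER_PIXEL` are assumptions made for this example (8-bit, 3-channel RGB and 32-bit, 1-channel depth); they are not taken from benchmark_cameras.py, and real buffers may differ.

```python
# Assumed bytes per pixel, for illustration only (channels * bytes per channel).
BYTES_PER_PIXEL = {"rgb": 3 * 1, "depth": 1 * 4}


def raw_image_bytes(num_cameras: int, height: int, width: int, data_types: list[str]) -> int:
    """Estimate the raw bytes produced per frame across all cameras."""
    per_pixel = sum(BYTES_PER_PIXEL[name] for name in data_types)
    return num_cameras * height * width * per_pixel


# 100 cameras at 120 x 140 pixels, each producing RGB and depth.
estimate = raw_image_bytes(100, 120, 140, ["rgb", "depth"])
print(f"{estimate / 1e6:.2f} MB per frame")
```

Under these assumptions the estimate grows linearly in each parameter, so it is only a lower bound on the real cost; rendering and simulation overhead still have to be measured on your own hardware.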

This utility is provided so that one can easily find the camera type and parameters that are the most performant while meeting the requirements of the user’s scenario. It also helps estimate the maximum number of cameras one can realistically run, assuming that one wants to maximize the number of environments while minimizing step time.
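
To make "step time" concrete, here is a minimal sketch of averaging wall-clock step duration while skipping a few warm-up steps. The function `average_step_time` and its `step` callable are hypothetical stand-ins for this example. The parameter names mirror the script's `--warm_start_length` and `--experiment_length` options, but this is not the script's exact measurement code.

```python
import time
from collections.abc import Callable


def average_step_time(
    step: Callable[[], None], warm_start_length: int = 3, experiment_length: int = 15
) -> float:
    """Return the mean seconds per step, ignoring the first warm-up steps."""
    total = 0.0
    counted = 0
    for timestep in range(experiment_length):
        start = time.perf_counter()
        step()  # stand-in for stepping the simulation and reading camera data
        elapsed = time.perf_counter() - start
        if timestep >= warm_start_length:
            total += elapsed
            counted += 1
    # Avoid dividing by zero when every step was a warm-up step.
    return total / counted if counted else 0.0


avg = average_step_time(lambda: sum(range(10_000)))
print(f"Average step time: {avg:.6f} s")
```

Warm-up steps are excluded because the first few frames typically carry one-off setup cost and would skew the average.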

This utility can inject cameras into an existing task from the gym registry, which is useful for benchmarking cameras in a specific scenario. In addition, if you install pynvml, the utility can automatically find the maximum number of cameras that can run in your task environment up to a specified system resource utilization threshold (without training; it takes zero actions at each timestep).
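
The idea behind the automatic search can be sketched as a simple loop: raise the camera count by a fixed interval until any utilization threshold is exceeded. The names `autotune_camera_count`, `measure_utilization`, and `fake_measure` are hypothetical and exist only for this example. The script's own loop differs in detail, since it measures utilization by actually running the task.

```python
from collections.abc import Callable


def autotune_camera_count(
    measure_utilization: Callable[[int], list[float]],
    thresholds: list[float],
    start: int = 1,
    interval: int = 25,
    max_cameras: int = 4096,
) -> int:
    """Return the largest tested camera count that stayed within every threshold."""
    best = 0
    num_cams = start
    while num_cams <= max_cameras:
        utilization = measure_utilization(num_cams)
        # Stop as soon as any single resource exceeds its limit.
        if any(cur > limit for cur, limit in zip(utilization, thresholds)):
            break
        best = num_cams
        num_cams += interval
    return best


def fake_measure(num_cams: int) -> list[float]:
    """Hypothetical stand-in: pretend each camera costs 0.5% of every resource."""
    return [0.5 * num_cams] * 4


# Thresholds in order: CPU, RAM, GPU compute, GPU memory (percent).
print(autotune_camera_count(fake_measure, thresholds=[100.0, 80.0, 80.0, 80.0]))  # prints 151
```

With the toy cost model, 151 cameras use 75.5% of each resource, while the next candidate (176) would use 88% and exceed the 80% limits.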

This guide accompanies the benchmark_cameras.py script in the scripts/benchmarks directory.

Code for benchmark_cameras.py
  1# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
  2# All rights reserved.
  3#
  4# SPDX-License-Identifier: BSD-3-Clause
  5
  6# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
  7# All rights reserved.
  8#
  9# SPDX-License-Identifier: BSD-3-Clause
 10
 11"""
 12This script might help you determine how many cameras your system can realistically run
 13at different desired settings.
 14
 15You can supply different task environments to inject cameras into, or just test a sample scene.
 16Additionally, you can automatically find the maximum number of cameras you can run a task with
 17through the auto-tune functionality.
 18
 19.. code-block:: bash
 20
 21    # Usage with GUI
 22    ./isaaclab.sh -p scripts/benchmarks/benchmark_cameras.py -h
 23
 24    # Usage with headless
 25    ./isaaclab.sh -p scripts/benchmarks/benchmark_cameras.py -h --headless
 26
 27"""
 28
 29"""Launch Isaac Sim Simulator first."""
 30
 31import argparse
 32from collections.abc import Callable
 33
 34from isaaclab.app import AppLauncher
 35
 36# parse the arguments
 37args_cli = argparse.Namespace()
 38
 39parser = argparse.ArgumentParser(description="This script can help you benchmark how many cameras you could run.")
 40
 41"""
 42The following arguments only need to be supplied for when one wishes
 43to try injecting cameras into their environment, and automatically determining
 44the maximum camera count.
 45"""
 46parser.add_argument(
 47    "--task",
 48    type=str,
 49    default=None,
 50    required=False,
 51    help="Supply this argument to spawn cameras within a known manager-based task environment.",
 52)
 53
 54parser.add_argument(
 55    "--autotune",
 56    default=False,
 57    action="store_true",
 58    help=(
 59        "Autotuning is only supported for provided task environments."
 60        " Supply this argument to increase the number of environments until a desired threshold is reached."
 61        " Install pynvml in your environment: ./isaaclab.sh -m pip install pynvml"
 62    ),
 63)
 64
 65parser.add_argument(
 66    "--task_num_cameras_per_env",
 67    type=int,
 68    default=1,
 69    help="The number of cameras per environment to use when using a known task.",
 70)
 71
 72parser.add_argument(
 73    "--use_fabric", action="store_true", default=False, help="Enable fabric and use USD I/O operations."
 74)
 75
 76parser.add_argument(
 77    "--autotune_max_percentage_util",
 78    nargs="+",
 79    type=float,
 80    default=[100.0, 80.0, 80.0, 80.0],
 81    required=False,
 82    help=(
 83        "The system utilization percentage thresholds to reach before an autotune is finished. "
 84        "If any one of these limits is hit, the autotune stops. "
 85        "Thresholds are, in order, maximum CPU percentage utilization, "
 86        "maximum RAM percentage utilization, maximum GPU compute percent utilization, "
 87        "and maximum GPU memory utilization."
 88    ),
 89)
 90
 91parser.add_argument(
 92    "--autotune_max_camera_count", type=int, default=4096, help="The maximum number of cameras allowed in an autotune."
 93)
 94
 95parser.add_argument(
 96    "--autotune_camera_count_interval",
 97    type=int,
 98    default=25,
 99    help=(
100        "The number of cameras to try to add to the environment if the current camera count"
101        " falls within permitted system resource utilization limits."
102    ),
103)
104
105"""
106The following arguments are shared for when injecting cameras into a task environment,
107as well as when creating cameras independent of a task environment.
108"""
109
110parser.add_argument(
111    "--num_tiled_cameras",
112    type=int,
113    default=0,
114    required=False,
115    help="Number of tiled cameras to create. For autotuning, this is how many cameras to start with.",
116)
117
118parser.add_argument(
119    "--num_standard_cameras",
120    type=int,
121    default=0,
122    required=False,
123    help="Number of standard cameras to create. For autotuning, this is how many cameras to start with.",
124)
125
126parser.add_argument(
127    "--num_ray_caster_cameras",
128    type=int,
129    default=0,
130    required=False,
131    help="Number of ray caster cameras to create. For autotuning, this is how many cameras to start with.",
132)
133
134parser.add_argument(
135    "--tiled_camera_data_types",
136    nargs="+",
137    type=str,
138    default=["rgb", "depth"],
139    help="The data types rendered by the tiled camera",
140)
141
142parser.add_argument(
143    "--standard_camera_data_types",
144    nargs="+",
145    type=str,
146    default=["rgb", "distance_to_image_plane", "distance_to_camera"],
147    help="The data types rendered by the standard camera",
148)
149
150parser.add_argument(
151    "--ray_caster_camera_data_types",
152    nargs="+",
153    type=str,
154    default=["distance_to_image_plane"],
155    help="The data types rendered by the ray caster camera.",
156)
157
158parser.add_argument(
159    "--ray_caster_visible_mesh_prim_paths",
160    nargs="+",
161    type=str,
162    default=["/World/ground"],
163    help="WARNING: Ray Caster can currently only cast against a single, static, object",
164)
165
166parser.add_argument(
167    "--convert_depth_to_camera_to_image_plane",
168    action="store_true",
169    default=True,
170    help=(
171        "Enable undistorting from perspective view (distance to camera data_type) "
172        "to orthogonal view (distance to plane data_type) for depth. "
173        "This is currently needed to create undistorted depth images/point clouds."
174    ),
175)
176
177parser.add_argument(
178    "--keep_raw_depth",
179    dest="convert_depth_to_camera_to_image_plane",
180    action="store_false",
181    help=(
182        "Disable undistorting from perspective view (distance to camera) "
183        "to orthogonal view (distance to plane data_type) for depth."
184    ),
185)
186
187parser.add_argument(
188    "--height",
189    type=int,
190    default=120,
191    required=False,
192    help="Height in pixels of cameras",
193)
194
195parser.add_argument(
196    "--width",
197    type=int,
198    default=140,
199    required=False,
200    help="Width in pixels of cameras",
201)
202
203parser.add_argument(
204    "--warm_start_length",
205    type=int,
206    default=3,
207    required=False,
208    help=(
209        "Number of steps to run the sim before starting benchmark. "
210        "Needed to avoid blank images at the start of the simulation."
211    ),
212)
213
214parser.add_argument(
215    "--experiment_length",
216    type=int,
217    default=15,
218    required=False,
219    help="Number of steps to average over",
220)
221
222# This argument is only used when a task is not provided.
223parser.add_argument(
224    "--num_objects",
225    type=int,
226    default=10,
227    required=False,
228    help="Number of objects to spawn into the scene when not using a known task.",
229)
230
231
232AppLauncher.add_app_launcher_args(parser)
233args_cli = parser.parse_args()
234args_cli.enable_cameras = True
235
236if args_cli.autotune:
237    import pynvml
238
239if len(args_cli.ray_caster_visible_mesh_prim_paths) > 1:
240    print("[WARNING]: Ray Casting is only currently supported for a single, static object")
241# launch omniverse app
242app_launcher = AppLauncher(args_cli)
243simulation_app = app_launcher.app
244
245"""Rest everything follows."""
246
247import gymnasium as gym
248import numpy as np
249import random
250import time
251import torch
252
253import isaacsim.core.utils.prims as prim_utils
254import psutil
255from isaacsim.core.utils.stage import create_new_stage
256
257import isaaclab.sim as sim_utils
258from isaaclab.assets import RigidObject, RigidObjectCfg
259from isaaclab.scene.interactive_scene import InteractiveScene
260from isaaclab.sensors import (
261    Camera,
262    CameraCfg,
263    RayCasterCamera,
264    RayCasterCameraCfg,
265    TiledCamera,
266    TiledCameraCfg,
267    patterns,
268)
269from isaaclab.utils.math import orthogonalize_perspective_depth, unproject_depth
270
271from isaaclab_tasks.utils import load_cfg_from_registry
272
273"""
274Camera Creation
275"""
276
277
278def create_camera_base(
279    camera_cfg: type[CameraCfg | TiledCameraCfg],
280    num_cams: int,
281    data_types: list[str],
282    height: int,
283    width: int,
284    prim_path: str | None = None,
285    instantiate: bool = True,
286) -> Camera | TiledCamera | CameraCfg | TiledCameraCfg | None:
287    """Generalized function to create a camera or tiled camera sensor."""
288    # Determine prim prefix based on the camera class
289    name = camera_cfg.class_type.__name__
290
291    if instantiate:
292        # Create the necessary prims
293        for idx in range(num_cams):
294            prim_utils.create_prim(f"/World/{name}_{idx:02d}", "Xform")
295    if prim_path is None:
296        prim_path = f"/World/{name}_.*/{name}"
297    # If valid camera settings are provided, create the camera
298    if num_cams > 0 and len(data_types) > 0 and height > 0 and width > 0:
299        cfg = camera_cfg(
300            prim_path=prim_path,
301            update_period=0,
302            height=height,
303            width=width,
304            data_types=data_types,
305            spawn=sim_utils.PinholeCameraCfg(
306                focal_length=24, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 1e4)
307            ),
308        )
309        if instantiate:
310            return camera_cfg.class_type(cfg=cfg)
311        else:
312            return cfg
313    else:
314        return None
315
316
317def create_tiled_cameras(
318    num_cams: int = 2, data_types: list[str] | None = None, height: int = 100, width: int = 120
319) -> TiledCamera | None:
320    """Defines the tiled camera sensor to add to the scene."""
321    if data_types is None:
322        data_types = ["rgb", "depth"]
323    return create_camera_base(
324        camera_cfg=TiledCameraCfg,
325        num_cams=num_cams,
326        data_types=data_types,
327        height=height,
328        width=width,
329    )
330
331
332def create_cameras(
333    num_cams: int = 2, data_types: list[str] | None = None, height: int = 100, width: int = 120
334) -> Camera | None:
335    """Defines the Standard cameras."""
336    if data_types is None:
337        data_types = ["rgb", "depth"]
338    return create_camera_base(
339        camera_cfg=CameraCfg, num_cams=num_cams, data_types=data_types, height=height, width=width
340    )
341
342
343def create_ray_caster_cameras(
344    num_cams: int = 2,
345    data_types: list[str] = ["distance_to_image_plane"],
346    mesh_prim_paths: list[str] = ["/World/ground"],
347    height: int = 100,
348    width: int = 120,
349    prim_path: str = "/World/RayCasterCamera_.*/RayCaster",
350    instantiate: bool = True,
351) -> RayCasterCamera | RayCasterCameraCfg | None:
352    """Create the raycaster cameras; different configuration than Standard/Tiled camera"""
353    for idx in range(num_cams if instantiate else 0):  # only create prims when instantiating the sensor
354        prim_utils.create_prim(f"/World/RayCasterCamera_{idx:02d}/RayCaster", "Xform")
355
356    if num_cams > 0 and len(data_types) > 0 and height > 0 and width > 0:
357        cam_cfg = RayCasterCameraCfg(
358            prim_path=prim_path,
359            mesh_prim_paths=mesh_prim_paths,
360            update_period=0,
361            offset=RayCasterCameraCfg.OffsetCfg(pos=(0.0, 0.0, 0.0), rot=(1.0, 0.0, 0.0, 0.0)),
362            data_types=data_types,
363            debug_vis=False,
364            pattern_cfg=patterns.PinholeCameraPatternCfg(
365                focal_length=24.0,
366                horizontal_aperture=20.955,
367                height=height,
368                width=width,
369            ),
370        )
371        if instantiate:
372            return RayCasterCamera(cfg=cam_cfg)
373        else:
374            return cam_cfg
375
376    else:
377        return None
378
379
380def create_tiled_camera_cfg(prim_path: str) -> TiledCameraCfg:
381    """Grab a simple tiled camera config for injecting into task environments."""
382    return create_camera_base(
383        TiledCameraCfg,
384        num_cams=args_cli.num_tiled_cameras,
385        data_types=args_cli.tiled_camera_data_types,
386        width=args_cli.width,
387        height=args_cli.height,
388        prim_path="{ENV_REGEX_NS}/" + prim_path,
389        instantiate=False,
390    )
391
392
393def create_standard_camera_cfg(prim_path: str) -> CameraCfg:
394    """Grab a simple standard camera config for injecting into task environments."""
395    return create_camera_base(
396        CameraCfg,
397        num_cams=args_cli.num_standard_cameras,
398        data_types=args_cli.standard_camera_data_types,
399        width=args_cli.width,
400        height=args_cli.height,
401        prim_path="{ENV_REGEX_NS}/" + prim_path,
402        instantiate=False,
403    )
404
405
406def create_ray_caster_camera_cfg(prim_path: str) -> RayCasterCameraCfg:
407    """Grab a simple ray caster config for injecting into task environments."""
408    return create_ray_caster_cameras(
409        num_cams=args_cli.num_ray_caster_cameras,
410        data_types=args_cli.ray_caster_camera_data_types,
411        width=args_cli.width, height=args_cli.height,
412        prim_path="{ENV_REGEX_NS}/" + prim_path,
413        instantiate=False,  # return the config rather than an instantiated sensor
414    )
415
416
417"""
418Scene Creation
419"""
420
421
422def design_scene(
423    num_tiled_cams: int = 2,
424    num_standard_cams: int = 0,
425    num_ray_caster_cams: int = 0,
426    tiled_camera_data_types: list[str] | None = None,
427    standard_camera_data_types: list[str] | None = None,
428    ray_caster_camera_data_types: list[str] | None = None,
429    height: int = 100,
430    width: int = 200,
431    num_objects: int = 20,
432    mesh_prim_paths: list[str] = ["/World/ground"],
433) -> dict:
434    """Design the scene."""
435    if tiled_camera_data_types is None:
436        tiled_camera_data_types = ["rgb"]
437    if standard_camera_data_types is None:
438        standard_camera_data_types = ["rgb"]
439    if ray_caster_camera_data_types is None:
440        ray_caster_camera_data_types = ["distance_to_image_plane"]
441
442    # Populate scene
443    # -- Ground-plane
444    cfg = sim_utils.GroundPlaneCfg()
445    cfg.func("/World/ground", cfg)
446    # -- Lights
447    cfg = sim_utils.DistantLightCfg(intensity=3000.0, color=(0.75, 0.75, 0.75))
448    cfg.func("/World/Light", cfg)
449
450    # Create a dictionary for the scene entities
451    scene_entities = {}
452
453    # Xform to hold objects
454    prim_utils.create_prim("/World/Objects", "Xform")
455    # Random objects
456    for i in range(num_objects):
457        # sample random position
458        position = np.random.rand(3) - np.asarray([0.05, 0.05, -1.0])
459        position *= np.asarray([1.5, 1.5, 0.5])
460        # sample random color
461        color = (random.random(), random.random(), random.random())
462        # choose random prim type
463        prim_type = random.choice(["Cube", "Cone", "Cylinder"])
464        common_properties = {
465            "rigid_props": sim_utils.RigidBodyPropertiesCfg(),
466            "mass_props": sim_utils.MassPropertiesCfg(mass=5.0),
467            "collision_props": sim_utils.CollisionPropertiesCfg(),
468            "visual_material": sim_utils.PreviewSurfaceCfg(diffuse_color=color, metallic=0.5),
469            "semantic_tags": [("class", prim_type)],
470        }
471        if prim_type == "Cube":
472            shape_cfg = sim_utils.CuboidCfg(size=(0.25, 0.25, 0.25), **common_properties)
473        elif prim_type == "Cone":
474            shape_cfg = sim_utils.ConeCfg(radius=0.1, height=0.25, **common_properties)
475        elif prim_type == "Cylinder":
476            shape_cfg = sim_utils.CylinderCfg(radius=0.25, height=0.25, **common_properties)
477        # Rigid Object
478        obj_cfg = RigidObjectCfg(
479            prim_path=f"/World/Objects/Obj_{i:02d}",
480            spawn=shape_cfg,
481            init_state=RigidObjectCfg.InitialStateCfg(pos=position),
482        )
483        scene_entities[f"rigid_object{i}"] = RigidObject(cfg=obj_cfg)
484
485    # Sensors
486    standard_camera = create_cameras(
487        num_cams=num_standard_cams, data_types=standard_camera_data_types, height=height, width=width
488    )
489    tiled_camera = create_tiled_cameras(
490        num_cams=num_tiled_cams, data_types=tiled_camera_data_types, height=height, width=width
491    )
492    ray_caster_camera = create_ray_caster_cameras(
493        num_cams=num_ray_caster_cams,
494        data_types=ray_caster_camera_data_types,
495        mesh_prim_paths=mesh_prim_paths,
496        height=height,
497        width=width,
498    )
499    # return the scene information
500    if tiled_camera is not None:
501        scene_entities["tiled_camera"] = tiled_camera
502    if standard_camera is not None:
503        scene_entities["standard_camera"] = standard_camera
504    if ray_caster_camera is not None:
505        scene_entities["ray_caster_camera"] = ray_caster_camera
506    return scene_entities
507
508
509def inject_cameras_into_task(
510    task: str,
511    num_cams: int,
512    camera_name_prefix: str,
513    camera_creation_callable: Callable,
514    num_cameras_per_env: int = 1,
515) -> gym.Env:
516    """Loads the task, sticks cameras into the config, and creates the environment."""
517    cfg = load_cfg_from_registry(task, "env_cfg_entry_point")
518    cfg.sim.device = args_cli.device
519    cfg.sim.use_fabric = args_cli.use_fabric
520    scene_cfg = cfg.scene
521
522    num_envs = int(num_cams / num_cameras_per_env)
523    scene_cfg.num_envs = num_envs
524
525    for idx in range(num_cameras_per_env):
526        suffix = "" if idx == 0 else str(idx)
527        name = camera_name_prefix + suffix
528        setattr(scene_cfg, name, camera_creation_callable(name))
529    cfg.scene = scene_cfg
530    env = gym.make(task, cfg=cfg)
531    return env
532
533
534"""
535System diagnosis
536"""
537
538
539def get_utilization_percentages(reset: bool = False, max_values: list[float] = [0.0, 0.0, 0.0, 0.0]) -> list[float]:
540    """Get the maximum CPU, RAM, GPU utilization (processing), and
541    GPU memory usage percentages since the last time reset was true."""
542    if reset:
543        max_values[:] = [0, 0, 0, 0]  # Reset the max values
544
545    # CPU utilization
546    cpu_usage = psutil.cpu_percent(interval=0.1)
547    max_values[0] = max(max_values[0], cpu_usage)
548
549    # RAM utilization
550    memory_info = psutil.virtual_memory()
551    ram_usage = memory_info.percent
552    max_values[1] = max(max_values[1], ram_usage)
553
554    # GPU utilization using pynvml
555    if torch.cuda.is_available():
556
557        if args_cli.autotune:
558            pynvml.nvmlInit()  # Initialize NVML
559            for i in range(torch.cuda.device_count()):
560                handle = pynvml.nvmlDeviceGetHandleByIndex(i)
561
562                # GPU Utilization
563                gpu_utilization = pynvml.nvmlDeviceGetUtilizationRates(handle)
564                gpu_processing_utilization_percent = gpu_utilization.gpu  # GPU core utilization
565                max_values[2] = max(max_values[2], gpu_processing_utilization_percent)
566
567                # GPU Memory Usage
568                memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
569                gpu_memory_total = memory_info.total
570                gpu_memory_used = memory_info.used
571                gpu_memory_utilization_percent = (gpu_memory_used / gpu_memory_total) * 100
572                max_values[3] = max(max_values[3], gpu_memory_utilization_percent)
573
574            pynvml.nvmlShutdown()  # Shutdown NVML after usage
575    else:
576        gpu_processing_utilization_percent = None
577        gpu_memory_utilization_percent = None
578    return max_values
579
580
581"""
582Experiment
583"""
584
585
586def run_simulator(
587    sim: sim_utils.SimulationContext | None,
588    scene_entities: dict | InteractiveScene,
589    warm_start_length: int = 10,
590    experiment_length: int = 100,
591    tiled_camera_data_types: list[str] | None = None,
592    standard_camera_data_types: list[str] | None = None,
593    ray_caster_camera_data_types: list[str] | None = None,
594    depth_predicate: Callable = lambda x: "to" in x or x == "depth",
595    perspective_depth_predicate: Callable = lambda x: x == "distance_to_camera",
596    convert_depth_to_camera_to_image_plane: bool = True,
597    max_cameras_per_env: int = 1,
598    env: gym.Env | None = None,
599) -> dict:
600    """Run the simulator with all cameras, and return timing analytics. Visualize if desired."""
601
602    if tiled_camera_data_types is None:
603        tiled_camera_data_types = ["rgb"]
604    if standard_camera_data_types is None:
605        standard_camera_data_types = ["rgb"]
606    if ray_caster_camera_data_types is None:
607        ray_caster_camera_data_types = ["distance_to_image_plane"]
608
609    # Initialize camera lists
610    tiled_cameras = []
611    standard_cameras = []
612    ray_caster_cameras = []
613
614    # Dynamically extract cameras from the scene entities up to max_cameras_per_env
615    for i in range(max_cameras_per_env):
616        # Extract tiled cameras
617        tiled_camera_key = f"tiled_camera{i}" if i > 0 else "tiled_camera"
618        standard_camera_key = f"standard_camera{i}" if i > 0 else "standard_camera"
619        ray_caster_camera_key = f"ray_caster_camera{i}" if i > 0 else "ray_caster_camera"
620
621        keys = (tiled_camera_key, standard_camera_key, ray_caster_camera_key)
622        for cam_list, key in zip((tiled_cameras, standard_cameras, ray_caster_cameras), keys):
623            try:  # membership tests are unreliable on scene_entities, so index and catch the KeyError
624                cam_list.append(scene_entities[key])
625            except KeyError:
626                continue  # this camera type is absent; still check the remaining types
627
628    # Initialize camera counts
629    camera_lists = [tiled_cameras, standard_cameras, ray_caster_cameras]
630    camera_data_types = [tiled_camera_data_types, standard_camera_data_types, ray_caster_camera_data_types]
631    labels = ["tiled", "standard", "ray_caster"]
632
633    if sim is not None:
634        # Set camera world poses
635        for camera_list in camera_lists:
636            for camera in camera_list:
637                num_cameras = camera.data.intrinsic_matrices.size(0)
638                positions = torch.tensor([[2.5, 2.5, 2.5]], device=sim.device).repeat(num_cameras, 1)
639                targets = torch.tensor([[0.0, 0.0, 0.0]], device=sim.device).repeat(num_cameras, 1)
640                camera.set_world_poses_from_view(positions, targets)
641
642    # Initialize timing variables
643    timestep = 0
644    total_time = 0.0
645    valid_timesteps = 0
646    sim_step_time = 0.0
647
648    while simulation_app.is_running() and timestep < experiment_length:
649        print(f"On timestep {timestep} of {experiment_length}, with warm start of {warm_start_length}")
650        get_utilization_percentages()
651
652        # Measure the total simulation step time
653        step_start_time = time.time()
654
655        if sim is not None:
656            sim.step()
657
658        if env is not None:
659            with torch.inference_mode():
660                # compute zero actions
661                actions = torch.zeros(env.action_space.shape, device=env.unwrapped.device)
662                # apply actions
663                env.step(actions)
664
665        # Update cameras and process vision data within the simulation step
666        clouds = {}
667        images = {}
668        depth_images = {}
669
670        # Loop through all camera lists and their data_types
671        for camera_list, data_types, label in zip(camera_lists, camera_data_types, labels):
672            for cam_idx, camera in enumerate(camera_list):
673
674                if env is None:  # No env, need to step cams manually
675                    # Only update the camera if it hasn't been updated as part of scene_entities.update ...
676                    camera.update(dt=sim.get_physics_dt())
677
678                for data_type in data_types:
679                    data_label = f"{label}_{cam_idx}_{data_type}"
680
681                    if depth_predicate(data_type):  # is a depth image, want to create cloud
682                        depth = camera.data.output[data_type]
683                        depth_images[data_label + "_raw"] = depth
684                        if perspective_depth_predicate(data_type) and convert_depth_to_camera_to_image_plane:
685                            depth = orthogonalize_perspective_depth(
686                                camera.data.output[data_type], camera.data.intrinsic_matrices
687                            )
688                            depth_images[data_label + "_undistorted"] = depth
689
690                        pointcloud = unproject_depth(depth=depth, intrinsics=camera.data.intrinsic_matrices)
691                        clouds[data_label] = pointcloud
692                    else:  # rgb image, just save it
693                        image = camera.data.output[data_type]
694                        images[data_label] = image
695
696        # End timing for the step
697        step_end_time = time.time()
698        sim_step_time += step_end_time - step_start_time
699
700        if timestep > warm_start_length:
701            get_utilization_percentages(reset=True)
702            total_time += step_end_time - step_start_time
703            valid_timesteps += 1
704
705        timestep += 1
706
707    # Calculate average timings
708    if valid_timesteps > 0:
709        avg_timestep_duration = total_time / valid_timesteps
710        avg_sim_step_duration = sim_step_time / experiment_length
711    else:
712        avg_timestep_duration = 0.0
713        avg_sim_step_duration = 0.0
714
715    # Package timing analytics in a dictionary
716    timing_analytics = {
717        "average_timestep_duration": avg_timestep_duration,
718        "average_sim_step_duration": avg_sim_step_duration,
719        "total_simulation_time": sim_step_time,
720        "total_experiment_duration": sim_step_time,
721    }
722
723    system_utilization_analytics = get_utilization_percentages()
724
725    print("--- Benchmark Results ---")
726    print(f"Average timestep duration: {avg_timestep_duration:.6f} seconds")
727    print(f"Average simulation step duration: {avg_sim_step_duration:.6f} seconds")
728    print(f"Total simulation time: {sim_step_time:.6f} seconds")
729    print("\nSystem Utilization Statistics:")
730    print(
731        f"| CPU:{system_utilization_analytics[0]}% | "
732        f"RAM:{system_utilization_analytics[1]}% | "
733        f"GPU Compute:{system_utilization_analytics[2]}% | "
734        f" GPU Memory: {system_utilization_analytics[3]:.2f}% |"
735    )
736
737    return {"timing_analytics": timing_analytics, "system_utilization_analytics": system_utilization_analytics}
738
739
740def main():
741    """Main function."""
742    # Load simulation context
743    if args_cli.num_tiled_cameras + args_cli.num_standard_cameras + args_cli.num_ray_caster_cameras <= 0:
744        raise ValueError("You must select at least one camera.")
745    if (
746        (args_cli.num_tiled_cameras > 0 and args_cli.num_standard_cameras > 0)
747        or (args_cli.num_ray_caster_cameras > 0 and args_cli.num_standard_cameras > 0)
748        or (args_cli.num_ray_caster_cameras > 0 and args_cli.num_tiled_cameras > 0)
749    ):
750        print("[WARNING]: You have elected to use more than one camera type.")
751        print("[WARNING]: For a benchmark to be meaningful, use ONLY ONE camera type at a time.")
752        print(
753            "[WARNING]: For example, if num_tiled_cameras=100, for a meaningful benchmark, "
754            "num_standard_cameras should be 0, and num_ray_caster_cameras should be 0"
755        )
756        raise ValueError("Benchmark one camera at a time.")
757
758    print("[INFO]: Designing the scene")
759    if args_cli.task is None:
760        print("[INFO]: No task environment provided, creating random scene.")
761        sim_cfg = sim_utils.SimulationCfg(device=args_cli.device)
762        sim = sim_utils.SimulationContext(sim_cfg)
763        # Set main camera
764        sim.set_camera_view([2.5, 2.5, 2.5], [0.0, 0.0, 0.0])
765        scene_entities = design_scene(
766            num_tiled_cams=args_cli.num_tiled_cameras,
767            num_standard_cams=args_cli.num_standard_cameras,
768            num_ray_caster_cams=args_cli.num_ray_caster_cameras,
769            tiled_camera_data_types=args_cli.tiled_camera_data_types,
770            standard_camera_data_types=args_cli.standard_camera_data_types,
771            ray_caster_camera_data_types=args_cli.ray_caster_camera_data_types,
772            height=args_cli.height,
773            width=args_cli.width,
774            num_objects=args_cli.num_objects,
775            mesh_prim_paths=args_cli.ray_caster_visible_mesh_prim_paths,
776        )
777        # Play simulator
778        sim.reset()
779        # Now we are ready!
780        print("[INFO]: Setup complete...")
781        # Run simulator
782        run_simulator(
783            sim=sim,
784            scene_entities=scene_entities,
785            warm_start_length=args_cli.warm_start_length,
786            experiment_length=args_cli.experiment_length,
787            tiled_camera_data_types=args_cli.tiled_camera_data_types,
788            standard_camera_data_types=args_cli.standard_camera_data_types,
789            ray_caster_camera_data_types=args_cli.ray_caster_camera_data_types,
790            convert_depth_to_camera_to_image_plane=args_cli.convert_depth_to_camera_to_image_plane,
791        )
792    else:
793        print("[INFO]: Using known task environment, injecting cameras.")
794        autotune_iter = 0
795        max_sys_util_thresh = [0.0, 0.0, 0.0]
796        max_num_cams = max(args_cli.num_tiled_cameras, args_cli.num_standard_cameras, args_cli.num_ray_caster_cameras)
797        cur_num_cams = max_num_cams
798        cur_sys_util = max_sys_util_thresh
799        interval = args_cli.autotune_camera_count_interval
800
801        if args_cli.autotune:
802            max_sys_util_thresh = args_cli.autotune_max_percentage_util
803            max_num_cams = args_cli.autotune_max_camera_count
804            print("[INFO]: Auto tuning until any of the following thresholds are met")
805            print(f"|CPU: {max_sys_util_thresh[0]}% | RAM {max_sys_util_thresh[1]}% | GPU: {max_sys_util_thresh[2]}% |")
806            print(f"[INFO]: Maximum number of cameras allowed: {max_num_cams}")
807        # Determine which camera is being tested...
808        tiled_camera_cfg = create_tiled_camera_cfg("tiled_camera")
809        standard_camera_cfg = create_standard_camera_cfg("standard_camera")
810        ray_caster_camera_cfg = create_ray_caster_camera_cfg("ray_caster_camera")
811        camera_name_prefix = ""
812        camera_creation_callable = None
813        num_cams = 0
814        if tiled_camera_cfg is not None:
815            camera_name_prefix = "tiled_camera"
816            camera_creation_callable = create_tiled_camera_cfg
817            num_cams = args_cli.num_tiled_cameras
818        elif standard_camera_cfg is not None:
819            camera_name_prefix = "standard_camera"
820            camera_creation_callable = create_standard_camera_cfg
821            num_cams = args_cli.num_standard_cameras
822        elif ray_caster_camera_cfg is not None:
823            camera_name_prefix = "ray_caster_camera"
824            camera_creation_callable = create_ray_caster_camera_cfg
825            num_cams = args_cli.num_ray_caster_cameras
826
827        while (
828            all(cur <= max_thresh for cur, max_thresh in zip(cur_sys_util, max_sys_util_thresh))
829            and cur_num_cams <= max_num_cams
830        ):
831            cur_num_cams = num_cams + interval * autotune_iter
832            autotune_iter += 1
833
834            env = inject_cameras_into_task(
835                task=args_cli.task,
836                num_cams=cur_num_cams,
837                camera_name_prefix=camera_name_prefix,
838                camera_creation_callable=camera_creation_callable,
839                num_cameras_per_env=args_cli.task_num_cameras_per_env,
840            )
841            env.reset()
842            print(f"Testing with {cur_num_cams} {camera_name_prefix}")
843            analysis = run_simulator(
844                sim=None,
845                scene_entities=env.unwrapped.scene,
846                warm_start_length=args_cli.warm_start_length,
847                experiment_length=args_cli.experiment_length,
848                tiled_camera_data_types=args_cli.tiled_camera_data_types,
849                standard_camera_data_types=args_cli.standard_camera_data_types,
850                ray_caster_camera_data_types=args_cli.ray_caster_camera_data_types,
851                convert_depth_to_camera_to_image_plane=args_cli.convert_depth_to_camera_to_image_plane,
852                max_cameras_per_env=args_cli.task_num_cameras_per_env,
853                env=env,
854            )
855
856            cur_sys_util = analysis["system_utilization_analytics"]
857            print("Triggering reset...")
858            env.close()
859            create_new_stage()
860        print("[INFO]: DONE! Feel free to CTRL + C Me ")
861        print(f"[INFO]: If you've made it this far, you can likely simulate {cur_num_cams} {camera_name_prefix}")
862        print("Keep in mind, this is without any training running on the GPU.")
863        print("Set lower utilization thresholds to account for training.")
864
865        if not args_cli.autotune:
866            print("[WARNING]: GPU Util Statistics only correct while autotuning, ignore above.")
867
868
869if __name__ == "__main__":
870    # run the main function
871    main()
872    # close sim app
873    simulation_app.close()

Possible Parameters

First, run

./isaaclab.sh -p scripts/benchmarks/benchmark_cameras.py -h

to see all possible parameters you can vary with this utility.

See the command line parameters related to autotune for more information about automatically determining maximum camera count.

Compare Performance in Task Environments and Automatically Determine Task Max Camera Count

Currently, tiled cameras are the most performant camera type that can handle multiple dynamic objects.

For example, to see how your system handles 100 tiled cameras in the cartpole environment, with 2 cameras per environment (50 environments in total) and RGB output only, run

./isaaclab.sh -p scripts/benchmarks/benchmark_cameras.py \
--task Isaac-Cartpole-v0 --num_tiled_cameras 100 \
--task_num_cameras_per_env 2 \
--tiled_camera_data_types rgb

If you have pynvml installed (./isaaclab.sh -p -m pip install pynvml), you can also find the maximum number of cameras that you can run in the specified environment up to a given utilization threshold (specified by max CPU utilization percent, max RAM utilization percent, max GPU compute percent, and max GPU memory percent). For example, to find the maximum number of cameras you can run with cartpole, run:

./isaaclab.sh -p scripts/benchmarks/benchmark_cameras.py \
--task Isaac-Cartpole-v0 --num_tiled_cameras 100 \
--task_num_cameras_per_env 2 \
--tiled_camera_data_types rgb --autotune \
--autotune_max_percentage_util 100 80 50 50

Autotune may crash the program if it tries to run too many cameras at once. The maximum percentage utilization parameter is meant to prevent this by ending the search once any threshold is exceeded.
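The stopping rule can be illustrated with a minimal, self-contained sketch that mirrors the loop condition in the script above. The fake_utilization function is a hypothetical stand-in for the measurements the benchmark collects. It is not part of Isaac Lab, and its numbers are invented purely so that the loop terminates.

```python
def fake_utilization(num_cams):
    """Hypothetical stand-in for measured [CPU, RAM, GPU compute, GPU memory] percentages."""
    return [10 + num_cams / 10, 20 + num_cams / 8, 4 + num_cams / 5, 10 + num_cams / 10]


def autotune(start_cams, interval, max_cams, thresholds, measure=fake_utilization):
    """Raise the camera count until any threshold or the camera limit is exceeded."""
    cur_util = [0.0] * len(thresholds)
    cur_cams = start_cams
    iteration = 0
    tested = []
    # Same shape as the script's condition: every reading must stay at or below its limit.
    while all(cur <= limit for cur, limit in zip(cur_util, thresholds)) and cur_cams <= max_cams:
        cur_cams = start_cams + interval * iteration
        iteration += 1
        cur_util = measure(cur_cams)
        tested.append(cur_cams)
    return cur_cams, tested


last_cams, tested = autotune(start_cams=100, interval=25, max_cams=500, thresholds=[100.0, 80.0, 50.0, 50.0])
print(f"Camera counts tested: {tested}; last count tried: {last_cams}")
```

In this sketch the last count tried is the one whose (made-up) GPU compute reading crossed its limit, so the previous entry in tested is the largest count that stayed within every threshold.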

The output of the benchmark doesn’t include the overhead of training the network, so consider decreasing the maximum utilization percentages to account for this overhead. The final output camera count is for all cameras, so to get the total number of environments, divide the output camera count by the number of cameras per environment.
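As a quick worked example of that division, the helper below (a hypothetical name, not part of the benchmark script) converts a total camera count into an environment count. Rounding down when the count does not divide evenly is an assumption of this sketch.

```python
def num_environments(total_cameras, cameras_per_env):
    """Return how many environments a total camera count corresponds to."""
    if cameras_per_env <= 0:
        raise ValueError("cameras_per_env must be positive")
    # Assumption: a partially filled environment does not count, so round down.
    return total_cameras // cameras_per_env


# 100 tiled cameras at 2 cameras per environment, as in the cartpole example above.
envs = num_environments(100, 2)
print(envs)
```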

Compare Camera Type and Performance (Without a Specified Task)

This tool can also assess performance without a task environment. For example, to view 100 random objects with 2 standard cameras, run

./isaaclab.sh -p scripts/benchmarks/benchmark_cameras.py \
--height 100 --width 100 --num_standard_cameras 2 \
--standard_camera_data_types instance_segmentation_fast normals --num_objects 100 \
--experiment_length 100

If your system cannot handle this due to performance reasons, then the process will be killed. It’s recommended to monitor CPU/RAM utilization and GPU utilization while running this script, to get an idea of how much of each resource rendering the desired cameras requires. On Ubuntu, you can use tools like htop and nvtop to monitor resources live while running this script, and on Windows, you can use the Task Manager.

If your system has a hard time handling the desired cameras, you can try the following:

  • Switch to headless mode (supply --headless)

  • Ensure you are using the GPU pipeline, not the CPU pipeline

  • If you aren’t using Tiled Cameras, switch to Tiled Cameras

  • Decrease camera resolution

  • Decrease the number of data_types for each camera

  • Decrease the number of cameras

  • Decrease the number of objects in the scene
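Several of the items above reduce the amount of image data produced per step. The sketch below gives a rough feel for how resolution, data types, and camera count scale the raw output size. The bytes-per-pixel values are illustrative assumptions (3 bytes for rgb, 4 bytes for depth), not figures taken from Isaac Lab, and the estimate covers only the output buffers, not the renderer's own memory use.

```python
# Assumed storage per pixel for each data type; adjust to match your actual outputs.
ASSUMED_BYTES_PER_PIXEL = {"rgb": 3, "depth": 4}


def output_buffer_bytes(num_cameras, height, width, data_types):
    """Estimate the raw output size in bytes for one step across all cameras."""
    per_pixel = sum(ASSUMED_BYTES_PER_PIXEL[name] for name in data_types)
    return num_cameras * height * width * per_pixel


full = output_buffer_bytes(100, 100, 100, ["rgb", "depth"])
half_resolution = output_buffer_bytes(100, 50, 50, ["rgb", "depth"])
rgb_only = output_buffer_bytes(100, 100, 100, ["rgb"])
print(full, half_resolution, rgb_only)
```

Under these assumptions, halving both image dimensions cuts the output to a quarter, while dropping a data type removes only that type's share.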

If your system can handle the number of cameras, the time statistics will be printed to the terminal. After the simulation stops, it can be closed with CTRL+C.
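To compare the printed statistics across several settings, it can help to generate the command lines programmatically. The sketch below only builds and prints commands using the flags shown in the example above; build_benchmark_command is a hypothetical helper, and the list of resolutions is an arbitrary choice for illustration.

```python
import shlex


def build_benchmark_command(height, width, num_cameras, data_types, num_objects=100, experiment_length=100):
    """Build the argument list for one benchmark run without a task environment."""
    return [
        "./isaaclab.sh", "-p", "scripts/benchmarks/benchmark_cameras.py",
        "--height", str(height),
        "--width", str(width),
        "--num_standard_cameras", str(num_cameras),
        "--standard_camera_data_types", *data_types,
        "--num_objects", str(num_objects),
        "--experiment_length", str(experiment_length),
    ]


# One command per square resolution; run each separately and compare the timings.
commands = [
    build_benchmark_command(size, size, 2, ["instance_segmentation_fast", "normals"])
    for size in (100, 200, 400)
]
for command in commands:
    print(shlex.join(command))
```

Each printed line can be pasted into a terminal, or the argument lists can be passed to subprocess.run if you prefer to launch the runs from Python.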