Find How Many/What Cameras You Should Train With

Isaac Lab currently provides several camera types: USD cameras (standard), tiled cameras, and ray caster cameras. These camera types differ in functionality and performance. The benchmark_cameras.py script can be used to understand the differences between camera types, as well as to characterize their relative performance at different parameters, such as camera quantity, image dimensions, and data types.
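When comparing results across camera types and image dimensions, one rough way to normalize the script's reported average step times is pixels rendered per second. The helper below is a hypothetical sketch for interpreting benchmark output, not part of the script:

```python
def pixels_per_second(num_cameras: int, height: int, width: int, avg_step_time: float) -> float:
    """Rough rendering throughput: total pixels produced per second of step time."""
    return num_cameras * height * width / avg_step_time

# Compare two hypothetical benchmark results at the script's default 140x120 resolution.
tiled_throughput = pixels_per_second(100, height=120, width=140, avg_step_time=0.20)
standard_throughput = pixels_per_second(100, height=120, width=140, avg_step_time=0.50)
```

A configuration with a higher pixels-per-second figure renders more image data per unit of wall time, which makes configurations with different camera counts or resolutions comparable on one axis.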

This utility is provided so that one can easily find the camera type and parameters that are the most performant while meeting the requirements of the user's scenario. It also helps estimate the maximum number of cameras one can realistically run, assuming that one wants to maximize the number of environments while minimizing step time.
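The measurement behind these estimates is straightforward: step the simulation for a fixed experiment length and average the per-step wall time, discarding an initial warm-start period (early frames can render blank). Sketched in plain Python, with `step_fn` as a stand-in for the simulator step and camera updates:

```python
import time


def average_step_time(step_fn, experiment_length: int = 15, warm_start_length: int = 3) -> float:
    """Average wall-clock duration per step, ignoring warm-start steps."""
    total, valid = 0.0, 0
    for timestep in range(experiment_length):
        start = time.perf_counter()
        step_fn()  # stand-in for stepping the sim/env and updating cameras
        elapsed = time.perf_counter() - start
        if timestep > warm_start_length:  # skip warm-up steps, as the script does
            total += elapsed
            valid += 1
    return total / valid if valid else 0.0
```

The defaults here mirror the script's `--experiment_length` and `--warm_start_length` defaults; the averaging predicate is the same shape as the one in the script's `run_simulator`.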

This utility can inject cameras into an existing task from the gym registry, which can be useful for benchmarking cameras in a specific scenario. Also, if you install pynvml, this utility can automatically find the maximum number of cameras that can run in your task environment, up to a specified system resource utilization threshold (without training; taking zero actions at each timestep).
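The auto-tune search can be pictured as follows: starting from the requested camera count, keep adding a fixed interval of cameras while every utilization reading (CPU, RAM, GPU compute, and GPU memory, in percent) stays under its threshold, up to a hard camera cap. This is a simplified sketch of that loop; `get_utilization` is a hypothetical probe standing in for the script's psutil/pynvml queries, and the exact stopping semantics are an assumption:

```python
def autotune_camera_count(get_utilization, max_util_thresholds,
                          start: int = 25, interval: int = 25, max_count: int = 4096) -> int:
    """Grow the camera count until any utilization threshold or the camera cap is reached."""
    count = start
    while count + interval <= max_count:
        # Probe system utilization as if running with the increased camera count.
        utilization = get_utilization(count + interval)
        if any(u >= limit for u, limit in zip(utilization, max_util_thresholds)):
            break  # hitting any single threshold stops the autotune
        count += interval
    return count
```

The defaults correspond to the script's `--autotune_camera_count_interval` and `--autotune_max_camera_count` arguments, and the four thresholds to `--autotune_max_percentage_util`.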

This guide accompanies the benchmark_cameras.py script in the scripts/benchmarks directory.

Code for benchmark_cameras.py
  1# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
  2# All rights reserved.
  3#
  4# SPDX-License-Identifier: BSD-3-Clause
  5
  6"""
  7This script might help you determine how many cameras your system can realistically run
  8at different desired settings.
  9
 10You can supply different task environments to inject cameras into, or just test a sample scene.
 11Additionally, you can automatically find the maximum number of cameras you can run a task with
 12through the auto-tune functionality.
 13
 14.. code-block:: bash
 15
 16    # Usage with GUI
 17    ./isaaclab.sh -p scripts/benchmarks/benchmark_cameras.py -h
 18
 19    # Usage with headless
 20    ./isaaclab.sh -p scripts/benchmarks/benchmark_cameras.py -h --headless
 21
 22"""
 23
 24"""Launch Isaac Sim Simulator first."""
 25
 26import argparse
 27from collections.abc import Callable
 28
 29from isaaclab.app import AppLauncher
 30
 31# container for parsed CLI arguments (populated by parser.parse_args() below)
 32args_cli = argparse.Namespace()
 33
 34parser = argparse.ArgumentParser(description="This script can help you benchmark how many cameras you could run.")
 35
 36"""
 37The following arguments only need to be supplied when one wishes
 38to inject cameras into their environment and automatically determine
 39the maximum camera count.
 40"""
 41parser.add_argument(
 42    "--task",
 43    type=str,
 44    default=None,
 45    required=False,
 46    help="Supply this argument to spawn cameras within a known manager-based task environment.",
 47)
 48
 49parser.add_argument(
 50    "--autotune",
 51    default=False,
 52    action="store_true",
 53    help=(
 54        "Autotuning is only supported for provided task environments."
 55        " Supply this argument to increase the number of environments until a desired threshold is reached."
 56        " Install pynvml in your environment: ./isaaclab.sh -m pip install pynvml"
 57    ),
 58)
 59
 60parser.add_argument(
 61    "--task_num_cameras_per_env",
 62    type=int,
 63    default=1,
 64    help="The number of cameras per environment to use when using a known task.",
 65)
 66
 67parser.add_argument(
 68    "--use_fabric", action="store_true", default=False, help="Enable fabric and use USD I/O operations."
 69)
 70
 71parser.add_argument(
 72    "--autotune_max_percentage_util",
 73    nargs="+",
 74    type=float,
 75    default=[100.0, 80.0, 80.0, 80.0],
 76    required=False,
 77    help=(
 78        "The system utilization percentage thresholds to reach before an autotune is finished. "
 79        "If any one of these limits is hit, the autotune stops. "
 80        "Thresholds are, in order: maximum CPU percentage utilization, "
 81        "maximum RAM percentage utilization, maximum GPU compute percent utilization, "
 82        "and maximum GPU memory utilization."
 83    ),
 84)
 85
 86parser.add_argument(
 87    "--autotune_max_camera_count", type=int, default=4096, help="The maximum number of cameras allowed in an autotune."
 88)
 89
 90parser.add_argument(
 91    "--autotune_camera_count_interval",
 92    type=int,
 93    default=25,
 94    help=(
 95        "The number of cameras to try to add to the environment if the current camera count"
 96        " falls within permitted system resource utilization limits."
 97    ),
 98)
 99
100"""
101The following arguments apply both when injecting cameras into a task environment
102and when creating cameras independent of a task environment.
103"""
104
105parser.add_argument(
106    "--num_tiled_cameras",
107    type=int,
108    default=0,
109    required=False,
110    help="Number of tiled cameras to create. For autotuning, this is how many cameras to start with.",
111)
112
113parser.add_argument(
114    "--num_standard_cameras",
115    type=int,
116    default=0,
117    required=False,
118    help="Number of standard cameras to create. For autotuning, this is how many cameras to start with.",
119)
120
121parser.add_argument(
122    "--num_ray_caster_cameras",
123    type=int,
124    default=0,
125    required=False,
126    help="Number of ray caster cameras to create. For autotuning, this is how many cameras to start with.",
127)
128
129parser.add_argument(
130    "--tiled_camera_data_types",
131    nargs="+",
132    type=str,
133    default=["rgb", "depth"],
134    help="The data types rendered by the tiled camera.",
135)
136
137parser.add_argument(
138    "--standard_camera_data_types",
139    nargs="+",
140    type=str,
141    default=["rgb", "distance_to_image_plane", "distance_to_camera"],
142    help="The data types rendered by the standard camera.",
143)
144
145parser.add_argument(
146    "--ray_caster_camera_data_types",
147    nargs="+",
148    type=str,
149    default=["distance_to_image_plane"],
150    help="The data types rendered by the ray caster camera.",
151)
152
153parser.add_argument(
154    "--ray_caster_visible_mesh_prim_paths",
155    nargs="+",
156    type=str,
157    default=["/World/ground"],
158    help="The mesh prim paths that the ray caster can cast against. WARNING: currently only a single, static object is supported.",
159)
160
161parser.add_argument(
162    "--convert_depth_to_camera_to_image_plane",
163    action="store_true",
164    default=True,
165    help=(
166        "Enable undistorting from perspective view (distance_to_camera data type) "
167        "to orthogonal view (distance_to_image_plane data type) for depth. "
168        "This is currently needed to create undistorted depth images/point clouds."
169    ),
170)
171
172parser.add_argument(
173    "--keep_raw_depth",
174    dest="convert_depth_to_camera_to_image_plane",
175    action="store_false",
176    help=(
177        "Disable undistorting from perspective view (distance_to_camera) "
178        "to orthogonal view (distance_to_image_plane data type) for depth."
179    ),
180)
181
182parser.add_argument(
183    "--height",
184    type=int,
185    default=120,
186    required=False,
187    help="Height in pixels of cameras",
188)
189
190parser.add_argument(
191    "--width",
192    type=int,
193    default=140,
194    required=False,
195    help="Width in pixels of cameras",
196)
197
198parser.add_argument(
199    "--warm_start_length",
200    type=int,
201    default=3,
202    required=False,
203    help=(
204        "Number of steps to run the sim before starting the benchmark. "
205        "Needed to avoid blank images at the start of the simulation."
206    ),
207)
208
209parser.add_argument(
210    "--experiment_length",
211    type=int,
212    default=15,
213    required=False,
214    help="Number of steps to average over",
215)
216
217# This argument is only used when a task is not provided.
218parser.add_argument(
219    "--num_objects",
220    type=int,
221    default=10,
222    required=False,
223    help="Number of objects to spawn into the scene when not using a known task.",
224)
225
226# Benchmark arguments
227parser.add_argument(
228    "--benchmark_backend",
229    type=str,
230    default="omniperf",
231    choices=["json", "osmo", "omniperf", "summary"],
232    help="Benchmarking backend options; defaults to omniperf.",
233)
234parser.add_argument("--output_path", type=str, default=".", help="Path to output benchmark results.")
235
236
237AppLauncher.add_app_launcher_args(parser)
238args_cli = parser.parse_args()
239args_cli.enable_cameras = True
240
241if args_cli.autotune:
242    import pynvml
243
244if len(args_cli.ray_caster_visible_mesh_prim_paths) > 1:
245    print("[WARNING]: Ray Casting is currently only supported for a single, static object")
246# launch omniverse app
247app_launcher = AppLauncher(args_cli)
248simulation_app = app_launcher.app
249
250"""Rest everything follows."""
251
252import random
253import time
254
255import gymnasium as gym
256import numpy as np
257import psutil
258import torch
259
260import isaaclab.sim as sim_utils
261from isaaclab.assets import RigidObject, RigidObjectCfg
262from isaaclab.scene.interactive_scene import InteractiveScene
263from isaaclab.sensors import (
264    Camera,
265    CameraCfg,
266    RayCasterCamera,
267    RayCasterCameraCfg,
268    patterns,
269)
270from isaaclab.test.benchmark import BaseIsaacLabBenchmark, DictMeasurement, SingleMeasurement
271from isaaclab.utils.math import orthogonalize_perspective_depth, unproject_depth
272
273from isaaclab_tasks.utils import load_cfg_from_registry
274
275"""
276Camera Creation
277"""
278
279
280def create_camera_base(
281    camera_cfg: type[CameraCfg],
282    num_cams: int,
283    data_types: list[str],
284    height: int,
285    width: int,
286    prim_path: str | None = None,
287    instantiate: bool = True,
288) -> Camera | CameraCfg | None:
289    """Generalized function to create a camera or tiled camera sensor."""
290    # Determine prim prefix based on the camera class
291    name = camera_cfg.class_type.__name__
292
293    if instantiate:
294        # Create the necessary prims
295        for idx in range(num_cams):
296            sim_utils.create_prim(f"/World/{name}_{idx:02d}", "Xform")
297    if prim_path is None:
298        prim_path = f"/World/{name}_.*/{name}"
299    # If valid camera settings are provided, create the camera
300    if num_cams > 0 and len(data_types) > 0 and height > 0 and width > 0:
301        cfg = camera_cfg(
302            prim_path=prim_path,
303            update_period=0,
304            height=height,
305            width=width,
306            data_types=data_types,
307            spawn=sim_utils.PinholeCameraCfg(
308                focal_length=24, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 1e4)
309            ),
310        )
311        if instantiate:
312            return camera_cfg.class_type(cfg=cfg)
313        else:
314            return cfg
315    else:
316        return None
317
318
319def create_tiled_cameras(
320    num_cams: int = 2, data_types: list[str] | None = None, height: int = 100, width: int = 120
321) -> Camera | None:
322    """Defines the tiled camera sensor to add to the scene."""
323    from isaaclab.sensors import TiledCameraCfg  # tiled cameras use their own config class
324    if data_types is None:
325        data_types = ["rgb", "depth"]
326    return create_camera_base(
327        camera_cfg=TiledCameraCfg,
328        num_cams=num_cams,
329        data_types=data_types,
330        height=height,
331        width=width,
332    )
333
334def create_cameras(
335    num_cams: int = 2, data_types: list[str] | None = None, height: int = 100, width: int = 120
336) -> Camera | None:
337    """Defines the Standard cameras."""
338    if data_types is None:
339        data_types = ["rgb", "depth"]
340    return create_camera_base(
341        camera_cfg=CameraCfg, num_cams=num_cams, data_types=data_types, height=height, width=width
342    )
343
344
345def create_ray_caster_cameras(
346    num_cams: int = 2,
347    data_types: list[str] = ["distance_to_image_plane"],
348    mesh_prim_paths: list[str] = ["/World/ground"],
349    height: int = 100,
350    width: int = 120,
351    prim_path: str = "/World/RayCasterCamera_.*/RayCaster",
352    instantiate: bool = True,
353) -> RayCasterCamera | RayCasterCameraCfg | None:
354    """Create the ray caster cameras; their configuration differs from the Standard/Tiled cameras."""
355    for idx in range(num_cams):
356        sim_utils.create_prim(f"/World/RayCasterCamera_{idx:02d}/RayCaster", "Xform")
357
358    if num_cams > 0 and len(data_types) > 0 and height > 0 and width > 0:
359        cam_cfg = RayCasterCameraCfg(
360            prim_path=prim_path,
361            mesh_prim_paths=mesh_prim_paths,
362            update_period=0,
363            offset=RayCasterCameraCfg.OffsetCfg(pos=(0.0, 0.0, 0.0), rot=(1.0, 0.0, 0.0, 0.0)),
364            data_types=data_types,
365            debug_vis=False,
366            pattern_cfg=patterns.PinholeCameraPatternCfg(
367                focal_length=24.0,
368                horizontal_aperture=20.955,
369                height=480,
370                width=640,
371            ),
372        )
373        if instantiate:
374            return RayCasterCamera(cfg=cam_cfg)
375        else:
376            return cam_cfg
377
378    else:
379        return None
380
381
382def create_tiled_camera_cfg(prim_path: str) -> CameraCfg:
383    """Grab a simple tiled camera config for injecting into task environments."""
384    from isaaclab.sensors import TiledCameraCfg  # tiled cameras use their own config class
385    return create_camera_base(
386        TiledCameraCfg,
387        num_cams=args_cli.num_tiled_cameras,
388        data_types=args_cli.tiled_camera_data_types,
389        width=args_cli.width,
390        height=args_cli.height,
391        prim_path="{ENV_REGEX_NS}/" + prim_path,
392        instantiate=False,
393    )
394
395def create_standard_camera_cfg(prim_path: str) -> CameraCfg:
396    """Grab a simple standard camera config for injecting into task environments."""
397    return create_camera_base(
398        CameraCfg,
399        num_cams=args_cli.num_standard_cameras,
400        data_types=args_cli.standard_camera_data_types,
401        width=args_cli.width,
402        height=args_cli.height,
403        prim_path="{ENV_REGEX_NS}/" + prim_path,
404        instantiate=False,
405    )
406
407
408def create_ray_caster_camera_cfg(prim_path: str) -> RayCasterCameraCfg:
409    """Grab a simple ray caster config for injecting into task environments."""
410    return create_ray_caster_cameras(
411        num_cams=args_cli.num_ray_caster_cameras,
412        data_types=args_cli.ray_caster_camera_data_types,
413        width=args_cli.width,
414        height=args_cli.height,
415        prim_path="{ENV_REGEX_NS}/" + prim_path,
416    )
417
418
419"""
420Scene Creation
421"""
422
423
424def design_scene(
425    num_tiled_cams: int = 2,
426    num_standard_cams: int = 0,
427    num_ray_caster_cams: int = 0,
428    tiled_camera_data_types: list[str] | None = None,
429    standard_camera_data_types: list[str] | None = None,
430    ray_caster_camera_data_types: list[str] | None = None,
431    height: int = 100,
432    width: int = 200,
433    num_objects: int = 20,
434    mesh_prim_paths: list[str] = ["/World/ground"],
435) -> dict:
436    """Design the scene."""
437    if tiled_camera_data_types is None:
438        tiled_camera_data_types = ["rgb"]
439    if standard_camera_data_types is None:
440        standard_camera_data_types = ["rgb"]
441    if ray_caster_camera_data_types is None:
442        ray_caster_camera_data_types = ["distance_to_image_plane"]
443
444    # Populate scene
445    # -- Ground-plane
446    cfg = sim_utils.GroundPlaneCfg()
447    cfg.func("/World/ground", cfg)
448    # -- Lights
449    cfg = sim_utils.DistantLightCfg(intensity=3000.0, color=(0.75, 0.75, 0.75))
450    cfg.func("/World/Light", cfg)
451
452    # Create a dictionary for the scene entities
453    scene_entities = {}
454
455    # Xform to hold objects
456    sim_utils.create_prim("/World/Objects", "Xform")
457    # Random objects
458    for i in range(num_objects):
459        # sample random position
460        position = np.random.rand(3) - np.asarray([0.05, 0.05, -1.0])
461        position *= np.asarray([1.5, 1.5, 0.5])
462        # sample random color
463        color = (random.random(), random.random(), random.random())
464        # choose random prim type
465        prim_type = random.choice(["Cube", "Cone", "Cylinder"])
466        common_properties = {
467            "rigid_props": sim_utils.RigidBodyPropertiesCfg(),
468            "mass_props": sim_utils.MassPropertiesCfg(mass=5.0),
469            "collision_props": sim_utils.CollisionPropertiesCfg(),
470            "visual_material": sim_utils.PreviewSurfaceCfg(diffuse_color=color, metallic=0.5),
471            "semantic_tags": [("class", prim_type)],
472        }
473        if prim_type == "Cube":
474            shape_cfg = sim_utils.CuboidCfg(size=(0.25, 0.25, 0.25), **common_properties)
475        elif prim_type == "Cone":
476            shape_cfg = sim_utils.ConeCfg(radius=0.1, height=0.25, **common_properties)
477        elif prim_type == "Cylinder":
478            shape_cfg = sim_utils.CylinderCfg(radius=0.25, height=0.25, **common_properties)
479        # Rigid Object
480        obj_cfg = RigidObjectCfg(
481            prim_path=f"/World/Objects/Obj_{i:02d}",
482            spawn=shape_cfg,
483            init_state=RigidObjectCfg.InitialStateCfg(pos=position),
484        )
485        scene_entities[f"rigid_object{i}"] = RigidObject(cfg=obj_cfg)
486
487    # Sensors
488    standard_camera = create_cameras(
489        num_cams=num_standard_cams, data_types=standard_camera_data_types, height=height, width=width
490    )
491    tiled_camera = create_tiled_cameras(
492        num_cams=num_tiled_cams, data_types=tiled_camera_data_types, height=height, width=width
493    )
494    ray_caster_camera = create_ray_caster_cameras(
495        num_cams=num_ray_caster_cams,
496        data_types=ray_caster_camera_data_types,
497        mesh_prim_paths=mesh_prim_paths,
498        height=height,
499        width=width,
500    )
501    # return the scene information
502    if tiled_camera is not None:
503        scene_entities["tiled_camera"] = tiled_camera
504    if standard_camera is not None:
505        scene_entities["standard_camera"] = standard_camera
506    if ray_caster_camera is not None:
507        scene_entities["ray_caster_camera"] = ray_caster_camera
508    return scene_entities
509
510
511def inject_cameras_into_task(
512    task: str,
513    num_cams: int,
514    camera_name_prefix: str,
515    camera_creation_callable: Callable,
516    num_cameras_per_env: int = 1,
517) -> gym.Env:
518    """Loads the task, injects cameras into the config, and creates the environment."""
519    cfg = load_cfg_from_registry(task, "env_cfg_entry_point")
520    cfg.sim.device = args_cli.device
521    cfg.sim.use_fabric = args_cli.use_fabric
522    scene_cfg = cfg.scene
523
524    num_envs = int(num_cams / num_cameras_per_env)
525    scene_cfg.num_envs = num_envs
526
527    for idx in range(num_cameras_per_env):
528        suffix = "" if idx == 0 else str(idx)
529        name = camera_name_prefix + suffix
530        setattr(scene_cfg, name, camera_creation_callable(name))
531    cfg.scene = scene_cfg
532    env = gym.make(task, cfg=cfg)
533    return env
534
535
536"""
537System diagnosis
538"""
539
540
541def get_utilization_percentages(reset: bool = False, max_values: list[float] = [0.0, 0.0, 0.0, 0.0]) -> list[float]:
542    """Get the maximum CPU, RAM, GPU utilization (processing), and
543    GPU memory usage percentages since the last time reset was true."""
544    if reset:
545        max_values[:] = [0, 0, 0, 0]  # Reset the max values
546
547    # CPU utilization
548    cpu_usage = psutil.cpu_percent(interval=0.1)
549    max_values[0] = max(max_values[0], cpu_usage)
550
551    # RAM utilization
552    memory_info = psutil.virtual_memory()
553    ram_usage = memory_info.percent
554    max_values[1] = max(max_values[1], ram_usage)
555
556    # GPU utilization using pynvml
557    if torch.cuda.is_available():
558        if args_cli.autotune:
559            pynvml.nvmlInit()  # Initialize NVML
560            for i in range(torch.cuda.device_count()):
561                handle = pynvml.nvmlDeviceGetHandleByIndex(i)
562
563                # GPU Utilization
564                gpu_utilization = pynvml.nvmlDeviceGetUtilizationRates(handle)
565                gpu_processing_utilization_percent = gpu_utilization.gpu  # GPU core utilization
566                max_values[2] = max(max_values[2], gpu_processing_utilization_percent)
567
568                # GPU Memory Usage
569                memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
570                gpu_memory_total = memory_info.total
571                gpu_memory_used = memory_info.used
572                gpu_memory_utilization_percent = (gpu_memory_used / gpu_memory_total) * 100
573                max_values[3] = max(max_values[3], gpu_memory_utilization_percent)
574
575            pynvml.nvmlShutdown()  # Shutdown NVML after usage
576    else:
577        gpu_processing_utilization_percent = None
578        gpu_memory_utilization_percent = None
579    return max_values
580
581
582"""
583Experiment
584"""
585
586
587def run_simulator(
588    sim: sim_utils.SimulationContext | None,
589    scene_entities: dict | InteractiveScene,
590    warm_start_length: int = 10,
591    experiment_length: int = 100,
592    tiled_camera_data_types: list[str] | None = None,
593    standard_camera_data_types: list[str] | None = None,
594    ray_caster_camera_data_types: list[str] | None = None,
595    depth_predicate: Callable = lambda x: "to" in x or x == "depth",
596    perspective_depth_predicate: Callable = lambda x: x == "distance_to_camera",
597    convert_depth_to_camera_to_image_plane: bool = True,
598    max_cameras_per_env: int = 1,
599    env: gym.Env | None = None,
600) -> dict:
601    """Run the simulator with all cameras, and return timing analytics."""
602
603    if tiled_camera_data_types is None:
604        tiled_camera_data_types = ["rgb"]
605    if standard_camera_data_types is None:
606        standard_camera_data_types = ["rgb"]
607    if ray_caster_camera_data_types is None:
608        ray_caster_camera_data_types = ["distance_to_image_plane"]
609
610    # Initialize camera lists
611    tiled_cameras = []
612    standard_cameras = []
613    ray_caster_cameras = []
614
615    # Dynamically extract cameras from the scene entities up to max_cameras_per_env
616    for i in range(max_cameras_per_env):
617        # Extract tiled cameras
618        tiled_camera_key = f"tiled_camera{i}" if i > 0 else "tiled_camera"
619        standard_camera_key = f"standard_camera{i}" if i > 0 else "standard_camera"
620        ray_caster_camera_key = f"ray_caster_camera{i}" if i > 0 else "ray_caster_camera"
621
622        try:  # checking "key in scene_entities" raises even when the key is present, so use try/except
623            tiled_cameras.append(scene_entities[tiled_camera_key])
624            standard_cameras.append(scene_entities[standard_camera_key])
625            ray_caster_cameras.append(scene_entities[ray_caster_camera_key])
626        except KeyError:
627            break
628
629    # Initialize camera counts
630    camera_lists = [tiled_cameras, standard_cameras, ray_caster_cameras]
631    camera_data_types = [tiled_camera_data_types, standard_camera_data_types, ray_caster_camera_data_types]
632    labels = ["tiled", "standard", "ray_caster"]
633
634    if sim is not None:
635        # Set camera world poses
636        for camera_list in camera_lists:
637            for camera in camera_list:
638                num_cameras = camera.data.intrinsic_matrices.size(0)
639                positions = torch.tensor([[2.5, 2.5, 2.5]], device=sim.device).repeat(num_cameras, 1)
640                targets = torch.tensor([[0.0, 0.0, 0.0]], device=sim.device).repeat(num_cameras, 1)
641                camera.set_world_poses_from_view(positions, targets)
642
643    # Initialize timing variables
644    timestep = 0
645    total_time = 0.0
646    valid_timesteps = 0
647    sim_step_time = 0.0
648
649    while simulation_app.is_running() and timestep < experiment_length:
650        print(f"On timestep {timestep} of {experiment_length}, with warm start of {warm_start_length}")
651        get_utilization_percentages()
652
653        # Measure the total simulation step time
654        step_start_time = time.time()
655
656        if sim is not None:
657            sim.step()
658
659        if env is not None:
660            with torch.inference_mode():
661                # compute zero actions
662                actions = torch.zeros(env.action_space.shape, device=env.unwrapped.device)
663                # apply actions
664                env.step(actions)
665
666        # Update cameras and process vision data within the simulation step
667        clouds = {}
668        images = {}
669        depth_images = {}
670
671        # Loop through all camera lists and their data_types
672        for camera_list, data_types, label in zip(camera_lists, camera_data_types, labels):
673            for cam_idx, camera in enumerate(camera_list):
674                if env is None:  # No env, need to step cams manually
675                    # Only update the camera if it hasn't been updated as part of scene_entities.update ...
676                    camera.update(dt=sim.get_physics_dt())
677
678                for data_type in data_types:
679                    data_label = f"{label}_{cam_idx}_{data_type}"
680
681                    if depth_predicate(data_type):  # is a depth image, want to create cloud
682                        depth = camera.data.output[data_type]
683                        depth_images[data_label + "_raw"] = depth
684                        if perspective_depth_predicate(data_type) and convert_depth_to_camera_to_image_plane:
685                            depth = orthogonalize_perspective_depth(
686                                camera.data.output[data_type], camera.data.intrinsic_matrices
687                            )
688                            depth_images[data_label + "_undistorted"] = depth
689
690                        pointcloud = unproject_depth(depth=depth, intrinsics=camera.data.intrinsic_matrices)
691                        clouds[data_label] = pointcloud
692                    else:  # rgb image, just save it
693                        image = camera.data.output[data_type]
694                        images[data_label] = image
695
696        # End timing for the step
697        step_end_time = time.time()
698        sim_step_time += step_end_time - step_start_time
699
700        if timestep > warm_start_length:
701            get_utilization_percentages(reset=True)
702            total_time += step_end_time - step_start_time
703            valid_timesteps += 1
704
705        timestep += 1
706
707    # Calculate average timings
708    if valid_timesteps > 0:
709        avg_timestep_duration = total_time / valid_timesteps
710        avg_sim_step_duration = sim_step_time / experiment_length
711    else:
712        avg_timestep_duration = 0.0
713        avg_sim_step_duration = 0.0
714
715    # Package timing analytics in a dictionary
716    timing_analytics = {
717        "average_timestep_duration": avg_timestep_duration,
718        "average_sim_step_duration": avg_sim_step_duration,
719        "total_simulation_time": sim_step_time,
720        "total_experiment_duration": sim_step_time,
721    }
722
723    system_utilization_analytics = get_utilization_percentages()
724
725    print("--- Benchmark Results ---")
726    print(f"Average timestep duration: {avg_timestep_duration:.6f} seconds")
727    print(f"Average simulation step duration: {avg_sim_step_duration:.6f} seconds")
728    print(f"Total simulation time: {sim_step_time:.6f} seconds")
729    print("\nSystem Utilization Statistics:")
730    print(
731        f"| CPU:{system_utilization_analytics[0]}% | "
732        f"RAM:{system_utilization_analytics[1]}% | "
733        f"GPU Compute:{system_utilization_analytics[2]}% | "
734        f" GPU Memory: {system_utilization_analytics[3]:.2f}% |"
735    )
736
737    return {"timing_analytics": timing_analytics, "system_utilization_analytics": system_utilization_analytics}
738
739
740def main():
741    """Main function."""
742    # Load simulation context
743    if args_cli.num_tiled_cameras + args_cli.num_standard_cameras + args_cli.num_ray_caster_cameras <= 0:
744        raise ValueError("You must select at least one camera.")
745    if (
746        (args_cli.num_tiled_cameras > 0 and args_cli.num_standard_cameras > 0)
747        or (args_cli.num_ray_caster_cameras > 0 and args_cli.num_standard_cameras > 0)
748        or (args_cli.num_ray_caster_cameras > 0 and args_cli.num_tiled_cameras > 0)
749    ):
750        print("[WARNING]: You have elected to use more than one camera type.")
751        print("[WARNING]: For a benchmark to be meaningful, use ONLY ONE camera type at a time.")
752        print(
753            "[WARNING]: For example, if num_tiled_cameras=100, for a meaningful benchmark, "
754            "num_standard_cameras should be 0, and num_ray_caster_cameras should be 0."
755        )
756        raise ValueError("Benchmark one camera at a time.")
757
758    # Determine which camera type is being used
759    camera_type = "tiled"
760    num_cameras = args_cli.num_tiled_cameras
761    if args_cli.num_standard_cameras > 0:
762        camera_type = "standard"
763        num_cameras = args_cli.num_standard_cameras
764    elif args_cli.num_ray_caster_cameras > 0:
765        camera_type = "ray_caster"
766        num_cameras = args_cli.num_ray_caster_cameras
767
768    # Create the benchmark
769    backend_type = args_cli.benchmark_backend
770    benchmark = BaseIsaacLabBenchmark(
771        benchmark_name="benchmark_cameras",
772        backend_type=backend_type,
773        output_path=args_cli.output_path,
774        use_recorders=True,
775        frametime_recorders=backend_type in ("summary", "omniperf"),
776        output_prefix="benchmark_cameras",
777        workflow_metadata={
778            "metadata": [
779                {"name": "task", "data": args_cli.task},
780                {"name": "camera_type", "data": camera_type},
781                {"name": "num_cameras", "data": num_cameras},
782                {"name": "height", "data": args_cli.height},
783                {"name": "width", "data": args_cli.width},
784                {"name": "experiment_length", "data": args_cli.experiment_length},
785                {"name": "autotune", "data": args_cli.autotune},
786            ]
787        },
788    )
789
790    print("[INFO]: Designing the scene")
791    final_analysis = None
792
793    if args_cli.task is None:
794        print("[INFO]: No task environment provided, creating random scene.")
795        sim_cfg = sim_utils.SimulationCfg(device=args_cli.device)
796        sim = sim_utils.SimulationContext(sim_cfg)
797        # Set main camera
798        sim.set_camera_view([2.5, 2.5, 2.5], [0.0, 0.0, 0.0])
799        scene_entities = design_scene(
800            num_tiled_cams=args_cli.num_tiled_cameras,
801            num_standard_cams=args_cli.num_standard_cameras,
802            num_ray_caster_cams=args_cli.num_ray_caster_cameras,
803            tiled_camera_data_types=args_cli.tiled_camera_data_types,
804            standard_camera_data_types=args_cli.standard_camera_data_types,
805            ray_caster_camera_data_types=args_cli.ray_caster_camera_data_types,
806            height=args_cli.height,
807            width=args_cli.width,
808            num_objects=args_cli.num_objects,
809            mesh_prim_paths=args_cli.ray_caster_visible_mesh_prim_paths,
810        )
811        # Play simulator
812        sim.reset()
813        # Now we are ready!
814        print("[INFO]: Setup complete...")
815        # Run simulator
816        final_analysis = run_simulator(
817            sim=sim,
818            scene_entities=scene_entities,
819            warm_start_length=args_cli.warm_start_length,
820            experiment_length=args_cli.experiment_length,
821            tiled_camera_data_types=args_cli.tiled_camera_data_types,
822            standard_camera_data_types=args_cli.standard_camera_data_types,
823            ray_caster_camera_data_types=args_cli.ray_caster_camera_data_types,
824            convert_depth_to_camera_to_image_plane=args_cli.convert_depth_to_camera_to_image_plane,
825        )
826    else:
827        print("[INFO]: Using known task environment, injecting cameras.")
828        autotune_iter = 0
829        max_sys_util_thresh = [0.0, 0.0, 0.0]
830        max_num_cams = max(args_cli.num_tiled_cameras, args_cli.num_standard_cameras, args_cli.num_ray_caster_cameras)
831        cur_num_cams = max_num_cams
832        cur_sys_util = max_sys_util_thresh
833        interval = args_cli.autotune_camera_count_interval
834
835        if args_cli.autotune:
836            max_sys_util_thresh = args_cli.autotune_max_percentage_util
837            max_num_cams = args_cli.autotune_max_camera_count
838            print("[INFO]: Auto tuning until any of the following thresholds are met")
839            print(f"|CPU: {max_sys_util_thresh[0]}% | RAM {max_sys_util_thresh[1]}% | GPU: {max_sys_util_thresh[2]}% |")
840            print(f"[INFO]: Maximum number of cameras allowed: {max_num_cams}")
841        # Determine which camera is being tested...
842        tiled_camera_cfg = create_tiled_camera_cfg("tiled_camera")
843        standard_camera_cfg = create_standard_camera_cfg("standard_camera")
844        ray_caster_camera_cfg = create_ray_caster_camera_cfg("ray_caster_camera")
845        camera_name_prefix = ""
846        camera_creation_callable = None
847        num_cams = 0
848        if tiled_camera_cfg is not None:
849            camera_name_prefix = "tiled_camera"
850            camera_creation_callable = create_tiled_camera_cfg
851            num_cams = args_cli.num_tiled_cameras
852        elif standard_camera_cfg is not None:
853            camera_name_prefix = "standard_camera"
854            camera_creation_callable = create_standard_camera_cfg
855            num_cams = args_cli.num_standard_cameras
856        elif ray_caster_camera_cfg is not None:
857            camera_name_prefix = "ray_caster_camera"
858            camera_creation_callable = create_ray_caster_camera_cfg
859            num_cams = args_cli.num_ray_caster_cameras
860
861        while (
862            all(cur <= max_thresh for cur, max_thresh in zip(cur_sys_util, max_sys_util_thresh))
863            and cur_num_cams <= max_num_cams
864        ):
865            cur_num_cams = num_cams + interval * autotune_iter
866            autotune_iter += 1
867
868            env = inject_cameras_into_task(
869                task=args_cli.task,
870                num_cams=cur_num_cams,
871                camera_name_prefix=camera_name_prefix,
872                camera_creation_callable=camera_creation_callable,
873                num_cameras_per_env=args_cli.task_num_cameras_per_env,
874            )
875            env.reset()
876            print(f"Testing with {cur_num_cams} {camera_name_prefix}")
877            analysis = run_simulator(
878                sim=None,
879                scene_entities=env.unwrapped.scene,
880                warm_start_length=args_cli.warm_start_length,
881                experiment_length=args_cli.experiment_length,
882                tiled_camera_data_types=args_cli.tiled_camera_data_types,
883                standard_camera_data_types=args_cli.standard_camera_data_types,
884                ray_caster_camera_data_types=args_cli.ray_caster_camera_data_types,
885                convert_depth_to_camera_to_image_plane=args_cli.convert_depth_to_camera_to_image_plane,
886                max_cameras_per_env=args_cli.task_num_cameras_per_env,
887                env=env,
888            )
889
890            cur_sys_util = analysis["system_utilization_analytics"]
891            final_analysis = analysis
892            print("Triggering reset...")
893            env.close()
894            sim_utils.create_new_stage()
895        print("[INFO]: DONE! Feel free to CTRL + C Me ")
896        print(f"[INFO]: If you've made it this far, you can likely simulate {cur_num_cams} {camera_name_prefix}")
897        print("Keep in mind, this is without any training running on the GPU.")
898        print("Set lower utilization thresholds to account for training.")
899
900        if not args_cli.autotune:
901            print("[WARNING]: GPU Util Statistics only correct while autotuning, ignore above.")
902
903    # Log benchmark measurements
904    if final_analysis is not None:
905        timing = final_analysis["timing_analytics"]
906        sys_util = final_analysis["system_utilization_analytics"]
907
908        # Log timing measurements
909        benchmark.add_measurement(
910            "runtime",
911            measurement=SingleMeasurement(
912                name="Average Timestep Duration", value=timing["average_timestep_duration"] * 1000, unit="ms"
913            ),
914        )
915        benchmark.add_measurement(
916            "runtime",
917            measurement=SingleMeasurement(
918                name="Average Simulation Step Duration", value=timing["average_sim_step_duration"] * 1000, unit="ms"
919            ),
920        )
921        benchmark.add_measurement(
922            "runtime",
923            measurement=SingleMeasurement(
924                name="Total Simulation Time", value=timing["total_simulation_time"] * 1000, unit="ms"
925            ),
926        )
927
928        # Log system utilization
929        benchmark.add_measurement(
930            "runtime",
931            measurement=DictMeasurement(
932                name="System Utilization",
933                value={
934                    "cpu_percent": sys_util[0],
935                    "ram_percent": sys_util[1],
936                    "gpu_compute_percent": sys_util[2],
937                    "gpu_memory_percent": sys_util[3],
938                },
939            ),
940        )
941
942    # Finalize benchmark
943    benchmark.update_manual_recorders()
944    benchmark._finalize_impl()
945
946
947if __name__ == "__main__":
948    # run the main function
949    main()
950    # close sim app
951    simulation_app.close()

Possible Parameters#

First, run

./isaaclab.sh -p scripts/benchmarks/benchmark_cameras.py -h

to see all possible parameters you can vary with this utility.

See the command-line parameters related to autotune for more information about automatically determining the maximum camera count.

Compare Performance in Task Environments and Automatically Determine Task Max Camera Count#

Currently, tiled cameras are the most performant camera type that can handle multiple dynamic objects.

For example, to see how your system handles 100 tiled cameras in the cartpole environment, with 2 cameras per environment (so 50 environments in total), rendering only RGB data, run

./isaaclab.sh -p scripts/benchmarks/benchmark_cameras.py \
--task Isaac-Cartpole-v0 --num_tiled_cameras 100 \
--task_num_cameras_per_env 2 \
--tiled_camera_data_types rgb

If you have pynvml installed (./isaaclab.sh -p -m pip install pynvml), you can also find the maximum number of cameras that you can run in the specified environment up to a certain performance threshold (specified by maximum CPU utilization percent, maximum RAM utilization percent, maximum GPU compute percent, and maximum GPU memory percent). For example, to find the maximum number of cameras you can run with cartpole, you could run:

./isaaclab.sh -p scripts/benchmarks/benchmark_cameras.py \
--task Isaac-Cartpole-v0 --num_tiled_cameras 100 \
--task_num_cameras_per_env 2 \
--tiled_camera_data_types rgb --autotune \
--autotune_max_percentage_util 100 80 50 50

Autotune may cause the program to crash, which means it tried to run too many cameras at once. The maximum percentage utilization parameters are meant to prevent this from happening.
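
The stopping logic behind autotune can be seen in the loop condition of the listing above: the camera count keeps growing by the configured interval as long as every utilization metric stays at or below its threshold. A minimal sketch of that condition (the helper name here is ours, not from the script):

```python
def should_scale_up(cur_util, max_util, cur_cams, max_cams):
    """Mirror of the autotune loop condition: continue only while every
    utilization metric (CPU %, RAM %, GPU %) is at or below its threshold
    and the camera count has not exceeded the configured maximum."""
    return all(cur <= thresh for cur, thresh in zip(cur_util, max_util)) and cur_cams <= max_cams

# With thresholds of 100% CPU, 80% RAM, 50% GPU compute:
print(should_scale_up([40.0, 30.0, 45.0], [100.0, 80.0, 50.0], 60, 100))  # True
print(should_scale_up([40.0, 30.0, 55.0], [100.0, 80.0, 50.0], 60, 100))  # False: GPU over 50%
```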

The output of the benchmark doesn’t include the overhead of training the network, so consider decreasing the maximum utilization percentages to account for this overhead. The final output camera count is for all cameras, so to get the total number of environments, divide the output camera count by the number of cameras per environment.
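
For instance, if autotune reported a maximum of 120 cameras (a hypothetical value) with 2 cameras per environment, the environment count works out as:

```python
reported_camera_count = 120  # hypothetical autotune result
cameras_per_env = 2          # matches --task_num_cameras_per_env 2
num_envs = reported_camera_count // cameras_per_env
print(num_envs)  # 60
```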

Compare Camera Type and Performance (Without a Specified Task)#

This tool can also assess performance without a task environment. For example, to view 100 random objects with 2 standard cameras, one could run

./isaaclab.sh -p scripts/benchmarks/benchmark_cameras.py \
--height 100 --width 100 --num_standard_cameras 2 \
--standard_camera_data_types instance_segmentation_fast normals --num_objects 100 \
--experiment_length 100

If your system cannot handle this configuration, the process will be killed. It’s recommended to monitor CPU/RAM and GPU utilization while running this script, to get an idea of how many resources rendering the desired cameras requires. On Ubuntu, you can use tools like htop and nvtop to monitor resources live while running this script, and on Windows, you can use the Task Manager.

If your system has a hard time handling the desired cameras, you can try the following:

  • Switch to headless mode (supply --headless)

  • Ensure you are using the GPU pipeline, not the CPU pipeline

  • If you aren’t using Tiled Cameras, switch to Tiled Cameras

  • Decrease camera resolution

  • Decrease the number of data types requested for each camera

  • Decrease the number of cameras

  • Decrease the number of objects in the scene
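
To see why lowering resolution, data types, or camera count helps, here is a rough back-of-the-envelope estimate of per-frame buffer size (this assumes four bytes per pixel per data type, e.g. RGBA uint8 or float32 depth; actual Isaac Lab buffer layouts may differ):

```python
def camera_buffer_mib(num_cams, height, width, bytes_per_pixel=4):
    """Approximate per-frame output buffer size in MiB for one data type."""
    return num_cams * height * width * bytes_per_pixel / 1024**2

# 100 cameras at 640x480 with one 4-byte-per-pixel data type:
print(f"{camera_buffer_mib(100, 480, 640):.1f} MiB")  # 117.2 MiB per frame
```

Each additional data type adds roughly another buffer of this size, which is why trimming data types scales almost as well as trimming cameras.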

If your system is able to handle the number of cameras, the timing statistics will be printed to the terminal. After the simulation stops, it can be closed with CTRL+C.