Deploying CloudXR Teleoperation on Kubernetes#
This section explains how to deploy CloudXR Teleoperation for Isaac Lab on a Kubernetes (K8s) cluster.
System Requirements#
Minimum requirement: Kubernetes cluster with a node that has at least 1 NVIDIA RTX PRO 6000 / L40 GPU or equivalent
Recommended requirement: Kubernetes cluster with a node that has at least 2 RTX PRO 6000 / L40 GPUs or equivalent
Software Dependencies#
kubectl on your host computer. If you use MicroK8s, you already have microk8s kubectl. Otherwise, follow the official kubectl installation guide.
helm on your host computer. If you use MicroK8s, you already have microk8s helm. Otherwise, follow the official Helm installation guide.
Access to the NGC public registry from your Kubernetes cluster, in particular these container images:
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/isaac-lab
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cloudxr-runtime
NVIDIA GPU Operator or equivalent installed in your Kubernetes cluster to expose NVIDIA GPUs
NVIDIA Container Toolkit installed on the nodes of your Kubernetes cluster
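Before moving on, you can sanity-check these dependencies from your host. This is a minimal set of checks; the gpu-operator namespace is an assumption that may differ in your cluster, and you should prefix the commands with microk8s if you use MicroK8s:
# Confirm the client tools are installed and on PATH
kubectl version --client
helm version --short
# Confirm the GPU Operator (or equivalent) pods are running; the namespace may differ
kubectl get pods -n gpu-operator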
Preparation#
On your host computer, you should have already configured kubectl
to access your Kubernetes cluster. To validate, run the following command and verify it returns your nodes correctly:
kubectl get node
If you are installing this to your own Kubernetes cluster instead of using the setup described in the Appendix: Setting Up a Local K8s Cluster with MicroK8s, your role in the K8s cluster should have at least the following RBAC permissions:
rules:
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["apps"]
resources: ["deployments", "replicasets"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
resources: ["services"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
Installation#
Note
The following steps are verified on a MicroK8s cluster with GPU Operator installed (see configurations in the Appendix: Setting Up a Local K8s Cluster with MicroK8s). You can configure your own K8s cluster accordingly if you encounter issues.
Download the Helm chart from NGC (to get your NGC API key, follow the public guide):
helm fetch https://helm.ngc.nvidia.com/nvidia/charts/isaac-lab-teleop-2.2.0.tgz \
  --username='$oauthtoken' \
  --password=<your-ngc-api-key>
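Optionally, you can inspect the chart's configurable values before installing; this uses the standard helm show values command on the downloaded archive:
helm show values isaac-lab-teleop-2.2.0.tgz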
Install and run the CloudXR Teleoperation for Isaac Lab pod in the default namespace, consuming all host GPUs:
helm upgrade --install hello-isaac-teleop isaac-lab-teleop-2.2.0.tgz \
  --set fullnameOverride=hello-isaac-teleop \
  --set hostNetwork="true"
Note
You can remove the need for host networking by creating an external LoadBalancer VIP (e.g., with MetalLB) and setting the environment variable NV_CXR_ENDPOINT_IP when deploying the Helm chart:
# local_values.yml file example:
fullnameOverride: hello-isaac-teleop
streamer:
  extraEnvs:
    - name: NV_CXR_ENDPOINT_IP
      value: "<your external LoadBalancer VIP>"
    - name: ACCEPT_EULA
      value: "Y"
# command
helm upgrade --install --values local_values.yml \
  hello-isaac-teleop isaac-lab-teleop-2.2.0.tgz
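If your cluster has no LoadBalancer implementation yet, the manifest below is a minimal sketch of a MetalLB Layer 2 address pool. It assumes MetalLB is already installed in the metallb-system namespace; the pool name and address range are illustrative and must be replaced with addresses routable on your network:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: teleop-pool                   # illustrative name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250     # illustrative range; use your own
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: teleop-l2                     # illustrative name
  namespace: metallb-system
spec:
  ipAddressPools:
    - teleop-pool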
Verify that the deployment has completed:
kubectl wait --for=condition=available --timeout=300s \
  deployment/hello-isaac-teleop
After the pod is running, it might take approximately 5-8 minutes to complete loading assets and start streaming.
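To follow progress while the assets load, you can stream the pod logs; the exact log messages vary between releases, so treat this as a way to watch activity rather than output to match:
# Stream logs from all containers of the teleop deployment
kubectl logs deployment/hello-isaac-teleop --all-containers=true -f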
Uninstallation#
You can uninstall by simply running:
helm uninstall hello-isaac-teleop
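To confirm that the release and its workload are gone, you can run:
helm list --filter hello-isaac-teleop
kubectl get deployment hello-isaac-teleop   # should report NotFound after uninstall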
Appendix: Setting Up a Local K8s Cluster with MicroK8s#
Your local workstation should have the NVIDIA Container Toolkit and its dependencies installed. Otherwise, the following setup will not work.
Cleaning Up Existing Installations (Optional)#
# Clean up the system to ensure we start fresh
sudo snap remove microk8s
sudo snap remove helm
sudo apt-get remove docker-ce docker-ce-cli containerd.io
# If you have snap docker installed, remove it as well
sudo snap remove docker
Installing MicroK8s#
sudo snap install microk8s --classic
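After the snap is installed, MicroK8s needs a couple of follow-up steps before it can be used without sudo. The commands below follow the standard MicroK8s setup and are a suggested sequence rather than a required one:
# Allow your user to run microk8s without sudo (takes effect after re-login or newgrp)
sudo usermod -a -G microk8s $USER
newgrp microk8s
# Wait until the cluster reports that it is ready
microk8s status --wait-ready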
Installing NVIDIA GPU Operator#
microk8s helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
microk8s helm repo update
microk8s helm install gpu-operator \
-n gpu-operator \
--create-namespace nvidia/gpu-operator \
--set toolkit.env[0].name=CONTAINERD_CONFIG \
--set toolkit.env[0].value=/var/snap/microk8s/current/args/containerd-template.toml \
--set toolkit.env[1].name=CONTAINERD_SOCKET \
--set toolkit.env[1].value=/var/snap/microk8s/common/run/containerd.sock \
--set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
--set toolkit.env[2].value=nvidia \
--set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
--set-string toolkit.env[3].value=true
Note
If you have configured the GPU Operator to use volume mounts for DEVICE_LIST_STRATEGY on the device plugin and disabled ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED on the toolkit, that configuration is currently unsupported: there is no way to guarantee that the assigned GPU resource is consistently shared between containers of the same pod.
Verifying Installation#
Run the following command to verify that all pods are running correctly:
microk8s kubectl get pods -n gpu-operator
You should see output similar to:
NAMESPACE NAME READY STATUS RESTARTS AGE
gpu-operator gpu-operator-node-feature-discovery-gc-76dc6664b8-npkdg 1/1 Running 0 77m
gpu-operator gpu-operator-node-feature-discovery-master-7d6b448f6d-76fqj 1/1 Running 0 77m
gpu-operator gpu-operator-node-feature-discovery-worker-8wr4n 1/1 Running 0 77m
gpu-operator gpu-operator-86656466d6-wjqf4 1/1 Running 0 77m
gpu-operator nvidia-container-toolkit-daemonset-qffh6 1/1 Running 0 77m
gpu-operator nvidia-dcgm-exporter-vcxsf 1/1 Running 0 77m
gpu-operator nvidia-cuda-validator-x9qn4 0/1 Completed 0 76m
gpu-operator nvidia-device-plugin-daemonset-t4j4k 1/1 Running 0 77m
gpu-operator gpu-feature-discovery-8dms9 1/1 Running 0 77m
gpu-operator nvidia-operator-validator-gjs9m 1/1 Running 0 77m
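You can also confirm that the node now advertises NVIDIA GPUs as an allocatable resource; the grep below is a quick, suggested check rather than exact output to match:
# The Capacity/Allocatable sections should list nvidia.com/gpu with a non-zero count
microk8s kubectl describe node | grep -i "nvidia.com/gpu"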
Once all pods are running, you can proceed to the Installation section.