Deploying CloudXR Teleoperation on Kubernetes#
This section explains how to deploy CloudXR Teleoperation for Isaac Lab on a Kubernetes (K8s) cluster.
System Requirements#
Minimum requirement: Kubernetes cluster with a node that has at least 1 NVIDIA RTX PRO 6000 / L40 GPU or equivalent
Recommended requirement: Kubernetes cluster with a node that has at least 2 RTX PRO 6000 / L40 GPUs or equivalent
Software Dependencies#
kubectl on your host computer. If you use MicroK8s, you already have microk8s kubectl. Otherwise, follow the official kubectl installation guide.
helm on your host computer. If you use MicroK8s, you already have microk8s helm. Otherwise, follow the official Helm installation guide.
Access to the NGC public registry from your Kubernetes cluster, in particular these container images:
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/isaac-lab
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cloudxr-runtime
NVIDIA GPU Operator or equivalent installed in your Kubernetes cluster to expose NVIDIA GPUs
NVIDIA Container Toolkit installed on the nodes of your Kubernetes cluster
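Before moving on, you can sanity-check these dependencies from your host. This is a minimal set of checks; the gpu-operator namespace is an assumption that may differ in your cluster, and you should prefix the commands with microk8s if you use MicroK8s:
# Confirm the client tools are installed and on PATH
kubectl version --client
helm version --short
# Confirm the GPU Operator (or equivalent) pods are running; the namespace may differ
kubectl get pods -n gpu-operator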
Preparation#
On your host computer, you should have already configured kubectl
to access your Kubernetes cluster. To validate, run the following command and verify it returns your nodes correctly:
kubectl get node
If you are installing this to your own Kubernetes cluster instead of using the setup described in the Appendix: Setting Up a Local K8s Cluster with MicroK8s, your role in the K8s cluster should have at least the following RBAC permissions:
rules:
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["apps"]
resources: ["deployments", "replicasets"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
resources: ["services"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
Installation#
Note
The following steps are verified on a MicroK8s cluster with GPU Operator installed (see configurations in the Appendix: Setting Up a Local K8s Cluster with MicroK8s). You can configure your own K8s cluster accordingly if you encounter issues.
Download the Helm chart from NGC (to get your NGC API key, follow the public guide):
helm fetch https://helm.ngc.nvidia.com/nvidia/charts/isaac-lab-teleop-2.2.0.tgz \
  --username='$oauthtoken' \
  --password=<your-ngc-api-key>
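Optionally, you can inspect the chart's configurable values before installing; this uses the standard helm show values command on the downloaded archive:
helm show values isaac-lab-teleop-2.2.0.tgz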
Install and run the CloudXR Teleoperation for Isaac Lab pod in the default namespace, consuming all host GPUs:
helm upgrade --install hello-isaac-teleop isaac-lab-teleop-2.2.0.tgz \
  --set fullnameOverride=hello-isaac-teleop \
  --set hostNetwork="true"
Note
You can remove the need for host networking by creating an external LoadBalancer VIP (e.g., with MetalLB) and setting the environment variable NV_CXR_ENDPOINT_IP when deploying the Helm chart:
# local_values.yml file example:
fullnameOverride: hello-isaac-teleop
streamer:
  extraEnvs:
    - name: NV_CXR_ENDPOINT_IP
      value: "<your external LoadBalancer VIP>"
    - name: ACCEPT_EULA
      value: "Y"
# command
helm upgrade --install --values local_values.yml \
  hello-isaac-teleop isaac-lab-teleop-2.2.0.tgz
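If your cluster has no LoadBalancer implementation yet, the manifest below is a minimal sketch of a MetalLB Layer 2 address pool. It assumes MetalLB is already installed in the metallb-system namespace; the pool name and address range are illustrative and must be replaced with addresses routable on your network:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: teleop-pool                   # illustrative name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250     # illustrative range; use your own
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: teleop-l2                     # illustrative name
  namespace: metallb-system
spec:
  ipAddressPools:
    - teleop-pool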
Verify that the deployment has completed:
kubectl wait --for=condition=available --timeout=300s \
  deployment/hello-isaac-teleop
After the pod is running, it might take approximately 5-8 minutes to complete loading assets and start streaming.
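To follow progress while the assets load, you can stream the pod logs; the exact log messages vary between releases, so treat this as a way to watch activity rather than output to match:
# Stream logs from all containers of the teleop deployment
kubectl logs deployment/hello-isaac-teleop --all-containers=true -f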
Uninstallation#
You can uninstall by simply running:
helm uninstall hello-isaac-teleop
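To confirm that the release and its workload are gone, you can run:
helm list --filter hello-isaac-teleop
kubectl get deployment hello-isaac-teleop   # should report NotFound after uninstall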
Appendix: Setting Up a Local K8s Cluster with MicroK8s#
Your local workstation should have the NVIDIA Container Toolkit and its dependencies installed. Otherwise, the following setup will not work.
Cleaning Up Existing Installations (Optional)#
# Clean up the system to ensure we start fresh
sudo snap remove microk8s
sudo snap remove helm
sudo apt-get remove docker-ce docker-ce-cli containerd.io
# If you have snap docker installed, remove it as well
sudo snap remove docker
Installing MicroK8s#
sudo snap install microk8s --classic
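After the snap is installed, MicroK8s needs a couple of follow-up steps before it can be used without sudo. The commands below follow the standard MicroK8s setup and are a suggested sequence rather than a required one:
# Allow your user to run microk8s without sudo (takes effect after re-login or newgrp)
sudo usermod -a -G microk8s $USER
newgrp microk8s
# Wait until the cluster reports that it is ready
microk8s status --wait-ready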
Installing NVIDIA GPU Operator#
microk8s helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
microk8s helm repo update
microk8s helm install gpu-operator \
-n gpu-operator \
--create-namespace nvidia/gpu-operator \
--set toolkit.env[0].name=CONTAINERD_CONFIG \
--set toolkit.env[0].value=/var/snap/microk8s/current/args/containerd-template.toml \
--set toolkit.env[1].name=CONTAINERD_SOCKET \
--set toolkit.env[1].value=/var/snap/microk8s/common/run/containerd.sock \
--set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
--set toolkit.env[2].value=nvidia \
--set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
--set-string toolkit.env[3].value=true
Note
If you have configured the GPU Operator to use volume mounts for DEVICE_LIST_STRATEGY on the device plugin and disabled ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED on the toolkit, that configuration is currently unsupported: there is no way to guarantee that the assigned GPU resource is consistently shared between containers of the same pod.
Verifying Installation#
Run the following command to verify that all pods are running correctly:
microk8s kubectl get pods -n gpu-operator
You should see output similar to:
NAMESPACE NAME READY STATUS RESTARTS AGE
gpu-operator gpu-operator-node-feature-discovery-gc-76dc6664b8-npkdg 1/1 Running 0 77m
gpu-operator gpu-operator-node-feature-discovery-master-7d6b448f6d-76fqj 1/1 Running 0 77m
gpu-operator gpu-operator-node-feature-discovery-worker-8wr4n 1/1 Running 0 77m
gpu-operator gpu-operator-86656466d6-wjqf4 1/1 Running 0 77m
gpu-operator nvidia-container-toolkit-daemonset-qffh6 1/1 Running 0 77m
gpu-operator nvidia-dcgm-exporter-vcxsf 1/1 Running 0 77m
gpu-operator nvidia-cuda-validator-x9qn4 0/1 Completed 0 76m
gpu-operator nvidia-device-plugin-daemonset-t4j4k 1/1 Running 0 77m
gpu-operator gpu-feature-discovery-8dms9 1/1 Running 0 77m
gpu-operator nvidia-operator-validator-gjs9m 1/1 Running 0 77m
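You can also confirm that the node now advertises NVIDIA GPUs as an allocatable resource; the grep below is a quick, suggested check rather than exact output to match:
# The Capacity/Allocatable sections should list nvidia.com/gpu with a non-zero count
microk8s kubectl describe node | grep -i "nvidia.com/gpu"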
Once all pods are running, you can proceed to the Installation section.