Quickstart
A step-by-step guide on installing the zxporter (read-only) operator into your cluster.
Connect your Kubernetes Cluster
You can connect your Kubernetes cluster to the DevZero platform by deploying the zxporter operator. This lightweight, read-only component powers real-time cost insights and optimization recommendations — without modifying your workloads.
Log into the DevZero Console
After logging into the DevZero Console, click the "Connect new cluster" button in the "Clusters" section to begin the setup process.
Your K8s Provider
Choose the environment where your Kubernetes cluster is running. DevZero supports:
- Amazon EKS
- Google GKE
- Microsoft AKS
- Oracle OKE
- Other (self-managed or on-prem clusters)
After selecting your provider, copy the install command.
Install the operator
You’ll be provided a one-line script to deploy zxporter. Copy and run it in a terminal where kubectl is configured with access to your Kubernetes cluster.
Why not Helm? We have a Helm chart (we promise). But the quickstart is about getting to cost insights in minutes, not configuring values.yaml. The Helm chart will be waiting when you’re ready for production.
📘 Note: zxporter is fully read-only. It does not access secrets or modify cluster resources. You can inspect the manifest before applying it for full transparency.
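One way to confirm this after installing is to inspect the operator's RBAC. A minimal sketch, assuming the install creates cluster roles with "zxporter" in the name (the name pattern is an assumption; adjust the grep to what the manifest actually creates):

```bash
# List zxporter-related ClusterRoles (name pattern is an assumption)
kubectl get clusterroles -o name | grep -i zxporter

# Inspect the rules; a read-only operator should list only get/list/watch verbs
kubectl get clusterroles -o name | grep -i zxporter \
  | xargs -r kubectl describe
```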
Validating the connection
Once zxporter is installed, DevZero automatically detects and connects your cluster. Within a few minutes, you’ll start receiving real-time cost insights and workload optimization suggestions.
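If the cluster does not appear within a few minutes, a quick sanity check from the same terminal (the devzero-system namespace is an assumption; use whatever namespace the install script created):

```bash
# Operator pods should be Running and not restarting (namespace assumed)
kubectl get pods -n devzero-system

# Recent events often surface image-pull or RBAC problems
kubectl get events -n devzero-system --sort-by=.lastTimestamp | tail -n 20
```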
View dashboard
You’re now ready to explore the DevZero platform and improve your cluster’s efficiency.
Collect GPU metrics
zxporter Nodemon is a Kubernetes DaemonSet that collects GPU metrics directly from NVIDIA DCGM, enriches them with Kubernetes workload context (namespace, pod, container), and exposes per-container GPU metrics.
Before proceeding, confirm that GPU nodes are visible to Kubernetes and have allocatable capacity:
```bash
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' \
  | grep -v $'\t$'
```

If no nodes appear, neither the NVIDIA GPU Operator nor the NVIDIA k8s-device-plugin is installed. Install one before continuing.
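A minimal sketch of the device-plugin route, using NVIDIA's public k8s-device-plugin Helm chart (repo URL and chart name as published by NVIDIA; pin a chart version appropriate for your cluster):

```bash
# Install the NVIDIA device plugin so GPUs show up as allocatable resources
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --create-namespace
```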
Source: devzero-inc/zxporter
Set the cloud provider
Create a values.yaml file. The provider field is required — there is no default:
provider: "gcp" # gcp | eks | azureFor EKS and Azure, if GPU nodes carry custom taints not covered by the default tolerations, add them under gpuMetricsExporter.tolerations in values.yaml.
Configure the DCGM collection mode
zxporter Nodemon connects to DCGM in one of two modes depending on whether a DCGM exporter is already running in your cluster.
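A quick way to tell which option applies, assuming any existing exporter has "dcgm" somewhere in its pod name:

```bash
# Any hits here point to Option A; an empty result points to Option B
kubectl get pods -A | grep -i dcgm
```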
Option A: DCGM exporter is already running
Disable the embedded sidecar and point Nodemon to your existing DCGM DaemonSet via label-based pod discovery. Find the labels on your DCGM pods:
```bash
DCGM_DS_INFO=$(kubectl get daemonset -A -o json \
  | jq -r '.items[] | select(.metadata.name | contains("dcgm-exporter")) | "\(.metadata.namespace) \(.metadata.name)"' \
  | head -1)
DCGM_NAMESPACE=$(echo $DCGM_DS_INFO | awk '{print $1}')
DCGM_DAEMONSET=$(echo $DCGM_DS_INFO | awk '{print $2}')

kubectl get daemonset -n $DCGM_NAMESPACE $DCGM_DAEMONSET -o json \
  | jq -r '.spec.template.metadata.labels | to_entries[] | "\(.key)=\(.value)"'
```

Common labels by provider:
| Cloud | Typical DCGM pod label |
|---|---|
| GCP (GKE managed) | app.kubernetes.io/name=gke-managed-dcgm-exporter |
| EKS / GPU Operator | app=nvidia-dcgm-exporter |
| Azure / GPU Operator | app=nvidia-dcgm-exporter |
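Before committing a label to values.yaml, it is worth confirming it actually selects the DCGM pods (shown here with the EKS label from the table above; swap in yours):

```bash
# The label you put in DCGM_LABELS should return the running DCGM exporter pods
kubectl get pods -A -l app=nvidia-dcgm-exporter
```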
Set the matching label(s) in values.yaml and disable the embedded sidecar:
```yaml
dcgmExporter:
  enabled: false
gpuMetricsExporter:
  config:
    DCGM_LABELS: "app=nvidia-dcgm-exporter"
```

Alternative to zxporter Nodemon: If you cannot deploy Nodemon, patch the DaemonSet's pod template directly so Prometheus scrapes the existing DCGM pods. New pods inherit the annotation automatically; existing pods are updated via rolling restart:
```bash
kubectl get daemonset -A -o json \
  | jq -r '.items[] | select(.metadata.name | contains("dcgm-exporter")) | "\(.metadata.namespace) \(.metadata.name)"' \
  | while read namespace daemonset; do
      kubectl patch daemonset $daemonset -n $namespace \
        -p '{"spec":{"template":{"metadata":{"annotations":{"prometheus.io/scrape":"true"}}}}}'
    done
```

Option B: No DCGM in the cluster
Enable the embedded DCGM exporter sidecar. First, check whether a host-level DCGM engine (nv-hostengine) is already running on your GPU nodes:
```bash
kubectl get nodes -o json \
  | jq -r '.items[] | select(
      .status.capacity["nvidia.com/gpu"] != null or
      .metadata.labels["cloud.google.com/gke-accelerator"] != null or
      .metadata.labels["k8s.amazonaws.com/accelerator"] != null
    ) | .metadata.name' \
  | while read node; do
      echo "Checking node: $node"
      kubectl debug node/$node -it --image=alpine:3.19 --profile=sysadmin -- \
        nsenter -t 1 -m -u -n -i -- sh -c \
        'pgrep -x nv-hostengine >/dev/null 2>&1 \
          && echo "HOST ENGINE RUNNING — set useExternalHostEngine: true" \
          || echo "HOST ENGINE NOT FOUND — keep useExternalHostEngine: false"'
    done
```

Set useExternalHostEngine in values.yaml based on the result:
```yaml
dcgmExporter:
  enabled: true
  useExternalHostEngine: false # set to true if nv-hostengine is running on the host
```

Install with Helm
```bash
helm install zxporter-nodemon oci://registry-1.docker.io/devzeroinc/zxporter-nodemon \
  --version 0.0.2 \
  --namespace devzero-system \
  --create-namespace \
  -f values.yaml
```

One-liner examples by cloud provider:
GKE:

```bash
helm install zxporter-nodemon oci://registry-1.docker.io/devzeroinc/zxporter-nodemon \
  --version 0.0.2 \
  --namespace devzero-system --create-namespace \
  --set provider=gcp \
  --set dcgmExporter.enabled=false \
  --set gpuMetricsExporter.config.DCGM_LABELS="app.kubernetes.io/name=gke-managed-dcgm-exporter"
```

EKS:

```bash
helm install zxporter-nodemon oci://registry-1.docker.io/devzeroinc/zxporter-nodemon \
  --version 0.0.2 \
  --namespace devzero-system --create-namespace \
  --set provider=eks \
  --set dcgmExporter.enabled=false \
  --set gpuMetricsExporter.config.DCGM_LABELS="app=nvidia-dcgm-exporter"
```

AKS:

```bash
helm install zxporter-nodemon oci://registry-1.docker.io/devzeroinc/zxporter-nodemon \
  --version 0.0.2 \
  --namespace devzero-system --create-namespace \
  --set provider=azure \
  --set dcgmExporter.enabled=false \
  --set gpuMetricsExporter.config.DCGM_LABELS="app=nvidia-dcgm-exporter"
```

The full configuration reference is in the Helm values file, which serves as the source of truth with inline comments for each field.
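To browse that reference for the exact chart version you installed, and to confirm the DaemonSet actually landed on your GPU nodes, a quick follow-up (the devzero-system namespace matches the install commands above; the DaemonSet name is an assumption):

```bash
# Pull the fully commented values file for this chart version
helm show values oci://registry-1.docker.io/devzeroinc/zxporter-nodemon --version 0.0.2

# Nodemon pods should be scheduled onto each GPU node (namespace assumed)
kubectl get pods -n devzero-system -o wide
```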
GPU workloads on the dashboard
Go back to the dashboard and check out your GPU workloads!
Metrics collected
| Metric | Description |
|---|---|
| `gpu_utilization` | GPU compute utilization (%) |
| `temperature` | GPU temperature (°C) |
| `memory_temperature` | Memory temperature (°C) |
| `power_usage` | Power draw (W) |
| `framebuffer_used` | GPU memory used (MiB) |
| `framebuffer_free` | GPU memory free (MiB) |
| `framebuffer_total` | Total GPU memory (MiB) |
| `mem_copy_util` | Memory copy engine utilization (%) |
| `sm_clock` | SM clock frequency (MHz) |
| `mem_clock` | Memory clock frequency (MHz) |
| `xid_errors` | XID error count |
| `power_violation` | Power throttle time (ns) |
| `thermal_violation` | Thermal throttle time (ns) |
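To spot-check these metrics before they show up on the dashboard, you can port-forward to the Nodemon DaemonSet and grep its Prometheus endpoint. A sketch: the DaemonSet name and the 9400 port (the DCGM exporter convention) are both assumptions here.

```bash
# Forward the metrics port from one Nodemon pod (name and port are assumptions)
kubectl -n devzero-system port-forward ds/zxporter-nodemon 9400:9400 &

# Metrics are exposed in Prometheus text format; grep for entries from the table
curl -s http://localhost:9400/metrics | grep -E 'gpu_utilization|framebuffer_used'
```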