Quickstart
A step-by-step guide on installing the zxporter (read-only) operator into your cluster.
Connect your Kubernetes Cluster
You can connect your Kubernetes cluster to the DevZero platform by deploying the zxporter operator. This lightweight, read-only component powers real-time cost insights and optimization recommendations — without modifying your workloads.
Log into the DevZero Console
After logging into the DevZero Console, click the "Connect new cluster" button in the "Clusters" section to begin the setup process.
Your K8s Provider
Choose the environment where your Kubernetes cluster is running. DevZero supports:
- Amazon EKS
- Google GKE
- Microsoft AKS
- Oracle OKE
- Other (self-managed or on-prem clusters)
After selecting your provider, copy the install command.
Install the operator
You’ll be provided a one-line script to deploy zxporter. Copy and run it in a terminal where kubectl is configured with access to your Kubernetes cluster.
Why not Helm? We have a Helm chart (we promise). But the quickstart is about getting to cost insights in minutes, not configuring values.yaml. The Helm chart will be waiting when you’re ready for production.
📘 Note: zxporter is fully read-only. It does not access secrets or modify cluster resources. You can inspect the manifest before applying it for full transparency.
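One way to confirm this after installing is to inspect the operator's RBAC. A minimal sketch, assuming the install creates cluster roles with "zxporter" in the name (the name pattern is an assumption; adjust the grep to what the manifest actually creates):

```bash
# List zxporter-related ClusterRoles (name pattern is an assumption)
kubectl get clusterroles -o name | grep -i zxporter

# Inspect the rules; a read-only operator should list only get/list/watch verbs
kubectl get clusterroles -o name | grep -i zxporter \
  | xargs -r kubectl describe
```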
Validating the connection
Once zxporter is installed, DevZero automatically detects and connects your cluster. Within a few minutes, you’ll start receiving real-time cost insights and workload optimization suggestions.
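If the cluster does not appear within a few minutes, a quick sanity check from the same terminal (the devzero-system namespace is an assumption; use whatever namespace the install script created):

```bash
# Operator pods should be Running and not restarting (namespace assumed)
kubectl get pods -n devzero-system

# Recent events often surface image-pull or RBAC problems
kubectl get events -n devzero-system --sort-by=.lastTimestamp | tail -n 20
```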
View dashboard
You’re now ready to explore the DevZero platform and improve your cluster’s efficiency.
Collect GPU metrics
zxporter Nodemon is a Kubernetes DaemonSet that collects GPU metrics directly from NVIDIA DCGM, enriches them with Kubernetes workload context (namespace, pod, container), and exposes per-container GPU metrics.
Before proceeding, confirm that GPU nodes are visible to Kubernetes and have allocatable capacity:
```bash
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' \
  | grep -v $'\t$'
```

If no nodes appear, neither the NVIDIA GPU Operator nor the NVIDIA k8s-device-plugin is installed. Install one before continuing.
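A minimal sketch of the device-plugin route, using NVIDIA's public k8s-device-plugin Helm chart (repo URL and chart name as published by NVIDIA; pin a chart version appropriate for your cluster):

```bash
# Install the NVIDIA device plugin so GPUs show up as allocatable resources
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --create-namespace
```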
Source: devzero-inc/zxporter
Set the cloud provider
Create a values.yaml file. The provider field is required — there is no default:
provider: "gcp" # gcp | eks | azureFor EKS and Azure, if GPU nodes carry custom taints not covered by the default tolerations, add them under gpuMetricsExporter.tolerations in values.yaml.
Configure the DCGM collection mode
zxporter Nodemon connects to DCGM in one of two modes depending on whether a DCGM exporter is already running in your cluster.
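A quick way to tell which option applies, assuming any existing exporter has "dcgm" somewhere in its pod name:

```bash
# Any hits here point to Option A; an empty result points to Option B
kubectl get pods -A | grep -i dcgm
```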
Option A: DCGM exporter is already running
Disable the embedded sidecar and point Nodemon to your existing DCGM DaemonSet via label-based pod discovery. Find the labels on your DCGM pods:
```bash
DCGM_DS_INFO=$(kubectl get daemonset -A -o json \
  | jq -r '.items[] | select(.metadata.name | contains("dcgm-exporter")) | "\(.metadata.namespace) \(.metadata.name)"' \
  | head -1)
DCGM_NAMESPACE=$(echo $DCGM_DS_INFO | awk '{print $1}')
DCGM_DAEMONSET=$(echo $DCGM_DS_INFO | awk '{print $2}')

kubectl get daemonset -n $DCGM_NAMESPACE $DCGM_DAEMONSET -o json \
  | jq -r '.spec.template.metadata.labels | to_entries[] | "\(.key)=\(.value)"'
```

Common labels by provider:
| Cloud | Typical DCGM pod label |
|---|---|
| GCP (GKE managed) | app.kubernetes.io/name=gke-managed-dcgm-exporter |
| EKS / GPU Operator | app=nvidia-dcgm-exporter |
| Azure / GPU Operator | app=nvidia-dcgm-exporter |
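Before committing a label to values.yaml, it is worth confirming it actually selects the DCGM pods (shown here with the EKS label from the table above; swap in yours):

```bash
# The label you put in DCGM_LABELS should return the running DCGM exporter pods
kubectl get pods -A -l app=nvidia-dcgm-exporter
```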
Set the matching label(s) in values.yaml and disable the embedded sidecar:
```yaml
dcgmExporter:
  enabled: false
gpuMetricsExporter:
  config:
    DCGM_LABELS: "app=nvidia-dcgm-exporter"
```

Alternative to zxporter Nodemon: If you cannot deploy Nodemon, patch the DaemonSet's pod template directly so Prometheus scrapes the existing DCGM pods. New pods inherit the annotation automatically; existing pods are updated via rolling restart:
```bash
kubectl get daemonset -A -o json \
  | jq -r '.items[] | select(.metadata.name | contains("dcgm-exporter")) | "\(.metadata.namespace) \(.metadata.name)"' \
  | while read namespace daemonset; do
      kubectl patch daemonset $daemonset -n $namespace \
        -p '{"spec":{"template":{"metadata":{"annotations":{"prometheus.io/scrape":"true"}}}}}'
    done
```

Option B: No DCGM in the cluster
Enable the embedded DCGM exporter sidecar. First, check whether a host-level DCGM engine (nv-hostengine) is already running on your GPU nodes:
```bash
kubectl get nodes -o json \
  | jq -r '.items[] | select(
      .status.capacity["nvidia.com/gpu"] != null or
      .metadata.labels["cloud.google.com/gke-accelerator"] != null or
      .metadata.labels["k8s.amazonaws.com/accelerator"] != null
    ) | .metadata.name' \
  | while read node; do
      echo "Checking node: $node"
      kubectl debug node/$node -it --image=alpine:3.19 --profile=sysadmin -- \
        nsenter -t 1 -m -u -n -i -- sh -c \
        'pgrep -x nv-hostengine >/dev/null 2>&1 \
          && echo "HOST ENGINE RUNNING — set useExternalHostEngine: true" \
          || echo "HOST ENGINE NOT FOUND — keep useExternalHostEngine: false"'
    done
```

Set useExternalHostEngine in values.yaml based on the result:
```yaml
dcgmExporter:
  enabled: true
  useExternalHostEngine: false # set to true if nv-hostengine is running on the host
```

Install with Helm
```bash
helm install zxporter-nodemon oci://registry-1.docker.io/devzeroinc/zxporter-nodemon \
  --version 0.0.2 \
  --namespace devzero-system \
  --create-namespace \
  -f values.yaml
```

One-liner examples by cloud provider:
GKE:

```bash
helm install zxporter-nodemon oci://registry-1.docker.io/devzeroinc/zxporter-nodemon \
  --version 0.0.2 \
  --namespace devzero-system --create-namespace \
  --set provider=gcp \
  --set dcgmExporter.enabled=false \
  --set gpuMetricsExporter.config.DCGM_LABELS="app.kubernetes.io/name=gke-managed-dcgm-exporter"
```

EKS:

```bash
helm install zxporter-nodemon oci://registry-1.docker.io/devzeroinc/zxporter-nodemon \
  --version 0.0.2 \
  --namespace devzero-system --create-namespace \
  --set provider=eks \
  --set dcgmExporter.enabled=false \
  --set gpuMetricsExporter.config.DCGM_LABELS="app=nvidia-dcgm-exporter"
```

AKS:

```bash
helm install zxporter-nodemon oci://registry-1.docker.io/devzeroinc/zxporter-nodemon \
  --version 0.0.2 \
  --namespace devzero-system --create-namespace \
  --set provider=azure \
  --set dcgmExporter.enabled=false \
  --set gpuMetricsExporter.config.DCGM_LABELS="app=nvidia-dcgm-exporter"
```

The full configuration reference is in the Helm values file, which serves as the source of truth with inline comments for each field.
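To browse that reference for the exact chart version you installed, and to confirm the DaemonSet actually landed on your GPU nodes, a quick follow-up (the devzero-system namespace matches the install commands above; the DaemonSet name is an assumption):

```bash
# Pull the fully commented values file for this chart version
helm show values oci://registry-1.docker.io/devzeroinc/zxporter-nodemon --version 0.0.2

# Nodemon pods should be scheduled onto each GPU node (namespace assumed)
kubectl get pods -n devzero-system -o wide
```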
GPU workloads on the dashboard
Go back to the dashboard and check out your GPU workloads!
Metrics collected
| Metric | Description |
|---|---|
| `gpu_utilization` | GPU compute utilization (%) |
| `temperature` | GPU temperature (°C) |
| `memory_temperature` | Memory temperature (°C) |
| `power_usage` | Power draw (W) |
| `framebuffer_used` | GPU memory used (MiB) |
| `framebuffer_free` | GPU memory free (MiB) |
| `framebuffer_total` | Total GPU memory (MiB) |
| `mem_copy_util` | Memory copy engine utilization (%) |
| `sm_clock` | SM clock frequency (MHz) |
| `mem_clock` | Memory clock frequency (MHz) |
| `xid_errors` | XID error count |
| `power_violation` | Power throttle time (ns) |
| `thermal_violation` | Thermal throttle time (ns) |
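To spot-check these metrics before they show up on the dashboard, you can port-forward to the Nodemon DaemonSet and grep its Prometheus endpoint. A sketch: the DaemonSet name and the 9400 port (the DCGM exporter convention) are both assumptions here.

```bash
# Forward the metrics port from one Nodemon pod (name and port are assumptions)
kubectl -n devzero-system port-forward ds/zxporter-nodemon 9400:9400 &

# Metrics are exposed in Prometheus text format; grep for entries from the table
curl -s http://localhost:9400/metrics | grep -E 'gpu_utilization|framebuffer_used'
```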