CNCF Certified Kubernetes AI Conformance Program

GPU scheduling works differently on every Kubernetes platform. DRA implementations vary. Inference routing is a patchwork of vendor-specific solutions. If you've tried moving an AI training job from one managed Kubernetes service to another, you know the drill: rewrite the resource requests, swap the device plugin, reconfigure the autoscaler, and hope the gang scheduling solution you picked is available on the target cluster.
The CNCF launched the Certified Kubernetes AI Conformance Program at KubeCon NA 2025 to fix exactly this. Modeled on the original Kubernetes conformance program that brought 100+ distributions to a common API baseline, this new initiative defines the minimum capabilities a platform must offer to reliably run AI and ML workloads. The goal: if your training job or inference service works on one conformant platform, it works on any of them.
This article walks through what the program requires, how the conformance checklist is structured, and what it means for teams choosing where to run production AI workloads.
What the Program Covers
The AI Conformance Program targets three workload categories:
- Training: distributed jobs that need accelerators, predictable scheduling, and all-or-nothing placement.
- Inference: model serving where latency, traffic routing, and autoscaling matter.
- Agentic workloads: multi-step workflows combining tools, memory, and long-running tasks.
Not every workload type uses every conformant feature. A single-GPU inference service doesn't need gang scheduling. A training job doesn't need Gateway API routing. The point is that a conformant platform supports the full surface area, so you don't discover missing capabilities after you've committed to a vendor.
The v1.35 checklist is strongest for training and inference. Agentic workloads, where long-running agents combine tools, memory, and iterative reasoning, are the newest pattern. The conformance requirements don't yet address their specific needs (persistent state across restarts, dynamic tool provisioning, long-lived WebSocket connections). The working group has flagged agentic workloads as a research area for v2.0, but for now, conformance mainly tells you whether training and inference will work reliably.
The Checklist
Conformance is validated through a YAML checklist, versioned per Kubernetes release. The latest is AIConformance-1.35.yaml. Each item has a level:
- MUST: mandatory for conformance. Fail one, fail the whole assessment.
- SHOULD: recommended but not blocking. Platforms are expected to implement these over time.
- N/A: allowed only with justification. "We don't support this feature" is not a valid reason. You need to explain why the requirement's context doesn't apply to your platform. Example: cluster autoscaling can be N/A for bare-metal on-premises deployments where node pools don't scale dynamically.
The checklist covers six domains with 12 total items. Here's the full breakdown.

The Six Requirement Domains
Accelerators
This is the heaviest section, with one MUST and three SHOULDs.
```yaml
# AIConformance-1.35.yaml — accelerators section (descriptions trimmed)
spec:
  accelerators:
    - id: dra_support
      description: "Support Dynamic Resource Allocation (DRA) APIs to enable
        more flexible and fine-grained resource requests beyond simple counts."
      level: MUST
    - id: driver_runtime_management
      description: "Provide a verifiable mechanism for ensuring that compatible
        accelerator drivers and corresponding container runtime configurations
        are correctly installed and maintained on nodes with accelerators.
        ..." # forward-looking DRA verification note trimmed
      level: SHOULD
    - id: gpu_sharing
      description: "For accelerators that support static GPU sharing, provide
        well-defined mechanisms for at least one GPU sharing strategy to
        improve utilization for workloads that do not require a full
        dedicated GPU. ..." # hardware partitioning, time-slicing, and
        # forward-looking DRA dynamic sharing notes trimmed
      level: SHOULD
    - id: virtualized_accelerator
      description: "For accelerators that support virtualized accelerator
        technologies (e.g. vGPU), provide well-defined mechanisms for these
        to be exposed and managed. ..." # consistency and DRA notes trimmed
      level: SHOULD
```
DRA is the only MUST here. The old device plugin model (nvidia.com/gpu: 1) treats GPUs as opaque integers. DRA, stable since Kubernetes 1.34, works more like PersistentVolumeClaim: you define resource classes, claim specific device capabilities, and let the scheduler handle placement. This matters for multi-GPU training jobs where topology awareness (which GPUs share an NVLink bus) can significantly affect throughput.
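To make the PVC analogy concrete, here is a minimal sketch of a DRA claim under the `resource.k8s.io/v1` API that went stable in 1.34. The device class name, pod, and image are hypothetical, and field details may vary by cluster version — this is an illustration of the model, not a drop-in manifest.

```yaml
# Claim one device from a hypothetical "gpu.example.com" DeviceClass,
# then reference the claim from the pod — analogous to a PVC + volume.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
        - name: gpu
          exactly:
            deviceClassName: gpu.example.com  # hypothetical device class
---
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  resourceClaims:
    - name: gpu
      resourceClaimTemplateName: single-gpu
  containers:
    - name: train
      image: example.com/trainer:latest  # hypothetical image
      resources:
        claims:
          - name: gpu  # container consumes the claimed device
```

Compared to `nvidia.com/gpu: 1`, the claim is a first-class API object: the scheduler can evaluate device attributes (memory, topology, partitions) before binding, rather than treating every GPU as interchangeable.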
The three SHOULD items reflect where the ecosystem is heading. GPU sharing (MIG partitioning, time-slicing, MPS) and vGPU support aren't mandatory yet, but the checklist is clear that platforms should expose fractional GPU resources as schedulable units once the hardware supports it through DRA. Watch the SHOULD items closely: they signal what is likely to become MUST in v2.0. If your platform doesn't support GPU sharing today, it will probably need to by the next certification cycle.
Networking
```yaml
# AIConformance-1.35.yaml — networking section
spec:
  networking:
    - id: ai_inference
      description: "Support the Kubernetes Gateway API with an implementation
        for advanced traffic management for inference services, which enables
        capabilities like weighted traffic splitting, header-based routing
        (for OpenAI protocol headers), and optional integration with service meshes."
      level: MUST
```
One item, and it's a MUST. Inference services need more than basic Service load balancing. The requirement specifically calls out Gateway API support with weighted traffic splitting (for canary deployments of model versions) and header-based routing for OpenAI protocol headers. This is how you route requests to different model backends based on the model field in the OpenAI-compatible API request.
This is the requirement most tightly coupled to a specific ecosystem choice. Gateway API is the community's answer to Ingress, but inference routing is still evolving fast. Projects like kgateway (formerly Gloo Gateway) and Kubermatic's KubeLB AI Gateway are building on top of Gateway API, but the implementations vary significantly. Ask your vendor which Gateway API controller they certified with, and whether it supports the inference-specific features or just the base spec.
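Both capabilities the requirement names can be sketched in a single HTTPRoute. The gateway name, backend Services, and the `X-Model-Name` header are hypothetical — real inference gateways may route on the request body's `model` field instead, which plain HTTPRoute cannot express:

```yaml
# Route requests for one model to two backends with a 90/10 canary split.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: model-routing
spec:
  parentRefs:
    - name: inference-gateway       # hypothetical Gateway
  rules:
    - matches:
        - headers:
            - name: X-Model-Name    # hypothetical routing header
              value: llama-3-70b
      backendRefs:
        - name: llama3-stable       # hypothetical Service
          port: 8000
          weight: 90
        - name: llama3-canary       # hypothetical Service
          port: 8000
          weight: 10
```

Weighted `backendRefs` handle the canary case; header matching handles per-model dispatch. Anything beyond that (body inspection, KV-cache-aware routing) is where implementations diverge today.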
Scheduling and Orchestration
```yaml
# AIConformance-1.35.yaml — scheduling section (descriptions trimmed)
spec:
  schedulingOrchestration:
    - id: gang_scheduling
      description: "The platform must allow for the installation and successful
        operation of at least one gang scheduling solution that ensures
        all-or-nothing scheduling for distributed AI workloads (e.g. Kueue,
        Volcano, etc.) ..." # vendor demonstration requirement trimmed
      level: MUST
    - id: cluster_autoscaling
      description: "If the platform provides a cluster autoscaler or an
        equivalent mechanism, it must be able to scale up/down node groups
        containing specific accelerator types based on pending pods requesting
        those accelerators."
      level: MUST
    - id: pod_autoscaling
      description: "If the platform supports the HorizontalPodAutoscaler,
        it must function correctly for pods utilizing accelerators.
        ..." # custom metrics requirement trimmed
      level: MUST
```
Three MUSTs. Gang scheduling prevents the classic distributed training deadlock: Job A grabs 4 of 8 required GPUs, Job B grabs the other 4, and both sit forever waiting for resources that will never free up. The checklist requires at least one solution (Kueue, Volcano, or equivalent) that guarantees all-or-nothing placement.
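With Kueue, opting a job into all-or-nothing placement is mostly declarative. A rough sketch, assuming a LocalQueue named `gpu-queue` exists (queue name and image are hypothetical):

```yaml
# The job starts suspended; Kueue unsuspends it only once quota for
# all 8 workers is reserved, avoiding the partial-allocation deadlock.
apiVersion: batch/v1
kind: Job
metadata:
  name: distributed-train
  labels:
    kueue.x-k8s.io/queue-name: gpu-queue  # hypothetical LocalQueue
spec:
  suspend: true          # Kueue manages the suspend/resume lifecycle
  parallelism: 8
  completions: 8
  completionMode: Indexed
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: example.com/trainer:latest  # hypothetical image
          resources:
            limits:
              nvidia.com/gpu: 1
```

Volcano achieves the same guarantee differently (via PodGroups and a `minMember` threshold), which is exactly why the checklist names a capability rather than a tool.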
The autoscaling requirements are conditional: "if the platform provides" a cluster autoscaler or HPA. But the bar is specific. The cluster autoscaler must handle accelerator-typed node groups, not just generic CPU/memory pools. HPA must work with custom metrics relevant to AI workloads (GPU utilization, queue depth, tokens per second), not just CPU percentage.
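The HPA side of that bar can be sketched with a custom Pods metric. This assumes a metrics adapter already exposes a queue-depth metric from the serving pods — the metric name, Deployment, and target value are all hypothetical:

```yaml
# Scale an inference Deployment on request-queue depth, not CPU.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server      # hypothetical Deployment
  minReplicas: 1
  maxReplicas: 8
  metrics:
    - type: Pods
      pods:
        metric:
          name: requests_waiting  # hypothetical custom metric
        target:
          type: AverageValue
          averageValue: "10"      # scale out above 10 queued requests/pod
```

A platform that only wires HPA to CPU percentage would satisfy the letter of "HPA functions" but not this requirement, which demands custom metrics relevant to accelerator workloads.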
Gang scheduling is the requirement most likely to expose differences between conformant platforms. Kueue is the community standard, but it's maturing fast. Features like MultiKueue (cross-cluster job dispatching) and ProvisioningRequest (just-in-time node provisioning) shipped recently. If your vendor certified with an older Kueue release, you may have gang scheduling on paper but miss the features that make it production-ready for large-scale training. Ask which version they tested against.
Observability
```yaml
# AIConformance-1.35.yaml — observability section (descriptions trimmed)
spec:
  observability:
    - id: accelerator_metrics
      description: "For supported accelerator types, the platform must allow
        for the installation and successful operation of at least one
        accelerator metrics solution that exposes fine-grained performance
        metrics via a standardized, machine-readable metrics endpoint.
        ..." # core metrics list, OTel alignment, and managed solution
        # opt-out notes trimmed
      level: MUST
    - id: ai_service_metrics
      description: "Provide a monitoring system capable of discovering and
        collecting metrics from workloads that expose them in a standard
        format (e.g. Prometheus exposition format).
        ..." # framework integration note trimmed
      level: MUST
```
Two MUSTs. The full accelerator_metrics description requires per-device utilization and memory usage at minimum, plus temperature, power draw, and interconnect bandwidth where the hardware exposes them. The checklist recommends alignment with emerging standards like OpenTelemetry for interoperability, though it stops short of mandating a specific format.
The second item requires the platform to discover and collect metrics from workloads that expose Prometheus-format endpoints. This isn't about shipping a specific monitoring stack. It's about ensuring your vLLM server's /metrics endpoint gets scraped without custom integration work on each platform.
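On a platform running the Prometheus Operator, that discovery is a small declarative object. A sketch, assuming the serving pods carry an `app: vllm` label and declare a port named `metrics` (both hypothetical):

```yaml
# Tell Prometheus to scrape /metrics on every pod labeled app=vllm.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vllm-metrics
spec:
  selector:
    matchLabels:
      app: vllm               # hypothetical pod label
  podMetricsEndpoints:
    - port: metrics           # assumes a declared container port name
      path: /metrics
```

The conformance point is that some equivalent of this works out of the box — whether via operator CRDs, scrape annotations, or a managed agent — without per-platform integration glue.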
Security
```yaml
# AIConformance-1.35.yaml — security section
spec:
  security:
    - id: secure_accelerator_access
      description: "Ensure that access to accelerators from within containers
        is properly isolated and mediated by the Kubernetes resource management
        framework (device plugin or DRA) and container runtime, preventing
        unauthorized access or interference between workloads."
      level: MUST
```
One MUST. GPUs are shared resources in multi-tenant clusters. Without proper isolation, a container could access another workload's GPU memory, which is a data exfiltration risk. The requirement is straightforward: accelerator access must go through the Kubernetes resource management framework (device plugin or DRA), and the container runtime must enforce isolation between workloads.
Operators
```yaml
# AIConformance-1.35.yaml — operator section
spec:
  operator:
    - id: robust_controller
      description: "The platform must prove that at least one complex AI
        operator with a CRD (e.g., Ray, Kubeflow) can be installed and
        functions reliably. This includes verifying that the operator's pods
        run correctly, its webhooks are operational, and its custom resources
        can be reconciled."
      level: MUST
```
One MUST. This is a practical sanity check. AI frameworks like Ray and Kubeflow are complex operators that stress the API server, webhook infrastructure, and CRD reconciliation loop. If a platform can't reliably run these operators, the rest of the conformance checklist is academic. The requirement asks vendors to demonstrate that at least one such operator installs cleanly, its webhooks fire, and custom resources reconcile.
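The kind of custom resource this test exercises looks like the following minimal RayCluster, assuming the KubeRay operator is installed (the name and image tag are hypothetical):

```yaml
# Minimal RayCluster CR: applying it exercises the operator's webhook
# (admission/defaulting) and its reconcile loop (head pod creation).
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: conformance-smoke-test
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0  # hypothetical version tag
```

If this object admits, reconciles, and reaches a ready head pod, the platform's CRD, webhook, and controller plumbing is doing its job — which is the whole point of the requirement.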
Who's Already Certified
The initial v1.0 certifications were announced at KubeCon NA 2025 in Atlanta. The roster includes:
| Category | Platforms |
|---|---|
| Major cloud | AWS EKS, Google GKE, Microsoft Azure AKS, Oracle OCI |
| European / sovereign | Kubermatic KKP, SUSE, Gardener |
| AI-native infrastructure | CoreWeave, Akamai |
| Open source | Red Hat OpenShift, Sidero Labs (Talos), VMware VKS |
Checklists are available for Kubernetes v1.33, v1.34, and v1.35. Each platform certifies per-product and per-configuration. A cloud deployment and an air-gapped deployment of the same product are separate certifications.
The breadth of initial participants matters. It's not just the hyperscalers checking a compliance box. European vendors like Kubermatic explicitly frame conformance as a digital sovereignty play: "The future of AI will be built on open standards, not walled gardens." CoreWeave and Akamai represent the GPU-native infrastructure layer that didn't exist in the original K8s conformance era.
Gotchas
Self-assessment today, automated tests later. The v1.0 process is a structured self-assessment: fill out the YAML checklist, provide links to public documentation as evidence, and submit a pull request. CNCF reviews submissions within 10 business days. Automated conformance tests are planned for 2026, which will raise the bar significantly.
N/A is not a free pass. Vendors can mark a requirement as "Not Applicable," but the justification matters. "We don't support this feature" gets rejected. A valid justification explains why the requirement's context doesn't apply: bare-metal platforms can justify N/A for cluster autoscaling because there are no dynamic node pools to scale. Misunderstanding this distinction is the fastest way to get a submission sent back.
Annual renewal, aligned with Kubernetes releases. Certifications are valid for one year and must be renewed with each Kubernetes minor version. A platform certified for v1.34 needs to re-certify for v1.35. This keeps the program aligned with the Kubernetes release cycle and prevents stale certifications from lingering.
v2.0 will expand scope. The current checklist is deliberately focused. The v2.0 roadmap, expected in 2026, will likely add requirements for advanced inference patterns, enhanced monitoring metrics, and stricter security for model serving. If you're building conformance into your platform strategy, plan for the requirements to grow.
Conformance doesn't mean identical. Two conformant platforms can implement requirements differently. GKE enables DRA by default and ships Gateway API as a managed feature. Kubermatic bundles Kueue in its default application catalog and integrates the KubeLB AI Gateway. Both are conformant, but the operational experience differs. Conformance guarantees the capabilities exist, not that they're configured identically.
Wrap-up
Twelve requirements across six domains. That's the current bar for Kubernetes AI Conformance: DRA, gang scheduling, Gateway API routing, accelerator metrics, isolated GPU access, and operator reliability.
The program is the community's bet that open standards will beat proprietary AI infrastructure stacks, and the initial roster of certified platforms (hyperscalers, European vendors, GPU-native providers) suggests the bet is landing. With 82% of organizations building custom AI solutions and 58% already on Kubernetes, the fragmentation cost is real enough that vendors are willing to certify.
Before committing to a platform for production AI workloads, check its conformance status at github.com/cncf/k8s-ai-conformance. If it's not on the list, ask why. The checklist is public, the bar is clearly defined, and there's no technical reason a production-grade Kubernetes platform can't meet it.