keda karpenter autoscaling eks cost-optimization

KEDA and Karpenter Together — Pod and Node Scaling Synergy

by KubeDojo·May 6, 2026·11 min read·

KEDA and Karpenter Together — Pod and Node Scaling Synergy

HPA gets you basic pod autoscaling. Cluster Autoscaler gets you basic node scaling. Put them together and you have a reaction chain that works, but slowly: CPU crosses a threshold, HPA adds replicas, pods go Pending, CAS notices after a polling cycle, picks from a pre-defined node group, and spins up a VM. Minutes pass. For a web service handling steady traffic, that's fine. For an SQS queue that just received 50,000 messages, it's not.

KEDA and Karpenter replace both halves of that chain with faster, more flexible alternatives. KEDA watches your event source directly (queue depth, Prometheus metric, HTTP concurrency) and scales pods accordingly, including to and from zero. Karpenter watches for unschedulable pods and provisions right-sized nodes in seconds via the EC2 Fleet API, choosing from hundreds of instance types without pre-defined node groups.

This article covers how the two-layer scaling chain works, how to design NodePools for event-driven workloads, what scale-to-zero actually looks like when both layers participate, and a concrete SQS-based reference architecture you can adapt. The examples use AWS (EKS, SQS, EC2 Fleet API) because that's where the most mature reference architectures exist, but Karpenter's core scheduling logic is cloud-agnostic and Azure support is in active development.

The Two-Layer Scaling Chain

KEDA and Karpenter operate at different layers of the stack. KEDA is the pod layer: it watches external event sources, creates and manages an HPA, and scales Deployment replicas. Karpenter is the node layer: it watches for pods that can't be scheduled and provisions compute to run them.

The scale-up chain looks like this:

Event source spikes (queue fills, request rate climbs, metric crosses threshold)
KEDA's metrics adapter reports the new value to the HPA
HPA increases the Deployment's replica count
New pods appear as Pending because existing nodes lack capacity
Karpenter detects unschedulable pods, evaluates their resource requests and scheduling constraints
Karpenter launches optimal instance types via the cloud provider API
Nodes join the cluster, pods schedule, processing begins

The scale-down chain is the reverse. Events drain, KEDA reduces replicas, nodes become empty or underutilized, Karpenter's consolidation logic removes them.

Two-layer scaling chain: Event Source to KEDA to HPA to Pending Pods to Karpenter to New Nodes to Pods Scheduled Figure 1: The two-layer scaling chain — KEDA drives pod scaling from external events, unschedulable pods trigger Karpenter to provision right-sized nodes. The dashed return path shows scale-down: events drain, replicas reduce, and Karpenter consolidates empty nodes.

Why not HPA + Cluster Autoscaler?

Three reasons. First, CAS uses node groups with pre-defined instance types. You pick m5.xlarge and that's what you get, whether your burst needs 2 vCPUs or 16. Karpenter evaluates the actual pod requirements and selects from the full instance catalog. Second, CAS polls for unschedulable pods on a 10-30 second loop, then triggers ASG scaling, which launches a VM from a fixed instance type. Total time from Pending to schedulable: typically 3-5 minutes. Karpenter bypasses ASG entirely, using the EC2 Fleet API to launch optimal instances in 30-60 seconds. Third, neither HPA nor CAS can scale to zero. KEDA can remove every pod when the event source is idle, and Karpenter removes the now-empty nodes behind them.

Designing NodePools for Event-Driven Workloads

The key design pattern is separation: one NodePool for always-on services (API servers, databases, monitoring) on On-Demand instances, and another for event-driven burst workers on Spot instances. The Karpenter Blueprints repository demonstrates this split:

# karpenter-blueprints/blueprints/od-spot-split/od-spot.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: node-spot
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
  limits:
    cpu: 1k
    memory: 500Gi
  template:
    metadata:
      labels:
        intent: apps
    spec:
      expireAfter: 168h0m0s
      nodeClassRef:
        group: karpenter.k8s.aws
        name: default
        kind: EC2NodeClass
      requirements:
      - key: capacity-spread
        operator: In
        values: ["2", "3", "4", "5"]
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]
      - key: kubernetes.io/os
        operator: In
        values: ["linux"]
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["c", "m", "r"]
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values: ["2"]
      taints:
      - effect: NoSchedule
        key: intent
        value: workload-split

Several things to note. The consolidateAfter: 1m is aggressive: Karpenter will start removing underutilized nodes one minute after the last pod change. That's what you want for burst processing. After KEDA scales pods down, you don't want to pay for idle nodes for ten minutes while the default cooldown expires.

The limits section sets a hard cap. If the NodePool hits 1,000 CPUs, Karpenter stops provisioning regardless of how many pods KEDA is creating. Those pods stay Pending. Size your limits for the maximum burst you're willing to fund.

The taint (intent=workload-split:NoSchedule) ensures only pods that explicitly tolerate this taint land on Spot nodes. Your always-on services run on the On-Demand NodePool; burst workers get the cheaper, interruptible capacity.

Instance categories c, m, and r with generation > 2 give Karpenter a broad selection pool across compute-optimized, general-purpose, and memory-optimized families. This flexibility is critical for Spot availability: the more instance types Karpenter can choose from, the better the price and the lower the interruption rate.

Scale-to-Zero: The Full Idle Stack

KEDA's minReplicaCount: 0 is the headline feature for event-driven workloads. When the event source goes quiet, KEDA removes every consumer pod. With Karpenter's consolidation set to WhenEmptyOrUnderutilized, the empty worker nodes get terminated shortly after.

The result: zero pods, zero worker nodes, zero infrastructure cost during idle periods. For workloads with clear off-peak windows (batch processing that runs overnight, event pipelines that spike during business hours), the savings are significant.

The trade-off is cold-start latency. When events resume, the full provisioning chain activates:

Phase	Typical latency
KEDA detects events (polling interval)	15-30s
Karpenter provisions node	30-60s
Node joins cluster and becomes Ready	10-20s
Pod pulls image and starts	5-30s
Total cold start	~60-140s

For queue-based workloads, this is usually acceptable. Messages wait in the queue while capacity spins up. For latency-sensitive workloads, you have options:

Reduce KEDA's pollingInterval from the default 30s to 10-15s for faster event detection
Keep activationQueueLength at the default of 0 so KEDA activates on the first message. Setting it higher adds a noise filter for sporadic single-message arrivals, but also adds latency for low-volume queues.
Keep a small always-on node with overprovisioned pause pods. When real pods arrive, they evict the pause pods and schedule instantly while Karpenter provisions additional capacity in the background.

A Complete SQS Reference Architecture

Here's how all the pieces fit together for an SQS-based event processing pipeline. This pattern comes from the AWS reference architecture and the KEDA SQS scaler documentation.

The ScaledObject

# Adapted from KEDA SQS scaler docs (keda.sh/docs/2.19/scalers/aws-sqs)
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-aws-credentials
  namespace: processing
spec:
  podIdentity:
    provider: aws
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-consumer-scaler
  namespace: processing
spec:
  scaleTargetRef:
    name: sqs-consumer
  minReplicaCount: 0
  maxReplicaCount: 50
  cooldownPeriod: 60
  pollingInterval: 15
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: keda-aws-credentials
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/orders
      queueLength: "5"
      awsRegion: "eu-west-1"

The queueLength: "5" means KEDA targets 5 messages per pod. If the queue holds 250 messages, KEDA scales to 50 replicas (the maxReplicaCount ceiling). The cooldownPeriod: 60 prevents KEDA from scaling back to zero too quickly after a burst, giving the consumer pods time to drain remaining messages.

TriggerAuthentication with podIdentity.provider: aws uses IAM Roles for Service Accounts (IRSA) to authenticate against SQS. No access keys stored in Secrets.

The Consumer Deployment

# Adapted from karpenter-blueprints workload pattern for SQS consumer with scale-to-zero
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sqs-consumer
  namespace: processing
spec:
  replicas: 0
  selector:
    matchLabels:
      app: sqs-consumer
  template:
    metadata:
      labels:
        app: sqs-consumer
    spec:
      nodeSelector:
        intent: apps
      tolerations:
      - key: "intent"
        operator: "Equal"
        value: "workload-split"
        effect: "NoSchedule"
      containers:
      - name: consumer
        image: myregistry/sqs-consumer:latest
        resources:
          requests:
            cpu: 256m
            memory: 512Mi
          limits:
            memory: 512Mi

The Deployment starts with replicas: 0. KEDA owns the replica count. The nodeSelector and tolerations match the Spot NodePool's labels and taints, ensuring burst pods land on Spot capacity.

Resource requests are important here. Karpenter uses them to determine what instance types to provision. If you don't set requests, Karpenter assumes zero resources needed and may bin-pack pods onto undersized instances.

How the pieces connect

When messages arrive in SQS, the chain executes: KEDA polls the queue every 15 seconds, detects the backlog, and scales the Deployment from 0 to N replicas. The pods go Pending because no Spot nodes exist. Karpenter evaluates the aggregate resource requests (N pods x 256m CPU, 512Mi memory), selects instance types from the c/m/r families that fit, and launches Spot instances. Nodes join the cluster, pods schedule, and processing begins.

When the queue drains, KEDA scales pods back to zero after the 60-second cooldown. Karpenter detects the empty Spot nodes and terminates them within one minute (consolidateAfter: 1m). Back to zero cost.

Gotchas

Consolidation fighting KEDA oscillation. During fluctuating load, KEDA might scale down pods, Karpenter removes nodes, then KEDA scales back up and Karpenter has to provision again. Tune consolidateAfter to be slightly longer than your expected event oscillation period. For workloads with spiky, unpredictable patterns, 2-5 minutes is safer than 1 minute.

Spot interruptions during event processing. EC2 gives a 2-minute warning before reclaiming Spot instances. Karpenter handles this by pre-provisioning replacement nodes when it detects the interruption signal. For SQS workloads, this is mostly transparent: if a consumer pod is terminated, the message returns to the queue after the visibility timeout expires and another pod picks it up. For ScaledJobs processing one message per Job, set the karpenter.sh/do-not-disrupt: "true" annotation on long-running batch pods.

Mismatched timing parameters. KEDA's cooldownPeriod (default: 300s) and Karpenter's consolidateAfter (default: depends on consolidation policy) need to work together. If KEDA's cooldown is much shorter than Karpenter's consolidation delay, you'll have idle pods on idle nodes burning money. If it's much longer, you'll have empty nodes sitting around waiting for KEDA to finally scale down. Align them: a 60-second cooldown pairs well with a 1-minute consolidation delay for burst workloads.

NodePool limits are hard caps. If your NodePool's CPU limit is reached, Karpenter stops provisioning. Pods stay Pending even though KEDA keeps scaling the Deployment. You'll see pods stuck in Pending state with events showing "no nodepool is available." Size your NodePool limits for your expected peak, add headroom, and monitor the karpenter_nodepools_usage metrics.

maxReplicaCount vs. NodePool capacity. The ScaledObject's maxReplicaCount limits pods; the NodePool's spec.limits constrains infrastructure. Both need to be sized for the same burst scenario. If 50 pods need 256m CPU each, that's 12.8 CPUs. If your NodePool limit is 10 CPUs, you'll never run all 50 pods simultaneously.

Wrap-up

KEDA and Karpenter solve different halves of the same problem. KEDA makes pod scaling event-aware: scale on queue depth, not CPU utilization. Karpenter makes node scaling fast and flexible: provision the right compute in seconds, consolidate aggressively when it's no longer needed. Together, they create a fully reactive stack that goes from zero to peak capacity and back without manual intervention.

Start with a single event source (SQS, Kafka, Prometheus metric), a ScaledObject with minReplicaCount: 0, and a Spot NodePool with WhenEmptyOrUnderutilized consolidation. Tune the timing parameters (pollingInterval, cooldownPeriod, consolidateAfter) for your latency tolerance, then expand to additional event sources as the pattern proves out. The infrastructure cost during idle periods should be close to zero. The burst capacity should be limited only by what you're willing to fund.

keda karpenter autoscaling eks cost-optimization

This post is part of the KEDA — Event-Driven Autoscaling from Zero to Production collection (6 of 7)

← Batch Processing with KEDA ScaledJobs Observability and Troubleshooting for KEDA →

KubeDojo

Mastering the Kubernetes ecosystem — depth-first, no hype.