keda kubernetes autoscaling event-driven

Introduction to KEDA and Event-Driven Autoscaling

by KubeDojo · May 6, 2026 · 12 min read

The Kubernetes Horizontal Pod Autoscaler (HPA) scales on CPU and memory out of the box. Most production workloads care about something else entirely: messages in a queue, HTTP requests per second, rows in a database table, or a custom Prometheus metric that reflects actual business load. Bridging that gap used to mean deploying a custom metrics adapter, wiring it to your event source, and maintaining the plumbing yourself.

KEDA eliminates that plumbing. It is a lightweight, CNCF-graduated component that connects Kubernetes autoscaling to 65+ external event sources out of the box. It does not replace HPA. It extends it with a clean two-phase architecture: KEDA handles activation (scaling from and to zero), then hands off to the native HPA for replica scaling. The result is event-driven autoscaling that works with Deployments, StatefulSets, Jobs, and custom resources, on any Kubernetes cluster.

This article covers KEDA's architecture, its two-phase scaling model, the four CRDs that define scaling behavior, and the scaler landscape. By the end, you will understand how KEDA fits alongside HPA, when to use scale-to-zero, and what the gotchas are before you deploy your first ScaledObject.

Why HPA Isn't Enough

HPA scales on CPU and memory out of the box. Anything else, whether it is Kafka consumer lag, SQS queue depth, or a Prometheus query, requires a metrics adapter that implements the custom.metrics.k8s.io or external.metrics.k8s.io API. Each adapter covers a limited set of sources (Prometheus Adapter only speaks PromQL, the Datadog Cluster Agent only speaks Datadog) and brings its own configuration model, and none of them supports scale-to-zero: HPA's minReplicas floor is 1 unless you enable the alpha HPAScaleToZero feature gate.

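KEDA solves all three problems: 65+ built-in scalers, a declarative CRD-based configuration, and scale-to-zero as a first-class feature. It does not replace HPA, and it only manages the resources you explicitly target with a ScaledObject. One caveat: a cluster can register only one backend for the external.metrics.k8s.io API, so KEDA's metrics server cannot coexist with another adapter that serves external metrics. Adapters that serve only custom.metrics.k8s.io are unaffected.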

KEDA Architecture: The Two-Phase Model

KEDA installs three components into your cluster:

KEDA Operator (keda-operator): watches ScaledObject and ScaledJob resources, manages the 0↔1 scaling transitions, and creates/owns HPA objects for each ScaledObject.
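KEDA Metrics Server (keda-operator-metrics-apiserver): implements the external.metrics.k8s.io API. When HPA asks "what is the current value of this metric?", the metrics server returns the value reported by the configured scaler. In recent KEDA releases it obtains that value from the operator over gRPC rather than querying the event source itself.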
Admission Webhooks: validate ScaledObject and ScaledJob configurations at creation time, preventing issues like two ScaledObjects targeting the same Deployment.

The key insight is the two-phase scaling model:

Phase 1 — Activation (0↔1): The KEDA operator polls each scaler at the configured pollingInterval (default: 30 seconds). If the scaler reports that the event source is active (messages in the queue, metrics above the activation threshold), the operator scales the target from 0 to 1 replica. When the source goes quiet and the cooldownPeriod elapses, the operator scales back to 0. This phase is entirely managed by KEDA, not HPA.

Phase 2 — Scaling (1↔N): Once the target has at least 1 replica, the HPA takes over. KEDA's metrics server feeds the external metric values to HPA, which applies its standard scaling algorithm (target average value, stabilization windows, scaling policies) to decide the replica count between minReplicaCount and maxReplicaCount. This is native Kubernetes autoscaling, just with a richer set of metrics.

This separation is deliberate. HPA was never designed to handle the 0↔1 transition (it requires at least 1 replica to compute per-pod averages). KEDA fills that gap without reimplementing HPA's scaling logic.

Figure 1 — KEDA's two-phase architecture: the operator handles 0↔1 activation while HPA handles 1↔N scaling, both driven by ScaledObject configuration.

Here is how it looks in practice with a Prometheus-based trigger:

# ScaledObject — Prometheus trigger (adapted from keda.sh/docs/2.19/scalers/prometheus)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-app-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: web-app                    # Deployment to scale
  minReplicaCount: 0                 # Enable scale-to-zero
  maxReplicaCount: 20
  pollingInterval: 15                # Check every 15 seconds
  cooldownPeriod: 300                # Wait 5 minutes before scaling to zero
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090
      threshold: '100'               # HPA target: 100 req/s per pod
      activationThreshold: '5'       # Activate when rate > 5 req/s
      query: sum(rate(http_requests_total{deployment="web-app"}[2m]))

When the summed http_requests_total rate exceeds 5 req/s, KEDA activates the Deployment (0→1) on its next poll. HPA then scales to as many replicas as needed to keep each pod at roughly 100 req/s. When traffic drops and stays at or below the activation threshold for 300 seconds, KEDA scales back to zero.
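
For orientation, the HPA that KEDA generates for the ScaledObject above looks roughly like the following sketch. The object name (keda-hpa-web-app-scaler) and the external metric name (s0-prometheus) follow KEDA's usual naming pattern, but both are assumptions here and can differ between versions. Treat the manifest as illustrative and inspect the real object with kubectl get hpa rather than creating it by hand.

```yaml
# Illustrative only: KEDA creates and owns this object, you do not apply it yourself.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keda-hpa-web-app-scaler      # assumed naming pattern: keda-hpa-<scaledobject name>
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 1                     # HPA never goes below 1; the 0↔1 step belongs to the KEDA operator
  maxReplicas: 20                    # copied from maxReplicaCount
  metrics:
  - type: External
    external:
      metric:
        name: s0-prometheus          # assumed: trigger index plus scaler type
        selector:
          matchLabels:
            scaledobject.keda.sh/name: web-app-scaler
      target:
        type: AverageValue
        averageValue: "100"          # the trigger's threshold, evaluated per pod
```

Seeing the generated object makes the division of labor concrete: everything in this manifest is plain HPA, and only the metric behind it comes from KEDA.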

The Four CRDs

KEDA introduces four Custom Resource Definitions:

ScaledObject

The primary resource. It links a Deployment, StatefulSet, or any custom resource with a /scale subresource to one or more event triggers. Key fields:

Field	Purpose	Default
`scaleTargetRef.name`	Target workload name	Required
`pollingInterval`	How often KEDA checks the scaler (seconds)	30
`cooldownPeriod`	Idle time before scaling to zero (seconds)	300
`minReplicaCount`	Minimum replicas (set to 0 for scale-to-zero)	0
`maxReplicaCount`	Maximum replicas	100
`triggers[]`	List of event source triggers	Required

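A ScaledObject can have multiple triggers. KEDA passes every trigger to the HPA as a separate metric; the HPA computes a desired replica count for each and applies the highest. For activation, one active trigger is enough to bring the target up from zero. This means you can scale on both queue depth and CPU utilization simultaneously.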
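
A minimal sketch of a two-trigger ScaledObject, assuming a hypothetical worker Deployment whose pod spec defines a RABBITMQ_HOST environment variable holding the connection string. The resource names and target values are illustrative. The cpu trigger also needs CPU requests set on the target's containers, and minReplicaCount is kept at 1 here to keep the example simple.

```yaml
# Sketch: queue depth and CPU utilization on the same workload.
# The HPA computes a desired replica count per trigger and applies the larger one.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler                # hypothetical name
  namespace: default
spec:
  scaleTargetRef:
    name: worker                     # hypothetical Deployment
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
  - type: rabbitmq
    metadata:
      queueName: orders
      hostFromEnv: RABBITMQ_HOST     # assumed env var on the target's container
      mode: QueueLength
      value: '20'                    # target: 20 messages per pod
  - type: cpu
    metricType: Utilization
    metadata:
      value: '70'                    # target: 70% of requested CPU
```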

ScaledJob

The batch-processing counterpart. Instead of scaling a long-running Deployment, ScaledJob creates Kubernetes Jobs in proportion to the pending work; with a target value of 1, that is roughly one Job per queued message. Each Job is expected to pull its work, process it to completion, and terminate. This is ideal for long-running tasks where you do not want an HPA scale-down to terminate a pod mid-processing.

# ScaledJob — RabbitMQ trigger (adapted from keda.sh/docs/2.19/concepts/scaling-jobs)
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: order-processor
  namespace: default
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: processor
          image: order-processor:latest
          command: ["process", "--single-message"]
        restartPolicy: Never
    backoffLimit: 3
  pollingInterval: 10
  maxReplicaCount: 30
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 3
  triggers:
  - type: rabbitmq
    metadata:
      queueName: orders
      host: amqp://rabbitmq.default:5672/
      mode: QueueLength
      value: '1'

ScaledJobs are covered in depth separately.

TriggerAuthentication and ClusterTriggerAuthentication

These resources decouple credentials from ScaledObject definitions. Instead of embedding secrets in trigger metadata, you reference a TriggerAuthentication that pulls credentials from Kubernetes Secrets, environment variables, pod identity providers (AWS IRSA, Azure Workload Identity, GCP Workload Identity), or HashiCorp Vault.

# TriggerAuthentication + ScaledObject (adapted from keda.sh/docs/2.19/scalers/aws-sqs)
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-credentials
  namespace: default
spec:
  podIdentity:
    provider: aws
---
# ScaledObject referencing the TriggerAuthentication
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: queue-consumer
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: aws-credentials
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/orders
      queueLength: '5'
      awsRegion: eu-west-1

ClusterTriggerAuthentication is the cluster-scoped variant. It lets you define credentials once and reference them from ScaledObjects in any namespace.
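
As a sketch of the cluster-scoped variant, the example below shares one RabbitMQ connection string across namespaces. The Secret name, key, namespaces, and workload names are hypothetical. Two details are easy to miss: the trigger must set kind: ClusterTriggerAuthentication in authenticationRef, and the referenced Secret is read from the namespace KEDA itself runs in, not from the ScaledObject's namespace.

```yaml
# Sketch: credentials defined once, cluster-wide (no metadata.namespace).
apiVersion: keda.sh/v1alpha1
kind: ClusterTriggerAuthentication
metadata:
  name: shared-rabbitmq-auth         # hypothetical name
spec:
  secretTargetRef:
  - parameter: host                  # scaler parameter to populate
    name: rabbitmq-credentials       # hypothetical Secret in KEDA's own namespace
    key: connection-string           # hypothetical key inside that Secret
---
# A ScaledObject in any namespace can reference it.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-scaler                # hypothetical name
  namespace: team-a                  # hypothetical namespace
spec:
  scaleTargetRef:
    name: orders-consumer            # hypothetical Deployment
  triggers:
  - type: rabbitmq
    authenticationRef:
      name: shared-rabbitmq-auth
      kind: ClusterTriggerAuthentication   # defaults to TriggerAuthentication if omitted
    metadata:
      queueName: orders
      mode: QueueLength
      value: '20'
```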

The Scaler Landscape

KEDA's scaler catalog is what makes it practical. Each scaler knows how to query a specific event source and return a metric value. As of KEDA 2.19, there are 65+ built-in scalers organized by category:

Category	Examples
Messaging	Apache Kafka, AWS SQS, RabbitMQ, Azure Service Bus, NATS JetStream, GCP Pub/Sub
Metrics	Prometheus, Datadog, New Relic, Dynatrace, InfluxDB
Cloud services	AWS CloudWatch, Azure Monitor, GCP Cloud Tasks
Databases	PostgreSQL, MySQL, MongoDB, Redis, Cassandra
Scheduling	Cron (time-based scaling)
Kubernetes	Kubernetes workload metrics, resource metrics
HTTP	HTTP Add-on (separate component for request-based scaling)

Each scaler exposes two key thresholds:

threshold (or scaler-specific equivalent like queueLength): the target value for HPA-driven scaling (1↔N). If threshold: 10 and the queue has 50 messages, HPA targets 5 replicas.
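activationThreshold (or activation + parameter name, such as activationQueueLength): the value that triggers the 0↔1 transition. Default is 0, meaning any metric value greater than 0 activates the scaler. Activation only matters when the target can scale to zero; with a minReplicaCount of 1 or more, the activation value is ignored.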

These thresholds are independent. You can set activationThreshold: 50 and threshold: 10, which means the scaler stays inactive (zero replicas) until the metric exceeds 50, then HPA targets 10 per pod once active. This decoupling is useful when you want to absorb small bursts without spinning up pods, but scale aggressively once the threshold is crossed.
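
The following sketch puts those two numbers into a Prometheus trigger and annotates what happens at a few metric values. The metric name orders_pending and the resource names are hypothetical, and the replica counts in the comments are the steady-state targets from the HPA formula (metric divided by threshold, rounded up), before any stabilization or scaling policies apply.

```yaml
# Sketch: activation threshold deliberately higher than the scaling threshold.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: burst-tolerant-scaler        # hypothetical name
  namespace: default
spec:
  scaleTargetRef:
    name: order-worker               # hypothetical Deployment
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090
      query: sum(orders_pending)     # hypothetical gauge of pending work
      activationThreshold: '50'      # 0→1 only when the value is strictly above 50
      threshold: '10'                # once active, HPA aims for 10 per pod
# Metric = 40  → not above 50, target stays at 0 replicas
# Metric = 60  → active; HPA targets ceil(60 / 10) = 6 replicas
# Metric = 250 → HPA wants 25, capped at maxReplicaCount = 20
```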

If none of the built-in scalers fit your event source, KEDA supports external scalers via a gRPC interface. You implement a small gRPC server that returns metric values, and KEDA calls it like any other scaler.
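
On the ScaledObject side, an external scaler is just another trigger. The sketch below assumes you have already deployed a gRPC server behind a hypothetical Service called custom-scaler; scalerAddress is the field KEDA uses to reach it, while the other metadata keys are hypothetical values handed to that server to interpret.

```yaml
# Sketch: pointing a ScaledObject at a self-hosted external scaler.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: custom-source-scaler         # hypothetical name
  namespace: default
spec:
  scaleTargetRef:
    name: custom-consumer            # hypothetical Deployment
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
  - type: external
    metadata:
      scalerAddress: custom-scaler.default.svc.cluster.local:9090  # hypothetical gRPC endpoint
      tenant: acme                   # hypothetical key, passed through to the gRPC server
      targetBacklog: '25'            # hypothetical key, passed through to the gRPC server
```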

Scale-to-Zero: How It Works

Scale-to-zero is KEDA's headline feature and the primary reason teams adopt it over raw HPA with a custom metrics adapter. The mechanism is straightforward:

Set minReplicaCount: 0 on the ScaledObject
When all scalers report inactive (metric at or below activationThreshold), KEDA starts the cooldown timer
After cooldownPeriod seconds (default: 300), the operator scales the target to 0 replicas
When a scaler detects activity again on a subsequent poll, the operator scales the target back to 1

The cooldownPeriod prevents flapping. A lull shorter than the cooldown (5 minutes by default) will not trigger a scale-to-zero and the cold start that follows. You can also set initialCooldownPeriod to prevent scaling to zero for a period after the ScaledObject is first created, giving your application time to warm up.
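
Put together, the scale-to-zero settings described above might look like this for a Kafka consumer. The workload, topic, and broker address are hypothetical, and the timings are placeholders to tune rather than recommendations.

```yaml
# Sketch: scale-to-zero with an explicit cooldown and a warm-up guard.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: report-generator-scaler      # hypothetical name
  namespace: default
spec:
  scaleTargetRef:
    name: report-generator           # hypothetical Deployment
  minReplicaCount: 0                 # allow scale-to-zero
  maxReplicaCount: 10
  pollingInterval: 30                # seconds between scaler checks
  cooldownPeriod: 600                # seconds of inactivity before scaling to 0
  initialCooldownPeriod: 120         # seconds after creation during which scale-to-zero is held off
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.default.svc:9092   # hypothetical broker address
      consumerGroup: report-generator
      topic: report-requests                     # hypothetical topic
      lagThreshold: '50'                         # target consumer lag per pod
```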

The trade-off is cold start latency. When the first event arrives after a scale-to-zero period, KEDA must first notice it (which can take up to one pollingInterval), then the pod must be scheduled, its image pulled (if not cached), and its containers started and passing readiness checks before it can process the event. For a typical container, this often takes 5-30 seconds. For workloads with large images or slow startup, it can be longer.

When to use scale-to-zero:

Dev and staging environments where idle workloads waste resources
Batch processors that only run when work arrives (overnight ETL, report generation)
Multi-tenant platforms where hundreds of low-traffic workloads would otherwise consume a base allocation

When to keep minReplicaCount: 1:

Latency-sensitive services where cold starts are unacceptable
Workloads with expensive initialization (ML model loading, large cache warming)
Services behind health checks that must always respond (load balancer targets)

Gotchas

KEDA owns the HPA. When you create a ScaledObject, KEDA creates a corresponding HPA object. If you already have an HPA targeting the same Deployment, they will conflict. Use the scaledobject.keda.sh/transfer-hpa-ownership: "true" annotation and set advanced.horizontalPodAutoscalerConfig.name to your existing HPA name to let KEDA take over gracefully.
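
A sketch of that takeover, assuming an existing HPA named web-app-hpa already targets the web-app Deployment. Both names are hypothetical, and the trigger is only there to make the manifest complete.

```yaml
# Sketch: let KEDA adopt an existing HPA instead of creating a second one.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-app-scaler
  namespace: default
  annotations:
    scaledobject.keda.sh/transfer-hpa-ownership: "true"
spec:
  scaleTargetRef:
    name: web-app
  advanced:
    horizontalPodAutoscalerConfig:
      name: web-app-hpa              # hypothetical: must match the existing HPA's name
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090
      threshold: '100'
      query: sum(rate(http_requests_total{deployment="web-app"}[2m]))
```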

Activation and scaling thresholds are independent. Setting threshold: 10 and activationThreshold: 50 means the scaler stays inactive at 40 messages, even though HPA would want 4 replicas. This is by design, but it catches people who expect the two values to work together.

cooldownPeriod only applies to the 1→0 transition. It is not the same as HPA's stabilizationWindowSeconds, which controls scale-down from N to a lower N. If you need to slow down HPA scale-down behavior, configure it separately via advanced.horizontalPodAutoscalerConfig.behavior.
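
The two knobs sit in different places on the same ScaledObject, as this sketch shows. The behavior block uses the standard HPA v2 schema; the window and policy values, like the resource names, are illustrative assumptions.

```yaml
# Sketch: cooldownPeriod governs 1→0, behavior.scaleDown governs N→fewer.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-app-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: web-app
  minReplicaCount: 0
  maxReplicaCount: 20
  cooldownPeriod: 300                      # KEDA: wait before the final 1→0 step
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 600  # HPA: act on the highest recommendation of the last 10 minutes
          policies:
          - type: Percent
            value: 50                      # remove at most 50% of replicas...
            periodSeconds: 60              # ...per 60-second period
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090
      threshold: '100'
      query: sum(rate(http_requests_total{deployment="web-app"}[2m]))
```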

One ScaledObject per target. The admission webhook rejects a second ScaledObject targeting the same Deployment. If you need multiple triggers, add them all to a single ScaledObject's triggers[] array.

pollingInterval trades detection latency against load on the event source. Lower intervals detect events faster but increase the number of queries against your event source. For a high-traffic Kafka cluster this is fine. For a rate-limited cloud API, set it higher.

activationThreshold uses strict greater-than. With the default activationThreshold: 0, the scaler activates on any metric value greater than 0. A metric value of exactly 0 keeps the scaler inactive.

Wrap-up

KEDA bridges the gap between Kubernetes HPA and the external event sources that drive most real workloads. Its two-phase architecture keeps the design clean: KEDA handles activation and deactivation, HPA handles replica scaling. The four CRDs (ScaledObject, ScaledJob, TriggerAuthentication, ClusterTriggerAuthentication) give you a declarative, portable way to define event-driven scaling on any cluster.

The scaler catalog is where KEDA earns its keep. Instead of deploying and maintaining separate metrics adapters for each event source, you get 65+ scalers out of the box with a consistent configuration model. Combined with scale-to-zero and secure credential management via TriggerAuthentication, KEDA makes event-driven autoscaling accessible without the infrastructure overhead.

To get started, install KEDA via Helm (helm repo add kedacore https://kedacore.github.io/charts, then helm install keda kedacore/keda --namespace keda --create-namespace) and deploy a ScaledObject against a Prometheus metric you already export. Start with minReplicaCount: 1 while you validate the trigger and the resulting HPA behavior, then move to scale-to-zero on non-critical workloads. The gotchas above will save you a few hours of debugging.

This post is part of the KEDA — Event-Driven Autoscaling from Zero to Production collection (1 of 7)