KubeDojo

Message Queue Scaling with KEDA — Kafka and SQS

by KubeDojo · 15 min read

Your Kafka consumer group lag is climbing. Messages pile up in your SQS dead-letter queue. HPA can't see either signal because it only knows about CPU and memory. KEDA fixes this by exposing Kafka consumer lag and SQS queue depth as HPA metrics, but the two scalers work differently under the hood. The Kafka scaler tracks consumer group lag across partitions. The SQS scaler counts approximate messages in the queue. Both feed into the same two-phase model (KEDA operator handles 0↔1, HPA handles 1↔N), but tuning them requires understanding what each scaler actually measures and how the numbers translate to replica counts.

What follows: how KEDA calculates lag and queue depth from the source code, the parameters that control scaling behavior, TriggerAuthentication patterns for secure connections, and the edge cases that catch teams in production.

KEDA two-phase scaling model for Kafka and SQS Figure 1: KEDA's two-phase model applied to Kafka (lag-based) and SQS (depth-based) scaling. The KEDA operator handles 0↔1 activation, HPA handles 1↔N scaling.

The Kafka Scaler: Consumer Lag as a Scaling Signal

The Kafka scaler uses consumer group lag as its metric. KEDA queries the Kafka brokers for two things: the latest offset on each partition (where the producer is) and the committed offset for the consumer group (where the consumer is). The difference is lag.

The getTotalLag() function in kafka_scaler.go sums latestOffset - consumerOffset across all partitions in the topic. That total lag becomes the metric value fed to HPA.

// Paraphrased from kafka_scaler.go — getTotalLag()
totalLag, totalLagWithPersistent := int64(0), int64(0)
for topic, partitionsOffsets := range producerOffsets {
    for partition := range partitionsOffsets {
        lag, lagWithPersistent, err := s.getLagForPartition(
            topic, partition, consumerOffsets, producerOffsets,
        )
        if err != nil {
            return 0, 0, err
        }
        totalLag += lag
        totalLagWithPersistent += lagWithPersistent
    }
}

HPA computes desired replicas as totalLag / lagThreshold. With a lagThreshold of 10 (the default) and a total lag of 50 across all partitions, HPA targets 5 replicas. But this formula has an implicit ceiling: by default, KEDA caps replicas at the partition count (see below). A 10-partition topic with lagThreshold: 50 and a total lag of 1000 still produces only 10 replicas, not 20. The cap only lifts when allowIdleConsumers is true.
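The arithmetic above can be sketched as a small Go function. This is a simplified model, not KEDA's actual code: HPA's averaging mechanics are reduced to a ceiling division, and `maxReplicaCount` is ignored.

```go
package main

import "fmt"

// desiredReplicas sketches the replica calculation described above.
// Assumption: HPA's metric averaging is simplified to ceil(totalLag / lagThreshold).
func desiredReplicas(totalLag, lagThreshold, partitionCount int64, allowIdleConsumers bool) int64 {
	replicas := (totalLag + lagThreshold - 1) / lagThreshold // ceil(totalLag / lagThreshold)
	if !allowIdleConsumers && replicas > partitionCount {
		replicas = partitionCount // default partition-aware cap
	}
	return replicas
}

func main() {
	fmt.Println(desiredReplicas(50, 10, 10, false))   // 5: lag 50 at threshold 10
	fmt.Println(desiredReplicas(1000, 50, 10, false)) // 10: capped at partition count
	fmt.Println(desiredReplicas(1000, 50, 10, true))  // 20: cap lifted by allowIdleConsumers
}
```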

Partition-Aware Capping

Kafka consumer groups have a hard constraint: each partition can only be consumed by one consumer in the group. Running more consumers than partitions means idle consumers. KEDA enforces this by default, capping the calculated replica count at the partition count.

A minimal ScaledObject for Kafka:

# kafka-scaledobject.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
  namespace: default
spec:
  scaleTargetRef:
    name: order-consumer
  pollingInterval: 30
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.messaging.svc:9092
      consumerGroup: order-processing-group
      topic: orders
      lagThreshold: "50"
      offsetResetPolicy: latest

Once applied, the ScaledObject creates a managed HPA:

$ kubectl get scaledobject order-processor
NAME              SCALETARGETKIND      SCALETARGETNAME   MIN   MAX   TRIGGERS   READY
order-processor   apps/v1.Deployment   order-consumer    1     10    kafka      True

$ kubectl get hpa keda-hpa-order-processor
NAME                       REFERENCE                  TARGETS        MINPODS   MAXPODS   REPLICAS
keda-hpa-order-processor   Deployment/order-consumer  250/50 (avg)   1         10        5

With a total lag of 250 and a lagThreshold of 50, HPA targets 5 replicas.

Key Tuning Parameters

Several parameters adjust how the scaler counts lag and caps replicas:

allowIdleConsumers (default: false). Set to true when your consumers do work beyond consuming, such as calling external APIs or running batch processing. This lifts the partition cap and lets HPA scale beyond the partition count.

limitToPartitionsWithLag (default: false). Only counts partitions with non-zero lag toward the replica cap. Useful when some partitions are idle. Requires topic to be set. Cannot be combined with allowIdleConsumers.

excludePersistentLag (default: false). Compares the consumer offset to the previous polling cycle. If the offset hasn't moved, KEDA treats that partition's lag as zero for scaling purposes (though it still counts for activation). This prevents scaling up for partitions where the consumer is stuck on a poison pill message.

// Paraphrased from kafka_scaler.go — getLagForPartition(), excludePersistentLag branch
case previousOffset == consumerOffset:
    // Consumer stuck on same offset — don't count for scaling
    return 0, latestOffset - consumerOffset, nil
default:
    // Offset moved — update and count normally
    s.previousOffsets[topic][partitionID] = consumerOffset

ensureEvenDistributionOfPartitions (default: false). Forces the replica count to be a factor of the partition count. A 10-partition topic only scales to 1, 2, 5, or 10 replicas, never 3 or 7. This avoids uneven partition assignment across consumers.

partitionLimitation. Scope scaling to specific partition IDs. Accepts comma-separated values and ranges: 1,2,10-20,31. Only those partitions contribute to the lag calculation.
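All of these knobs live under the trigger's metadata. An illustrative (not exhaustive) combination, reusing the order-processing example from above:

```yaml
triggers:
- type: kafka
  metadata:
    bootstrapServers: kafka.messaging.svc:9092
    consumerGroup: order-processing-group
    topic: orders
    lagThreshold: "50"
    excludePersistentLag: "true"        # ignore stuck partitions when scaling up
    partitionLimitation: "1,2,10-20,31" # only these partitions count toward lag
```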

Activation vs. Scaling Thresholds

KEDA has two distinct thresholds. The activationLagThreshold (default: 0) controls the 0↔1 transition, handled by the KEDA operator. The lagThreshold (default: 10) controls 1↔N scaling, handled by HPA. Activation only fires when the metric value is strictly greater than the activation threshold.

This separation matters. You might want a lagThreshold of 50 for smooth scaling but an activationLagThreshold of 10 to avoid waking the deployment for tiny blips. Or set activationLagThreshold to 100 to keep the deployment at zero until a real backlog forms.
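In a trigger definition, the two thresholds sit side by side. A sketch of the second scenario:

```yaml
triggers:
- type: kafka
  metadata:
    bootstrapServers: kafka.messaging.svc:9092
    consumerGroup: order-processing-group
    topic: orders
    lagThreshold: "50"            # 1↔N: HPA targets ~50 lag per replica
    activationLagThreshold: "100" # 0↔1: stay at zero until lag exceeds 100
```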

Kafka Offset Reset and Edge Cases

The offsetResetPolicy parameter controls what happens when no offset is committed for the consumer group. New consumer groups always start with no committed offset.

With latest (the default) and scaleToZeroOnInvalidOffset: false (also the default), KEDA returns a lag of 1 when the offset is invalid. That single-unit lag keeps the deployment active so consumers can start, connect to Kafka, and commit their first offset. Without this, a cold-start deployment would never wake up.

With earliest, KEDA returns lag equal to the latest offset (the full topic length). The consumer will replay from the beginning, and KEDA scales proportionally to the backlog.

scaleToZeroOnInvalidOffset (default: false) overrides both behaviors. When true, invalid offsets produce zero lag, allowing scale-to-zero even with no committed offset. Use this only when you explicitly want to ignore topics with no consumer history.
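The decision tree for an invalid offset can be condensed into a few lines. This is a simplified sketch of the behavior described above, not a verbatim excerpt from kafka_scaler.go:

```go
package main

import "fmt"

// lagForInvalidOffset sketches KEDA's handling of a partition with no
// committed offset (assumption: simplified from kafka_scaler.go).
// latestOffset is the newest offset on the partition.
func lagForInvalidOffset(latestOffset int64, offsetResetPolicy string, scaleToZeroOnInvalidOffset bool) int64 {
	if scaleToZeroOnInvalidOffset {
		return 0 // explicitly allow scale-to-zero with no commit history
	}
	if offsetResetPolicy == "earliest" {
		return latestOffset // consumer will replay the full topic
	}
	return 1 // "latest": one unit of lag keeps the deployment awake for cold start
}

func main() {
	fmt.Println(lagForInvalidOffset(5000, "latest", false))   // 1
	fmt.Println(lagForInvalidOffset(5000, "earliest", false)) // 5000
	fmt.Println(lagForInvalidOffset(5000, "latest", true))    // 0
}
```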

The Dead Consumer Trap

There is an edge case in the Kafka scaler that silently breaks scaling. If your consumer never commits an offset and minReplicaCount is 0, KEDA scales the deployment to zero during quiet periods. Once at zero, KEDA can't discover the topic because there's no commit history and no active consumer. The deployment stays at zero permanently.

# kafka-scaledobject-safe.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
  namespace: default
spec:
  scaleTargetRef:
    name: order-consumer
  minReplicaCount: 1  # Safety net: always keep one consumer running
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.messaging.svc:9092
      consumerGroup: order-processing-group
      topic: orders
      lagThreshold: "50"
      offsetResetPolicy: latest
      scaleToZeroOnInvalidOffset: "false"

The fix: set minReplicaCount: 1 until you're confident the consumer group is committing offsets regularly. Alternatively, use a multi-trigger ScaledObject where one trigger explicitly sets the topic to ensure lag is always detected.

The SQS Scaler: Queue Depth as a Scaling Signal

The SQS scaler is simpler than the Kafka scaler. There are no partitions, no offsets, no consumer groups. KEDA calls the SQS GetQueueAttributes API and sums the message counts.

The default formula is:

actualMessages = ApproximateNumberOfMessages
               + ApproximateNumberOfMessagesNotVisible

ApproximateNumberOfMessages is the backlog: messages waiting to be picked up. ApproximateNumberOfMessagesNotVisible is messages currently being processed (in-flight). HPA computes desired replicas as actualMessages / queueLength.

// Paraphrased from aws_sqs_queue_scaler.go — parseAwsSqsQueueMetadata()
meta.awsSqsQueueMetricNames = append(meta.awsSqsQueueMetricNames,
    types.QueueAttributeNameApproximateNumberOfMessages)
if meta.ScaleOnInFlight {
    meta.awsSqsQueueMetricNames = append(meta.awsSqsQueueMetricNames,
        types.QueueAttributeNameApproximateNumberOfMessagesNotVisible)
}
if meta.ScaleOnDelayed {
    meta.awsSqsQueueMetricNames = append(meta.awsSqsQueueMetricNames,
        types.QueueAttributeNameApproximateNumberOfMessagesDelayed)
}

The queueLength parameter (default: 5) is the target messages per replica. If one pod processes 10 messages at a time, set queueLength to 10. With 30 messages in the queue, KEDA targets 3 replicas.
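The full calculation, including the attribute-selection flags, reduces to a few lines. As before, this is a simplified model rather than KEDA's actual code; HPA's averaging is collapsed into a ceiling division:

```go
package main

import "fmt"

// sqsMetric sketches the metric sum described above (assumption: simplified
// from aws_sqs_queue_scaler.go's attribute selection).
func sqsMetric(visible, notVisible, delayed int64, scaleOnInFlight, scaleOnDelayed bool) int64 {
	total := visible // ApproximateNumberOfMessages: the waiting backlog
	if scaleOnInFlight {
		total += notVisible // ApproximateNumberOfMessagesNotVisible: in-flight
	}
	if scaleOnDelayed {
		total += delayed // ApproximateNumberOfMessagesDelayed: delay queues
	}
	return total
}

// desiredReplicas models HPA's target as ceil(messages / queueLength).
func desiredReplicas(messages, queueLength int64) int64 {
	return (messages + queueLength - 1) / queueLength
}

func main() {
	// 30 waiting, 20 in flight, queueLength 10
	fmt.Println(desiredReplicas(sqsMetric(30, 20, 0, true, false), 10))  // 5: in-flight counted
	fmt.Println(desiredReplicas(sqsMetric(30, 20, 0, false, false), 10)) // 3: backlog only
}
```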

SQS ScaledObject Example

# sqs-scaledobject.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: invoice-processor
  namespace: default
spec:
  scaleTargetRef:
    name: invoice-worker
  pollingInterval: 15
  minReplicaCount: 0
  maxReplicaCount: 50
  cooldownPeriod: 60
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: aws-credentials
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/invoices
      queueLength: "10"
      awsRegion: eu-west-1
      scaleOnInFlight: "true"
      activationQueueLength: "5"

In-Flight Messages: The Over-Scaling Trap

The default scaleOnInFlight: true counts messages currently being processed. If your consumers take 5 minutes per message, those messages stay "not visible" for 5 minutes. KEDA sees them as load and keeps scaling up, even though they're already being handled.

For slow consumers, set scaleOnInFlight: false. KEDA will only see the actual backlog (messages waiting to be picked up), which is a more accurate signal for "do I need more workers?"

The trade-off: with scaleOnInFlight: false, a sudden spike can temporarily under-scale, because KEDA only sees the queue growing, not the full processing load. For slow consumers, that is usually the safer choice.

scaleOnDelayed (default: false) adds ApproximateNumberOfMessagesDelayed to the count. Enable this if you use SQS delay queues and want to pre-scale before delayed messages become visible.

TriggerAuthentication: Securing Scaler Connections

Both scalers need credentials to connect: Kafka needs broker auth, SQS needs AWS credentials. Embedding secrets directly in ScaledObject metadata is fragile and breaks the separation of concerns between teams.

TriggerAuthentication is a namespace-scoped CRD that references Kubernetes Secrets, environment variables, or pod identity providers. ClusterTriggerAuthentication is the cluster-wide variant, accessible from any namespace.

Kafka with SASL/SCRAM and TLS

# kafka-auth.yaml
apiVersion: v1
kind: Secret
metadata:
  name: kafka-credentials
  namespace: default
data:
  username: b3JkZXItcHJvY2Vzc29y    # base64: order-processor
  password: czNjcjN0LXBhc3N3b3Jk    # base64: s3cr3t-password
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-scram-auth
  namespace: default
spec:
  secretTargetRef:
  - parameter: username
    name: kafka-credentials
    key: username
  - parameter: password
    name: kafka-credentials
    key: password

Reference it from the ScaledObject:

# kafka-scaledobject-with-auth.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
  namespace: default
spec:
  scaleTargetRef:
    name: order-consumer
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.messaging.svc:9092
      consumerGroup: order-processing-group
      topic: orders
      lagThreshold: "50"
      sasl: scram_sha512
      tls: enable
    authenticationRef:
      name: kafka-scram-auth

SASL mode goes in the ScaledObject metadata (it's a protocol choice, not a secret). Credentials go in TriggerAuthentication. TLS certificates (ca, cert, key) can also go in the TriggerAuthentication if mutual TLS is required.

SQS with AWS Pod Identity (IRSA)

For AWS workloads, Pod Identity is the cleanest approach. No long-lived credentials, no rotation burden.

# sqs-auth.yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-credentials
  namespace: default
spec:
  podIdentity:
    provider: aws
    roleArn: arn:aws:iam::123456789012:role/keda-sqs-reader

The KEDA operator pod needs the right service account annotations for IRSA. The roleArn scopes the identity to a specific IAM role with SQS read permissions.

For teams sharing credentials across namespaces, use ClusterTriggerAuthentication:

# cluster-auth.yaml
apiVersion: keda.sh/v1alpha1
kind: ClusterTriggerAuthentication
metadata:
  name: shared-aws-credentials
spec:
  podIdentity:
    provider: aws
    roleArn: arn:aws:iam::123456789012:role/keda-shared-reader

Reference it with kind: ClusterTriggerAuthentication in the ScaledObject's authenticationRef.
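A sketch of that reference, reusing the invoice-processor trigger from earlier:

```yaml
triggers:
- type: aws-sqs-queue
  authenticationRef:
    name: shared-aws-credentials
    kind: ClusterTriggerAuthentication  # defaults to TriggerAuthentication if omitted
  metadata:
    queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/invoices
    queueLength: "10"
    awsRegion: eu-west-1
```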

Production Checklist

Kafka

  • Set minReplicaCount: 1 for new consumer groups until you've confirmed offsets are being committed
  • Enable excludePersistentLag: true to avoid scaling on stuck partitions (poison pills)
  • Use ensureEvenDistributionOfPartitions: true if balanced consumption matters for your workload
  • Match pollingInterval to your lag tolerance. The default 30s works for most cases. Lowering it increases load on the Kafka brokers.
  • Set cooldownPeriod high enough to avoid premature scale-down after burst processing. 60-120 seconds is a reasonable starting point.

SQS

  • Tune queueLength to match your per-pod throughput. If a pod handles 10 messages/second and your target latency is 5 seconds, queueLength: 50 keeps you within budget.
  • Consider scaleOnInFlight: false for slow consumers. It prevents over-provisioning at the cost of slightly delayed scale-out during spikes.
  • Set activationQueueLength > 0 to avoid thrashing around zero. A value of 2-5 messages prevents wake-ups for occasional stragglers.
  • Use cooldownPeriod to prevent premature scale-down after clearing a burst.

Both Scalers

  • Use TriggerAuthentication or ClusterTriggerAuthentication. Never put credentials in ScaledObject metadata.
  • Monitor KEDA operator logs (kubectl logs -n keda deployment/keda-operator) for scaler connection errors. A misconfigured auth doesn't prevent the ScaledObject from being created; it just silently fails to scale.
  • Use paused annotations (autoscaling.keda.sh/paused: "true") during maintenance windows to prevent unwanted scaling during cluster operations.
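Pausing is just an annotation on the ScaledObject's metadata; a minimal sketch:

```yaml
metadata:
  annotations:
    autoscaling.keda.sh/paused: "true"  # freeze autoscaling at the current replica count
```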

Wrap-up

Kafka and SQS represent the two dominant queue scaling patterns in KEDA: lag-based and depth-based. The Kafka scaler is the more complex of the two, with partition-aware capping, offset policies, persistent lag exclusion, and even partition distribution. The SQS scaler is straightforward, but the scaleOnInFlight default catches teams off-guard when consumer processing time is high.

The common thread is understanding what KEDA actually measures. The scaler's metric formula determines how many replicas HPA requests. Tuning lagThreshold or queueLength without understanding the formula means guessing. Reading the source code, specifically getLagForPartition() for Kafka and getAwsSqsQueueLength() for SQS, removes the guesswork.

Next in the series: Prometheus-based custom metrics scaling, where the scaling signal isn't a queue at all but any metric you can express as a PromQL query.
