From Zero to Production with Karpenter

Your first Karpenter install takes fifteen minutes and works perfectly. Then you promote it to production without resource limits, and Karpenter provisions your AWS account into triple-digit node counts before anyone notices. Or the controller pod lands on a node it manages, consolidation removes that node, and your cluster loses its autoscaler.
The gap between a working demo and a production deployment is where most teams stumble. This article covers the full setup: IAM foundations, Helm installation, NodePool and EC2NodeClass design, and the hardening that prevents these scenarios.
The IAM Foundation
Karpenter needs two IAM roles, each with a distinct purpose.
KarpenterNodeRole is the EC2 instance profile attached to every node Karpenter launches. It needs the standard EKS worker permissions: AmazonEKSWorkerNodePolicy, AmazonEKS_CNI_Policy, AmazonEC2ContainerRegistryPullOnly, and AmazonSSMManagedInstanceCore. This is the role your nodes assume to join the cluster and pull container images.
KarpenterControllerRole is the role the Karpenter controller pod assumes via Pod Identity (or IRSA on older setups) to provision and terminate EC2 instances. This role needs broader, carefully scoped permissions:
The following shows the key permissions from the CloudFormation template's controller policies, simplified into a single block. The full template splits these across five named policies (NodeLifecyclePolicy, IAMIntegrationPolicy, EKSIntegrationPolicy, InterruptionPolicy, ResourceDiscoveryPolicy):
```json
{
  "Statement": [
    {
      "Sid": "Karpenter",
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameter",
        "ec2:DescribeImages",
        "ec2:RunInstances",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DeleteLaunchTemplate",
        "ec2:CreateTags",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateFleet",
        "ec2:DescribeSpotPriceHistory",
        "pricing:GetProducts"
      ],
      "Resource": "*"
    },
    {
      "Sid": "ConditionalEC2Termination",
      "Effect": "Allow",
      "Action": "ec2:TerminateInstances",
      "Condition": {
        "StringLike": {
          "ec2:ResourceTag/karpenter.sh/nodepool": "*"
        }
      },
      "Resource": "*"
    },
    {
      "Sid": "PassNodeIAMRole",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::123456789012:role/KarpenterNodeRole-my-cluster"
    }
  ]
}
```
Two details matter here. First, ec2:TerminateInstances is scoped by the karpenter.sh/nodepool tag. Karpenter can only terminate instances it created. Second, iam:PassRole is restricted to the specific KarpenterNodeRole ARN, preventing the controller from assigning arbitrary roles to EC2 instances.
Resource Discovery via Tags
Karpenter discovers subnets and security groups through tags rather than explicit IDs. Tag your subnets and security groups with karpenter.sh/discovery: <cluster-name>, and Karpenter's EC2NodeClass will find them automatically:
```shell
# Tag subnets for Karpenter discovery
$ for SUBNET in $(aws eks describe-nodegroup \
    --cluster-name my-cluster \
    --nodegroup-name my-cluster-ng \
    --query 'nodegroup.subnets' --output text); do
  aws ec2 create-tags \
    --tags "Key=karpenter.sh/discovery,Value=my-cluster" \
    --resources "$SUBNET"
done
```
This tag-based approach means you never hardcode subnet IDs in your NodePool manifests. When you add subnets to your VPC, tag them and Karpenter picks them up automatically.
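To confirm the tags are in place before Karpenter needs them, filter by the discovery tag (the cluster name my-cluster is assumed, as above):

```shell
# Sanity check: list everything carrying the discovery tag
$ aws ec2 describe-subnets \
    --filters "Name=tag:karpenter.sh/discovery,Values=my-cluster" \
    --query 'Subnets[].[SubnetId,AvailabilityZone]' --output table

$ aws ec2 describe-security-groups \
    --filters "Name=tag:karpenter.sh/discovery,Values=my-cluster" \
    --query 'SecurityGroups[].[GroupId,GroupName]' --output table
```

If either command returns an empty list, the corresponding EC2NodeClass selector will match nothing and node launches will fail.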
Node Join Authorization
Nodes launched by Karpenter need permission to join the cluster. If you're using the aws-auth ConfigMap (clusters created before EKS access entries), add the KarpenterNodeRole:
```yaml
# aws-auth ConfigMap entry
- groups:
    - system:bootstrappers
    - system:nodes
  rolearn: arn:aws:iam::123456789012:role/KarpenterNodeRole-my-cluster
  username: system:node:{{EC2PrivateDNSName}}
```
The aws-auth ConfigMap is the legacy approach. Clusters created with recent EKS versions use EKS access entries instead, which handle node authorization through the AWS API without a ConfigMap. If you're setting up a new cluster, prefer access entries. The getting-started guide's eksctl template configures this via iamIdentityMappings.
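On an access-entry cluster, the equivalent of the aws-auth mapping is a single API call. A sketch, assuming the account ID and role name used throughout this article:

```shell
# Authorize Karpenter-launched nodes via an EKS access entry
$ aws eks create-access-entry \
    --cluster-name my-cluster \
    --principal-arn "arn:aws:iam::123456789012:role/KarpenterNodeRole-my-cluster" \
    --type EC2_LINUX
```

The EC2_LINUX entry type grants node-join permissions directly, so no system:nodes group mapping is needed.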
The getting-started guide now defaults to Pod Identity (via podIdentityAssociations in eksctl) for the controller role, replacing IRSA. Pod Identity is simpler to configure and doesn't require an OIDC provider. If you're on an existing cluster with IRSA, both work. For new clusters, Pod Identity is the path forward.
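If you're wiring Pod Identity by hand rather than through eksctl, the association is one call (the role name is an assumption, and the eks-pod-identity-agent add-on must already be installed on the cluster):

```shell
# Bind the controller role to Karpenter's service account
$ aws eks create-pod-identity-association \
    --cluster-name my-cluster \
    --namespace kube-system \
    --service-account karpenter \
    --role-arn "arn:aws:iam::123456789012:role/KarpenterControllerRole-my-cluster"
```

The controller role's trust policy must allow the pods.eks.amazonaws.com service principal to assume it.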
Installing Karpenter with Helm
Karpenter ships as an OCI Helm chart hosted on public ECR:
```shell
# Install Karpenter v1.10.0
$ helm upgrade --install karpenter \
    oci://public.ecr.aws/karpenter/karpenter \
    --version "1.10.0" \
    --namespace "kube-system" --create-namespace \
    --set "settings.clusterName=my-cluster" \
    --set "settings.interruptionQueue=my-cluster" \
    --set controller.resources.requests.cpu=1 \
    --set controller.resources.requests.memory=1Gi \
    --set controller.resources.limits.cpu=1 \
    --set controller.resources.limits.memory=1Gi \
    --wait
```
settings.interruptionQueue enables Spot interruption handling via SQS. Without this, Karpenter won't receive advance warning of Spot reclamation, scheduled maintenance, or instance termination events. Set it to the SQS queue name provisioned by the CloudFormation template.
settings.clusterName is required. Karpenter uses it for tag-based resource discovery and to scope its EC2 operations to the correct cluster.
Controller placement is critical. The default values.yaml includes an affinity rule that keeps Karpenter off its own managed nodes:
```yaml
# values.yaml — controller affinity (default)
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: karpenter.sh/nodepool
              operator: DoesNotExist
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: "kubernetes.io/hostname"
```
Run the controller on a dedicated managed node group or EKS Fargate profile. If Karpenter runs on a node it manages and decides to consolidate that node, you lose the controller. This is the most common bootstrapping mistake.
The chart also sets priorityClassName: system-cluster-critical, topologySpreadConstraints across zones (maxSkew 1), and a PodDisruptionBudget with maxUnavailable: 1. The default replica count is 2.
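A quick way to verify these defaults survived your values overrides (the label selector assumes the chart's standard labels):

```shell
# Controller pods should land on non-Karpenter nodes in different zones
$ kubectl get pods -n kube-system \
    -l app.kubernetes.io/name=karpenter -o wide

# None of those nodes should carry the karpenter.sh/nodepool label
$ kubectl get nodes -L karpenter.sh/nodepool
```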
tip: Supply-chain verification. The Helm chart is signed with Cosign. Verify the signature before installing:
```shell
$ cosign verify public.ecr.aws/karpenter/karpenter:1.10.0 \
    --certificate-oidc-issuer=https://token.actions.githubusercontent.com \
    --certificate-identity-regexp='https://github\.com/aws/karpenter-provider-aws/'
```
Designing Your First NodePool and EC2NodeClass
A NodePool defines what Karpenter can provision. An EC2NodeClass defines how: the AWS-specific settings for the instances. Every NodePool references exactly one EC2NodeClass.
Here's the default configuration from the official getting-started guide:
```yaml
# nodepool-default.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```
The requirements section is where you express instance flexibility. Broad categories (c, m, r) with a generation floor (Gt: 2) give Karpenter hundreds of instance type options. For Spot workloads, this breadth is essential: the more instance types available, the deeper the capacity pools and the lower the interruption risk.
The limits section caps total provisioned resources. cpu: 1000 and memory: 1000Gi mean Karpenter stops provisioning once either limit is reached across all nodes in this NodePool. Without limits, Karpenter scales unbounded.
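You can watch consumption against those limits at runtime; in the v1 API the NodePool status reports aggregate provisioned resources (field paths below are from the v1 schema):

```shell
# Current usage vs. configured ceiling for the default NodePool
$ kubectl get nodepool default \
    -o jsonpath='{.status.resources}{"\n"}{.spec.limits}{"\n"}'
```

Once usage crosses the limit, new pods stay pending until capacity frees up.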
The EC2NodeClass configures the AWS-specific details:
```yaml
# ec2nodeclass-default.yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: "KarpenterNodeRole-my-cluster"
  amiSelectorTerms:
    - alias: "al2023@v20260315"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
```
amiSelectorTerms with the alias field is the recommended way to select AMIs. The format al2023@v20260315 pins to a specific EKS-optimized AMI version. In development, you might use al2023@latest, but in production, pin the version so AMI updates happen on your schedule, not Amazon's.
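To see which concrete image a pin resolves to, you can query the public SSM parameters for the EKS-optimized AMIs (the path below assumes Kubernetes 1.31 on x86_64; adjust for your cluster version and architecture):

```shell
# Resolve the currently recommended AL2023 AMI ID
$ aws ssm get-parameter \
    --name /aws/service/eks/optimized-ami/1.31/amazon-linux-2023/x86_64/standard/recommended/image_id \
    --query 'Parameter.Value' --output text
```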
kubelet configuration in EC2NodeClass lets you tune node-level settings. For production, consider setting systemReserved and evictionHard thresholds:
```yaml
# ec2nodeclass-production.yaml (kubelet section)
spec:
  kubelet:
    systemReserved:
      cpu: 100m
      memory: 100Mi
      ephemeral-storage: 1Gi
    evictionHard:
      memory.available: 5%
      nodefs.available: 10%
      nodefs.inodesFree: 10%
```
Multiple NodePools
Design NodePools to be mutually exclusive. If a pod matches multiple NodePools, Karpenter picks based on weight (higher wins), but overlapping NodePools create unpredictable scheduling.
The most common pattern is to separate GPU workloads from general compute using taints:
```yaml
# nodepool-gpu.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["g5.4xlarge", "g5.8xlarge", "p4d.24xlarge"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    nvidia.com/gpu: 8
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```
The taint ensures only pods that tolerate nvidia.com/gpu land on these expensive nodes. The GPU resource limit prevents unbounded GPU scaling.
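For completeness, a pod that is allowed onto this NodePool needs both the toleration and a GPU request; a minimal sketch (the image name is a placeholder):

```yaml
# gpu-pod.yaml — tolerates the taint and requests one GPU
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest
      resources:
        limits:
          nvidia.com/gpu: 1
```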
Production Hardening
High Availability
The Helm chart defaults cover this well: 2 replicas, podAntiAffinity by hostname, topologySpreadConstraints across zones. Verify these are in place and not overridden.
Interruption Handling
Set settings.interruptionQueue to enable SQS-based interruption handling. This gives Karpenter advance notice of Spot interruptions (2-minute warning), scheduled maintenance events, and instance termination events. Without it, Karpenter reacts only after the instance is gone.
The CloudFormation template in the getting-started guide provisions the SQS queue and EventBridge rules automatically. If you're setting up manually, you need both the queue and the EventBridge rules that forward EC2 events to it.
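A sketch of the manual path, for accounts where CloudFormation isn't an option (the queue name and retention mirror the template's defaults; only the Spot interruption rule is shown, and the queue policy granting events.amazonaws.com permission to send messages is omitted):

```shell
# Create the interruption queue
$ QUEUE_URL=$(aws sqs create-queue --queue-name my-cluster \
    --attributes '{"MessageRetentionPeriod":"300"}' \
    --query QueueUrl --output text)
$ QUEUE_ARN=$(aws sqs get-queue-attributes --queue-url "$QUEUE_URL" \
    --attribute-names QueueArn --query 'Attributes.QueueArn' --output text)

# Forward Spot interruption warnings to the queue; repeat for rebalance
# recommendations, instance state changes, and scheduled-change events
$ aws events put-rule --name my-cluster-spot-interruption \
    --event-pattern '{"source":["aws.ec2"],"detail-type":["EC2 Spot Instance Interruption Warning"]}'
$ aws events put-targets --rule my-cluster-spot-interruption \
    --targets "Id=KarpenterInterruptionQueue,Arn=$QUEUE_ARN"
```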
warning: Do not run both Karpenter interruption handling and AWS Node Termination Handler. They conflict, and events get processed twice, leading to race conditions during node draining.
Disruption Budgets
The default disruption budget is nodes: 10%. For production, add schedule-based budgets to prevent disruptions during business hours or deployment windows:
```yaml
# Disruption budget example
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
    budgets:
      - nodes: "20%"
      - schedule: "0 9 * * mon-fri"
        duration: 8h
        nodes: "0"
```
The second budget blocks all voluntary disruptions from 9 AM to 5 PM UTC on weekdays. Karpenter still handles involuntary events (Spot interruptions, expiration) during this window.
Protecting Critical Workloads
For long-running jobs, stateful workloads, or anything that shouldn't be interrupted by consolidation, use the karpenter.sh/do-not-disrupt: "true" annotation:
```yaml
# Protect a long-running batch job
spec:
  template:
    metadata:
      annotations:
        karpenter.sh/do-not-disrupt: "true"
```
This prevents Karpenter from voluntarily evicting the pod for consolidation or drift. The node won't be considered for voluntary disruption as long as this pod runs on it.
API Server Throttling
Karpenter installed in kube-system benefits from the system-leader-election and kube-system-service-accounts FlowSchemas, which map requests to the leader-election and workload-high PriorityLevelConfigurations respectively. If you install Karpenter in a different namespace, create custom FlowSchemas to ensure it gets the same priority and isn't throttled by other workloads.
Private Cluster Considerations
Running Karpenter in a private cluster (no internet egress) requires VPC endpoints for: EC2, ECR (API + DKR), S3, STS, SSM, SQS, and EKS. Set settings.isolatedVPC: true in the Helm values.
There's no VPC endpoint for the AWS Price List API. Karpenter bundles on-demand pricing data in its binary and updates it with each release. In isolated clusters, pricing data goes stale between upgrades. This affects cost-optimal instance selection but doesn't break provisioning.
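A sketch of the endpoint setup (VPC_ID, SUBNET_IDS, SG_ID, RTB_ID, and the us-east-1 region are placeholders for your environment):

```shell
# Interface endpoints for the services Karpenter talks to
$ for SVC in ec2 ecr.api ecr.dkr sts ssm sqs eks; do
    aws ec2 create-vpc-endpoint \
      --vpc-id "$VPC_ID" \
      --vpc-endpoint-type Interface \
      --service-name "com.amazonaws.us-east-1.$SVC" \
      --subnet-ids $SUBNET_IDS \
      --security-group-ids "$SG_ID"
  done

# S3 uses a gateway endpoint attached to the route table
$ aws ec2 create-vpc-endpoint \
    --vpc-id "$VPC_ID" \
    --vpc-endpoint-type Gateway \
    --service-name com.amazonaws.us-east-1.s3 \
    --route-table-ids "$RTB_ID"
```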
Gotchas
The bootstrapping deadlock. Don't run Karpenter on nodes it manages. If the controller pod lands on a Karpenter-managed node and consolidation removes that node, your cluster loses its autoscaler. Use a dedicated managed node group or Fargate.
Spot service-linked role. Before Spot capacity works, the AWSServiceRoleForEC2Spot role must exist in your account. Run aws iam create-service-linked-role --aws-service-name spot.amazonaws.com once per account. You'll get an error if it already exists. That's fine.
NodePool overlap. If a pod matches multiple NodePools without explicit weights, Karpenter picks randomly. This leads to inconsistent instance selection and confusing cost allocation. Design NodePools to be mutually exclusive via taints, labels, or non-overlapping requirements.
The 63-character label limit. Kubernetes labels max at 63 characters. Salesforce hit this migrating 1,000+ clusters when legacy node group names exceeded the limit. Keep NodePool names short and predictable.
Forgetting spec.limits. Without CPU or memory limits on your NodePool, Karpenter provisions nodes as long as pods are pending. You'll discover this when the AWS bill arrives. Every NodePool needs explicit resource limits.
consolidateAfter: Never disables consolidation, not all disruption. Setting it to Never stops Karpenter from consolidating empty or underutilized nodes. But drift (a separate disruption method) still runs: if your NodePool or EC2NodeClass changes, Karpenter will replace drifted nodes regardless. If you want no voluntary disruption at all, set a budget of nodes: "0".
Wrap-up
A production Karpenter deployment rests on four pillars: a properly scoped IAM foundation, a Helm installation that keeps the controller off managed nodes, NodePools designed with explicit limits and mutually exclusive requirements, and operational hardening through interruption handling, disruption budgets, and AMI pinning.
Once Karpenter is running, monitor its behavior closely for the first week. Watch the controller logs for consolidation decisions, verify that disruption budgets hold during business hours, and confirm that your resource limits are calibrated to your actual cluster ceiling. The configuration described here is a starting point. Tune consolidateAfter, disruption budgets, and instance requirements as you observe real workload patterns.