VPC Integration Guide

This guide focuses on using the Karpenter IBM Cloud Provider with self-managed Kubernetes clusters running on IBM Cloud VPC infrastructure.

Overview

VPC integration provisions nodes directly on IBM Cloud VPC for self-managed Kubernetes clusters, giving you full control over cluster configuration while handling node bootstrap automatically.

Key Benefits

  • Automatic Bootstrap: Zero-configuration node joining with intelligent cluster discovery
  • Dynamic Instance Selection: Full flexibility in instance type selection based on workload requirements
  • Custom Configurations: Support for specialized setups (GPU, HPC, security hardening)

Prerequisites

Infrastructure Requirements

  • Self-Managed Kubernetes: Running on IBM Cloud VPC instances
  • VPC Infrastructure: VPC with subnets, security groups, and network configuration
  • API Access: Service ID with VPC Infrastructure Services permissions
  • Network Connectivity: Proper security groups allowing cluster communication

Required Information

Gather the following before starting:

# List your VPCs
ibmcloud is vpcs --output json

# List subnets in your VPC
ibmcloud is subnets --vpc <vpc-id> --output json

# List security groups
ibmcloud is security-groups --vpc <vpc-id> --output json

# List available images
ibmcloud is images --visibility public --status available | grep ubuntu
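
If it helps to keep these values on hand, the IDs can be pulled out of the JSON with jq and stored in shell variables. This is a sketch: the names my-vpc and sg-k8s-workers are placeholders, and the .id/.name fields are assumed from the usual ibmcloud is JSON output.

# Capture the IDs for later use (placeholder names; adjust the jq filters to your environment)
VPC_ID=$(ibmcloud is vpcs --output json | jq -r '.[] | select(.name=="my-vpc") | .id')
SUBNET_ID=$(ibmcloud is subnets --vpc "$VPC_ID" --output json | jq -r '.[0].id')
SG_ID=$(ibmcloud is security-groups --vpc "$VPC_ID" --output json | jq -r '.[] | select(.name=="sg-k8s-workers") | .id')
echo "VPC=$VPC_ID SUBNET=$SUBNET_ID SG=$SG_ID"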

Quick Setup

Step 1: Install Karpenter

# Create namespace and secrets
kubectl create namespace karpenter

kubectl create secret generic karpenter-ibm-credentials \
  --from-literal=api-key="your-general-api-key" \
  --from-literal=vpc-api-key="your-vpc-api-key" \
  --namespace karpenter

# Install via Helm
helm repo add karpenter-ibm https://pfeifferj.github.io/karpenter-provider-ibm-cloud
helm repo update
helm install karpenter karpenter-ibm/karpenter-ibm \
  --namespace karpenter \
  --create-namespace \
  --set controller.env.IBM_REGION="us-south"
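
Before moving on, confirm the controller came up cleanly. The deployment name below assumes the Helm release is named karpenter as in the command above.

# Verify the controller is running and check its logs for credential errors
kubectl get pods -n karpenter
kubectl logs -n karpenter deployment/karpenter --tail=50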

Step 2: Create VPC NodeClass

apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: vpc-nodeclass
  annotations:
    karpenter.ibm.sh/description: "VPC self-managed cluster NodeClass"
spec:
  # REQUIRED: Replace with your actual values
  region: us-south                      # Your IBM Cloud region
  zone: us-south-1                      # Target availability zone
  vpc: vpc-12345678                     # Your VPC ID
  image: r006-12345678                  # Ubuntu 20.04 LTS recommended

  # Security and networking
  securityGroups:
  - sg-k8s-workers                      # Security group allowing cluster communication

  # Optional: Specific subnet (auto-selected if not specified)
  subnet: subnet-12345678               # Your subnet ID

  # Optional: Instance requirements (alternative to specific instance profile)
  instanceRequirements:
    architecture: amd64                 # CPU architecture: amd64, arm64, s390x
    minimumCPU: 2                       # Minimum vCPUs required
    minimumMemory: 4                    # Minimum memory in GiB
    maximumHourlyPrice: "1.00"          # Maximum hourly price in USD

  # Optional: Specific instance profile (alternative to instanceRequirements; set only one of the two)
  instanceProfile: bx2-4x16             # Specific instance type

  # Optional: Placement strategy for zone/subnet selection
  placementStrategy:
    zoneBalance: Balanced               # Balanced, AvailabilityFirst, or CostOptimized

  # Optional: SSH access for troubleshooting
  # To find SSH key IDs: ibmcloud is keys --output json | jq '.[] | {name, id}'
  sshKeys:
  - r010-12345678-1234-1234-1234-123456789012  # SSH key ID

  # Optional: Resource group ID
  resourceGroup: rg-12345678             # Resource group ID

  # Optional: Placement target (dedicated host or placement group)
  placementTarget: ph-12345678

  # Optional: Tags to apply to instances
  tags:
    environment: production
    team: devops

  # Optional: Bootstrap mode (cloud-init, iks-api, or auto)
  bootstrapMode: cloud-init

  # REQUIRED: Internal API server endpoint (find with: kubectl get endpointslice -n default -l kubernetes.io/service-name=kubernetes)
  apiServerEndpoint: "https://<INTERNAL-API-SERVER-IP>:6443"

  # Optional: IKS cluster ID (required when bootstrapMode is "iks-api")
  iksClusterID: bng6n48d0t6vj7b33kag

  # Optional: IKS worker pool ID (for IKS API bootstrapping)
  iksWorkerPoolID: bng6n48d0t6vj7b33kag-pool1

  # Optional: Load balancer integration
  loadBalancerIntegration:
    enabled: true
    targetGroups:
    - loadBalancerID: r010-12345678-1234-5678-9abc-def012345678
      poolName: web-servers
      port: 80
      weight: 50
    autoDeregister: true
    registrationTimeout: 300

  # VPC mode uses automatic bootstrap - no userData required!
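
Save the manifest (for example as vpc-nodeclass.yaml), apply it, and check that the controller accepts it. The describe output is where validation problems (bad VPC ID, missing subnet, and so on) typically show up.

kubectl apply -f vpc-nodeclass.yaml

# Review the NodeClass and any status/validation messages
kubectl get ibmnodeclass vpc-nodeclass
kubectl describe ibmnodeclass vpc-nodeclass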

Step 3: Create NodePool

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: vpc-nodepool
spec:
  template:
    metadata:
      labels:
        provisioner: karpenter-vpc
        cluster-type: self-managed
    spec:
      nodeClassRef:
        apiVersion: karpenter.ibm.sh/v1alpha1
        kind: IBMNodeClass
        name: vpc-nodeclass

      # Full flexibility in instance requirements
      requirements:
      - key: node.kubernetes.io/instance-type
        operator: In
        values: ["bx2-2x8", "bx2-4x16", "cx2-2x4", "cx2-4x8", "mx2-2x16"]
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]

  limits:
    cpu: 1000
    memory: 1000Gi

  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
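
Apply the NodePool, then give Karpenter something to schedule. The test deployment below is only a sketch for verification: the pause image and request sizes are placeholders, chosen so the pending pods force a scale-up.

kubectl apply -f vpc-nodepool.yaml

# Create a test workload that cannot fit on existing nodes
kubectl create deployment inflate --image=registry.k8s.io/pause:3.9 --replicas=5
kubectl set resources deployment inflate --requests=cpu=1,memory=1Gi

# Watch Karpenter create NodeClaims and register the new nodes
kubectl get nodeclaims -w
kubectl get nodes -w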

VPC Bootstrap Features

API Endpoint Discovery (Critical for VPC Clusters)

Important: VPC worker nodes must use the internal API server endpoint, not the external endpoint you use with kubectl.

Finding the Correct Internal API Endpoint

# Method 1: Get actual API server endpoint (RECOMMENDED)
kubectl get endpointslice -n default -l kubernetes.io/service-name=kubernetes

# Example output:
# NAME         ADDRESSTYPE   PORTS   ENDPOINTS       AGE  
# kubernetes   IPv4          6443    <INTERNAL-IP>   15d

# Use: https://<INTERNAL-IP>:6443

# Method 2: Check kubernetes service (alternative)
kubectl get svc kubernetes -o jsonpath='{.spec.clusterIP}'
# Returns cluster IP (e.g., <CLUSTER-IP>) - use https://<CLUSTER-IP>:443

Configuring IBMNodeClass with Correct Endpoint

apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: vpc-nodeclass
spec:
  # CRITICAL: Use INTERNAL endpoint from discovery above
  apiServerEndpoint: "https://<INTERNAL-IP>:6443"

  region: us-south
  vpc: vpc-12345678
  # ... rest of config

Automatic Cluster Discovery

The VPC integration automatically discovers your cluster configuration:

  • API Endpoint: Uses the internal cluster API server endpoint you configure
  • CA Certificate: Extracts cluster CA certificate from existing nodes
  • DNS Configuration: Discovers cluster DNS service IP and search domains
  • Network Settings: Detects cluster pod and service CIDR ranges
  • Runtime Detection: Matches container runtime used by existing nodes

Zero Configuration Bootstrap

# Minimal configuration - everything else is automatic
apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: minimal-vpc
spec:
  region: us-south
  zone: us-south-1
  vpc: vpc-12345678
  image: r006-ubuntu-20-04
  securityGroups:
  - sg-default
  # No userData needed - bootstrap is fully automatic!

Advanced VPC Configurations

Multi-Zone VPC Setup

# Zone 1
---
apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: vpc-us-south-1
spec:
  region: us-south
  zone: us-south-1
  vpc: vpc-12345678
  subnet: subnet-zone1-12345
  image: r006-ubuntu-20-04
  securityGroups:
  - sg-k8s-workers
---
# Zone 2
apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: vpc-us-south-2
spec:
  region: us-south
  zone: us-south-2
  vpc: vpc-12345678
  subnet: subnet-zone2-12345
  image: r006-ubuntu-20-04
  securityGroups:
  - sg-k8s-workers
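
Since each NodeClass above is pinned to a single zone, pair each one with its own NodePool. A minimal sketch for zone 1 (repeat with vpc-us-south-2 for zone 2; names and requirements are examples):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: vpc-nodepool-us-south-1
spec:
  template:
    spec:
      nodeClassRef:
        apiVersion: karpenter.ibm.sh/v1alpha1
        kind: IBMNodeClass
        name: vpc-us-south-1
      requirements:
      - key: topology.kubernetes.io/zone
        operator: In
        values: ["us-south-1"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]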

GPU Workloads

apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: vpc-gpu
spec:
  region: us-south
  zone: us-south-1
  vpc: vpc-gpu-12345
  image: r006-ubuntu-20-04
  instanceProfile: gx2-8x64x1v100      # GPU instance type
  securityGroups:
  - sg-gpu-workloads
  userData: |
    #!/bin/bash
    # GPU drivers and configuration
    apt-get update
    apt-get install -y nvidia-driver-470 nvidia-container-toolkit

    # Configure containerd for GPU support
    mkdir -p /etc/containerd
    cat > /etc/containerd/config.toml <<EOF
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
      runtime_type = "io.containerd.runc.v2"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
        BinaryName = "/usr/bin/nvidia-container-runtime"
    EOF

    # Bootstrap script automatically appended
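
To keep non-GPU workloads off these nodes, a NodePool referencing this class can add a taint that only GPU pods tolerate. This is a sketch; the nvidia.com/gpu taint key is a common convention, not something required by the provider.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: vpc-gpu-nodepool
spec:
  template:
    spec:
      nodeClassRef:
        apiVersion: karpenter.ibm.sh/v1alpha1
        kind: IBMNodeClass
        name: vpc-gpu
      taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]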

High-Performance Computing (HPC)

apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: vpc-hpc
spec:
  region: us-south
  zone: us-south-1
  vpc: vpc-hpc-12345
  image: r006-ubuntu-20-04
  instanceProfile: cx2-32x64           # High-performance instance
  securityGroups:
  - sg-hpc-cluster
  userData: |
    #!/bin/bash
    # HPC optimizations: set the performance governor on every CPU core
    # (a loop avoids the ambiguous redirect a glob on the redirection target would cause)
    for governor in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
      echo performance > "$governor"
    done

    # Memory optimizations
    echo 'vm.swappiness = 1' >> /etc/sysctl.conf
    echo 'vm.dirty_ratio = 15' >> /etc/sysctl.conf

    # Install HPC libraries
    apt-get update && apt-get install -y \
      openmpi-bin openmpi-common libopenmpi-dev \
      libblas3 liblapack3

    # Network optimizations for high-throughput
    echo 'net.core.rmem_max = 134217728' >> /etc/sysctl.conf
    echo 'net.core.wmem_max = 134217728' >> /etc/sysctl.conf

    # Apply the sysctl settings now rather than waiting for a reboot
    sysctl -p
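
Once a node from this class has joined, you can verify the tuning took effect; the commands below only read back the settings written by the userData script.

# Confirm the sysctl and governor settings on a provisioned node
ssh ubuntu@<node-ip> "sysctl vm.swappiness net.core.rmem_max net.core.wmem_max"
ssh ubuntu@<node-ip> "cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"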

Custom CNI Configuration

apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: vpc-custom-cni
spec:
  region: us-south
  zone: us-south-1
  vpc: vpc-custom-12345
  image: r006-ubuntu-20-04
  userData: |
    #!/bin/bash
    # Custom CNI setup before cluster join

    # Install the Cilium CNI plugin binary (the release artifact is a tarball,
    # so download it and extract it into the CNI bin directory rather than
    # saving the archive directly as the binary)
    curl -L -o /tmp/cilium-cni.tar.gz \
      https://github.com/cilium/cilium/releases/download/v1.14.0/cilium-linux-amd64.tar.gz
    mkdir -p /opt/cni/bin
    tar -xzf /tmp/cilium-cni.tar.gz -C /opt/cni/bin

    # Custom CNI configuration
    mkdir -p /etc/cni/net.d
    cat > /etc/cni/net.d/05-cilium.conf <<EOF
    {
      "cniVersion": "0.4.0",
      "name": "cilium",
      "type": "cilium-cni",
      "enable-debug": false
    }
    EOF

    # Bootstrap script handles the rest
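
After a node provisions, confirm the custom CNI configuration was written before the kubelet started; this simply reads back the file created by the userData script.

# Verify the CNI config landed on the node
ssh ubuntu@<node-ip> "ls /etc/cni/net.d/ && cat /etc/cni/net.d/05-cilium.conf"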

Dynamic Instance Selection

Unlike IKS mode, VPC integration provides full flexibility in instance type selection:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: flexible-nodepool
spec:
  template:
    spec:
      nodeClassRef:
        name: vpc-nodeclass
      requirements:
      # Karpenter will choose the best instance type based on pod requirements
      - key: node.kubernetes.io/instance-type
        operator: In
        values: [
          "bx2-2x8", "bx2-4x16", "bx2-8x32",    # Balanced instances
          "cx2-2x4", "cx2-4x8", "cx2-8x16",     # Compute optimized
          "mx2-2x16", "mx2-4x32", "mx2-8x64"    # Memory optimized
        ]
      - key: karpenter.ibm.sh/instance-family
        operator: In
        values: ["bx2", "cx2", "mx2"]
      # Resource-based selection
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]

VPC-Specific Troubleshooting

Bootstrap Issues

Wrong API Endpoint Configuration

Symptoms:

  • NodeClaims are created but nodes never register with the cluster
  • Kubelet logs show: "Client.Timeout exceeded while awaiting headers"
  • Node status remains "Unknown" with "Drifted" = True

Solution:

# 1. Find correct internal endpoint
kubectl get endpointslice -n default -l kubernetes.io/service-name=kubernetes

# 2. Update NodeClass with internal endpoint (NOT external)
kubectl patch ibmnodeclass your-nodeclass --type='merge' \
  -p='{"spec":{"apiServerEndpoint":"https://<INTERNAL-IP>:6443"}}'

# 3. Delete old NodeClaims to trigger new ones with correct config
kubectl delete nodeclaims --all

# 4. Monitor node registration
kubectl get nodes -w

Verification:

# Test connectivity from worker instance
ssh ubuntu@<node-ip> "telnet <INTERNAL-IP> 6443"
# Should connect successfully, not timeout

# Check kubelet logs
ssh ubuntu@<node-ip> "sudo journalctl -u kubelet -f"
# Should see successful API server communication

Cluster Discovery Failures

# Check if controller can reach cluster API
kubectl logs -n karpenter deployment/karpenter | grep "cluster discovery"

# Verify internal API endpoint is accessible
ssh ubuntu@<node-ip> "curl -k https://INTERNAL_API_ENDPOINT/healthz"

# Check security group rules
ibmcloud is security-group <sg-id> --output json | jq '.rules'

Bootstrap Script Problems

# View generated bootstrap script
ssh ubuntu@<node-ip> "sudo cat /var/lib/cloud/instance/scripts/*"

# Check cloud-init logs
ssh ubuntu@<node-ip> "sudo journalctl -u cloud-final"

# Monitor bootstrap execution
ssh ubuntu@<node-ip> "sudo tail -f /var/log/cloud-init-output.log"

Network Connectivity

# Test cluster communication
ssh ubuntu@<node-ip> "nc -zv CLUSTER_ENDPOINT 6443"

# Check DNS resolution
ssh ubuntu@<node-ip> "nslookup kubernetes.default.svc.cluster.local"

# Verify route table
ibmcloud is vpc-routes <vpc-id>

Instance Provisioning

# List available instance profiles (e.g., the bx2 family)
ibmcloud is instance-profiles --output json | jq '.[] | select(.name | startswith("bx2"))'

# Monitor quota usage
ibmcloud is instances --output json | jq 'length'

# Check subnet capacity
ibmcloud is subnet <subnet-id> --output json | jq '.available_ipv4_address_count'

CNI Initialization Timing Issues

Problem: Newly provisioned nodes may be terminated prematurely before CNI (Calico) fully initializes.

Symptoms:

  • Nodes are created successfully but pods fail with CNI errors
  • Pod events show: Failed to create pod sandbox: plugin type="calico" failed (add): stat /var/lib/calico/nodename: no such file or directory
  • Nodes get terminated and recreated in a loop

Root Cause: Karpenter's consolidateAfter setting is too aggressive, terminating nodes before CNI initialization completes.

Solution:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: vpc-nodepool
spec:
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 300s  # Wait 5 minutes for CNI initialization

Verification:

# Check if Calico pods are running on new nodes
kubectl get pods -n kube-system -o wide | grep calico-node

# Monitor CNI status on a node
ssh ubuntu@<node-ip> "sudo ls -la /var/lib/calico/"

# Check for CNI-related events
kubectl get events --field-selector involvedObject.kind=Pod | grep -i sandbox