VPC Integration Guide¶
This guide focuses on using Karpenter IBM Cloud Provider with self-managed Kubernetes clusters running on IBM Cloud VPC infrastructure.
Overview¶
VPC integration provides flexible node provisioning for self-managed Kubernetes clusters with full control over cluster configuration and automatic bootstrap capabilities.
Key Benefits¶
- Automatic Bootstrap: Zero-configuration node joining with intelligent cluster discovery
- Dynamic Instance Selection: Full flexibility in instance type selection based on workload requirements
- Custom Configurations: Support for specialized setups (GPU, HPC, security hardening)
Prerequisites¶
Infrastructure Requirements¶
- Self-Managed Kubernetes: Running on IBM Cloud VPC instances
- VPC Infrastructure: VPC with subnets, security groups, and network configuration
- API Access: Service ID with VPC Infrastructure Services permissions
- Network Connectivity: Proper security groups allowing cluster communication
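At minimum, worker nodes must be able to reach the API server and each other. A minimal sketch using the IBM Cloud CLI (the security group name sg-k8s-workers is a placeholder; adjust ports to your CNI and cluster layout):
# Allow nodes to reach the API server port from within the workers security group
ibmcloud is security-group-rule-add sg-k8s-workers inbound tcp \
  --port-min 6443 --port-max 6443 --remote sg-k8s-workers
# Allow kubelet traffic (logs, exec, metrics) between workers
ibmcloud is security-group-rule-add sg-k8s-workers inbound tcp \
  --port-min 10250 --port-max 10250 --remote sg-k8s-workers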
Required Information¶
Gather the following before starting:
# List your VPCs
ibmcloud is vpcs --output json
# List subnets in your VPC
ibmcloud is subnets --vpc <vpc-id> --output json
# List security groups
ibmcloud is security-groups --vpc <vpc-id> --output json
# List available images
ibmcloud is images --visibility public --status available | grep ubuntu
Quick Setup¶
Step 1: Install Karpenter¶
# Create namespace and secrets
kubectl create namespace karpenter
kubectl create secret generic karpenter-ibm-credentials \
  --from-literal=api-key="your-general-api-key" \
  --from-literal=vpc-api-key="your-vpc-api-key" \
  --namespace karpenter
# Install via Helm
helm repo add karpenter-ibm https://pfeifferj.github.io/karpenter-provider-ibm-cloud
helm repo update
helm install karpenter karpenter-ibm/karpenter-ibm \
  --namespace karpenter \
  --create-namespace \
  --set controller.env.IBM_REGION="us-south"
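Before creating any NodeClasses, confirm the controller came up cleanly:
kubectl get pods -n karpenter
kubectl logs -n karpenter deployment/karpenter | tail -n 20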
Step 2: Create VPC NodeClass¶
apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: vpc-nodeclass
  annotations:
    karpenter.ibm.sh/description: "VPC self-managed cluster NodeClass"
spec:
  # REQUIRED: Replace with your actual values
  region: us-south          # Your IBM Cloud region
  zone: us-south-1          # Target availability zone
  vpc: vpc-12345678         # Your VPC ID
  image: r006-12345678      # Ubuntu 20.04 LTS recommended

  # Security and networking
  securityGroups:
    - sg-k8s-workers        # Security group allowing cluster communication

  # Optional: Specific subnet (auto-selected if not specified)
  subnet: subnet-12345678   # Your subnet ID

  # Optional: Instance requirements (alternative to a specific instance profile)
  instanceRequirements:
    architecture: amd64         # CPU architecture: amd64, arm64, s390x
    minimumCPU: 2               # Minimum vCPUs required
    minimumMemory: 4            # Minimum memory in GiB
    maximumHourlyPrice: "1.00"  # Maximum hourly price in USD

  # Optional: Specific instance profile (alternative to instanceRequirements)
  instanceProfile: bx2-4x16

  # Optional: Placement strategy for zone/subnet selection
  placementStrategy:
    zoneBalance: Balanced   # Balanced, AvailabilityFirst, or CostOptimized

  # Optional: SSH access for troubleshooting
  # To find SSH key IDs: ibmcloud is keys --output json | jq '.[] | {name, id}'
  sshKeys:
    - r010-12345678-1234-1234-1234-123456789012

  # Optional: Resource group ID
  resourceGroup: rg-12345678

  # Optional: Placement target (dedicated host or placement group)
  placementTarget: ph-12345678

  # Optional: Tags to apply to instances
  tags:
    environment: production
    team: devops

  # Optional: Bootstrap mode (cloud-init, iks-api, or auto)
  bootstrapMode: cloud-init

  # REQUIRED: Internal API server endpoint
  # Find it with: kubectl get endpointslice -n default -l kubernetes.io/service-name=kubernetes
  apiServerEndpoint: "https://<INTERNAL-API-SERVER-IP>:6443"

  # Optional: IKS cluster ID (required when bootstrapMode is "iks-api")
  iksClusterID: bng6n48d0t6vj7b33kag

  # Optional: IKS worker pool ID (for IKS API bootstrapping)
  iksWorkerPoolID: bng6n48d0t6vj7b33kag-pool1

  # Optional: Load balancer integration
  loadBalancerIntegration:
    enabled: true
    targetGroups:
      - loadBalancerID: r010-12345678-1234-5678-9abc-def012345678
        poolName: web-servers
        port: 80
        weight: 50
    autoDeregister: true
    registrationTimeout: 300

  # VPC mode uses automatic bootstrap - no userData required!
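Save the manifest and apply it; the ibmnodeclass resource can then be inspected to confirm it was accepted (the filename is arbitrary):
kubectl apply -f vpc-nodeclass.yaml
kubectl get ibmnodeclass vpc-nodeclass -o yaml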
Step 3: Create NodePool¶
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: vpc-nodepool
spec:
  template:
    metadata:
      labels:
        provisioner: karpenter-vpc
        cluster-type: self-managed
    spec:
      nodeClassRef:
        group: karpenter.ibm.sh
        kind: IBMNodeClass
        name: vpc-nodeclass
      # Full flexibility in instance requirements
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["bx2-2x8", "bx2-4x16", "cx2-2x4", "cx2-4x8", "mx2-2x16"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s   # raise to 300s if nodes churn before CNI initializes (see troubleshooting below)
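To exercise the NodePool end to end, create pods that exceed current capacity and watch Karpenter provision a node. A sketch (deployment name, image, and sizes are illustrative):
kubectl apply -f vpc-nodepool.yaml
# Create pending pods that need a new node
kubectl create deployment inflate \
  --image=public.ecr.aws/eks-distro/kubernetes/pause:3.7 --replicas=5
kubectl set resources deployment inflate --requests=cpu=1
# Watch NodeClaims and nodes appear
kubectl get nodeclaims -w
kubectl get nodes -w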
VPC Bootstrap Features¶
API Endpoint Discovery (Critical for VPC Clusters)¶
Important: VPC clusters must use the cluster's internal API endpoint, not the external endpoint from your local kubeconfig.
Finding the Correct Internal API Endpoint¶
# Method 1: Get actual API server endpoint (RECOMMENDED)
kubectl get endpointslice -n default -l kubernetes.io/service-name=kubernetes
# Example output:
# NAME ADDRESSTYPE PORTS ENDPOINTS AGE
# kubernetes IPv4 6443 <INTERNAL-IP> 15d
# Use: https://<INTERNAL-IP>:6443
# Method 2: Check kubernetes service (alternative)
kubectl get svc kubernetes -o jsonpath='{.spec.clusterIP}'
# Returns cluster IP (e.g., <CLUSTER-IP>) - use https://<CLUSTER-IP>:443
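If you script this step, the internal address can be captured straight from the EndpointSlice (assumes a single API server endpoint; adjust the index for HA control planes):
API_IP=$(kubectl get endpointslice -n default \
  -l kubernetes.io/service-name=kubernetes \
  -o jsonpath='{.items[0].endpoints[0].addresses[0]}')
echo "https://${API_IP}:6443"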
Configuring IBMNodeClass with Correct Endpoint¶
apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: vpc-nodeclass
spec:
  # CRITICAL: Use the INTERNAL endpoint from the discovery step above
  apiServerEndpoint: "https://<INTERNAL-IP>:6443"
  region: us-south
  vpc: vpc-12345678
  # ... rest of config
Automatic Cluster Discovery¶
The VPC integration automatically discovers your cluster configuration:
- API Endpoint: Uses the internal cluster API server endpoint you configure
- CA Certificate: Extracts cluster CA certificate from existing nodes
- DNS Configuration: Discovers cluster DNS service IP and search domains
- Network Settings: Detects cluster pod and service CIDR ranges
- Runtime Detection: Matches container runtime used by existing nodes
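You can spot-check the same values manually; for example (service and ConfigMap names assume a standard cluster):
# DNS service IP the nodes will use
kubectl get svc -n kube-system kube-dns -o jsonpath='{.spec.clusterIP}'
# Container runtime on existing nodes
kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.containerRuntimeVersion}'
# Cluster CA certificate (published in every namespace since Kubernetes 1.21)
kubectl get configmap kube-root-ca.crt -o jsonpath='{.data.ca\.crt}'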
Zero Configuration Bootstrap¶
# Minimal configuration - everything else is automatic
apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: minimal-vpc
spec:
  region: us-south
  zone: us-south-1
  vpc: vpc-12345678
  image: r006-ubuntu-20-04
  securityGroups:
    - sg-default
  # No userData needed - bootstrap is fully automatic!
Advanced VPC Configurations¶
Multi-Zone VPC Setup¶
# Zone 1
---
apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: vpc-us-south-1
spec:
  region: us-south
  zone: us-south-1
  vpc: vpc-12345678
  subnet: subnet-zone1-12345
  image: r006-ubuntu-20-04
  securityGroups:
    - sg-k8s-workers
---
# Zone 2
apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: vpc-us-south-2
spec:
  region: us-south
  zone: us-south-2
  vpc: vpc-12345678
  subnet: subnet-zone2-12345
  image: r006-ubuntu-20-04
  securityGroups:
    - sg-k8s-workers
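Each NodePool references exactly one NodeClass, so a common pattern is one NodePool per zone, pinned with the topology.kubernetes.io/zone label. A sketch for zone 1 (the zone 2 NodePool is analogous):
kubectl apply -f - <<EOF
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: vpc-nodepool-us-south-1
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.ibm.sh
        kind: IBMNodeClass
        name: vpc-us-south-1
      requirements:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-south-1"]
EOF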
GPU Workloads¶
apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: vpc-gpu
spec:
  region: us-south
  zone: us-south-1
  vpc: vpc-gpu-12345
  image: r006-ubuntu-20-04
  instanceProfile: gx2-8x64x1v100   # GPU instance type
  securityGroups:
    - sg-gpu-workloads
  userData: |
    #!/bin/bash
    # GPU drivers and configuration
    apt-get update
    apt-get install -y nvidia-driver-470 nvidia-container-toolkit

    # Configure containerd for GPU support
    mkdir -p /etc/containerd
    cat > /etc/containerd/config.toml <<EOF
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
      runtime_type = "io.containerd.runc.v2"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
      BinaryName = "/usr/bin/nvidia-container-runtime"
    EOF
    # Bootstrap script automatically appended
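Once GPU nodes provision, pods consume them through the nvidia.com/gpu extended resource. A minimal smoke test, assuming the NVIDIA device plugin is also deployed (the image tag is illustrative):
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu20.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF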
High-Performance Computing (HPC)¶
apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: vpc-hpc
spec:
  region: us-south
  zone: us-south-1
  vpc: vpc-hpc-12345
  image: r006-ubuntu-20-04
  instanceProfile: cx2-32x64   # High-performance instance
  securityGroups:
    - sg-hpc-cluster
  userData: |
    #!/bin/bash
    # CPU: set the performance governor on every core
    # (a loop avoids bash's "ambiguous redirect" when the glob matches multiple files)
    for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
      echo performance > "$gov"
    done

    # Memory optimizations
    echo 'vm.swappiness = 1' >> /etc/sysctl.conf
    echo 'vm.dirty_ratio = 15' >> /etc/sysctl.conf

    # Network optimizations for high-throughput workloads
    echo 'net.core.rmem_max = 134217728' >> /etc/sysctl.conf
    echo 'net.core.wmem_max = 134217728' >> /etc/sysctl.conf
    sysctl -p

    # Install HPC libraries
    apt-get update && apt-get install -y \
      openmpi-bin openmpi-common libopenmpi-dev \
      libblas3 liblapack3
Custom CNI Configuration¶
apiVersion: karpenter.ibm.sh/v1alpha1
kind: IBMNodeClass
metadata:
  name: vpc-custom-cni
spec:
  region: us-south
  zone: us-south-1
  vpc: vpc-custom-12345
  image: r006-ubuntu-20-04
  userData: |
    #!/bin/bash
    # Custom CNI setup before cluster join
    # Install the Cilium CNI plugin binary (download the archive, then extract it)
    curl -L -o /tmp/cilium-cni.tar.gz \
      https://github.com/cilium/cilium/releases/download/v1.14.0/cilium-linux-amd64.tar.gz
    mkdir -p /opt/cni/bin
    tar -xzf /tmp/cilium-cni.tar.gz -C /opt/cni/bin

    # Custom CNI configuration
    mkdir -p /etc/cni/net.d
    cat > /etc/cni/net.d/05-cilium.conf <<EOF
    {
      "cniVersion": "0.4.0",
      "name": "cilium",
      "type": "cilium-cni",
      "enable-debug": false
    }
    EOF
    # Bootstrap script handles the rest
Dynamic Instance Selection¶
Unlike IKS mode, VPC integration provides full flexibility in instance type selection:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: flexible-nodepool
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.ibm.sh
        kind: IBMNodeClass
        name: vpc-nodeclass
      requirements:
        # Karpenter chooses the best instance type based on pod requirements
        - key: node.kubernetes.io/instance-type
          operator: In
          values: [
            "bx2-2x8", "bx2-4x16", "bx2-8x32",   # Balanced instances
            "cx2-2x4", "cx2-4x8", "cx2-8x16",    # Compute optimized
            "mx2-2x16", "mx2-4x32", "mx2-8x64"   # Memory optimized
          ]
        - key: karpenter.ibm.sh/instance-family
          operator: In
          values: ["bx2", "cx2", "mx2"]
        # Resource-based selection
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
VPC-Specific Troubleshooting¶
Bootstrap Issues¶
Wrong API Endpoint Configuration¶
Symptoms:
- NodeClaims are created but nodes never register with the cluster
- Kubelet logs show: "Client.Timeout exceeded while awaiting headers"
- Node status remains "Unknown" with "Drifted" = True
Solution:
# 1. Find correct internal endpoint
kubectl get endpointslice -n default -l kubernetes.io/service-name=kubernetes
# 2. Update NodeClass with internal endpoint (NOT external)
kubectl patch ibmnodeclass your-nodeclass --type='merge' \
  -p='{"spec":{"apiServerEndpoint":"https://<INTERNAL-IP>:6443"}}'
# 3. Delete old NodeClaims to trigger new ones with correct config
kubectl delete nodeclaims --all
# 4. Monitor node registration
kubectl get nodes -w
Verification:
# Test connectivity from worker instance
ssh ubuntu@<node-ip> "telnet <INTERNAL-IP> 6443"
# Should connect successfully, not timeout
# Check kubelet logs
ssh ubuntu@<node-ip> "sudo journalctl -u kubelet -f"
# Should see successful API server communication
Cluster Discovery Failures¶
# Check if controller can reach cluster API
kubectl logs -n karpenter deployment/karpenter | grep "cluster discovery"
# Verify internal API endpoint is accessible
ssh ubuntu@<node-ip> "curl -k https://INTERNAL_API_ENDPOINT/healthz"
# Check security group rules
ibmcloud is security-group <sg-id> --output json | jq '.rules'
Bootstrap Script Problems¶
# View generated bootstrap script
ssh ubuntu@<node-ip> "sudo cat /var/lib/cloud/instance/scripts/*"
# Check cloud-init logs
ssh ubuntu@<node-ip> "sudo journalctl -u cloud-final"
# Monitor bootstrap execution
ssh ubuntu@<node-ip> "sudo tail -f /var/log/cloud-init-output.log"
Network Connectivity¶
# Test cluster communication
ssh ubuntu@<node-ip> "nc -zv CLUSTER_ENDPOINT 6443"
# Check DNS resolution
ssh ubuntu@<node-ip> "nslookup kubernetes.default.svc.cluster.local"
# Verify route table
ibmcloud is vpc-routes <vpc-id>
Instance Provisioning¶
# Check available instances in zone
ibmcloud is instance-profiles --output json | jq '.[] | select(.family=="bx2")'
# Monitor quota usage
ibmcloud is instances --output json | jq 'length'
# Check subnet capacity
ibmcloud is subnet <subnet-id> --output json | jq '.available_ipv4_address_count'
CNI Initialization Timing Issues¶
Problem: Newly provisioned nodes may be terminated prematurely before CNI (Calico) fully initializes.
Symptoms:
- Nodes are created successfully but pods fail with CNI errors
- Failed to create pod sandbox: plugin type="calico" failed (add): stat /var/lib/calico/nodename: no such file or directory
- Nodes get terminated and recreated in a loop
Root Cause: Karpenter's consolidateAfter setting is too aggressive, terminating nodes before CNI initialization completes.
Solution:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: vpc-nodepool
spec:
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 300s   # Wait 5 minutes for CNI initialization
Verification:
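A sketch of checks that confirm the fix (the calico-node label assumes a standard Calico installation):
# Nodes should stay around and become Ready instead of cycling
kubectl get nodes -w
# Calico pods on the new node should reach Running
kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
# The file from the sandbox error should now exist on the node
ssh ubuntu@<node-ip> "ls -l /var/lib/calico/nodename"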