Troubleshooting Guide¶
This guide helps diagnose and resolve common issues with the Karpenter IBM Cloud Provider.
Quick Diagnostics¶
Check Controller Status¶
# Check if Karpenter controller is running
kubectl get pods -n karpenter
# Check controller logs
kubectl logs -n karpenter deployment/karpenter -f
# Check controller startup messages
kubectl logs -n karpenter deployment/karpenter | grep "Starting Controller"
Common Issues¶
Authentication Issues¶
Failed to authenticate with IBM Cloud API
Failed to create VPC client: authentication failed
Error: {"errorMessage":"Unauthorized","errorCode":"401"}
Solution:
- Verify that the API keys are set correctly
- Check that the Service ID has the required permissions
- Update the Kubernetes secret that stores the IBM Cloud credentials
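Updating the secret might look like the sketch below. The secret name `karpenter-ibm-credentials` and the key names `IBMCLOUD_API_KEY` and `VPC_API_KEY` are assumptions, not taken from this guide; check the controller Deployment for the names your installation actually reads.

```shell
#!/usr/bin/env bash
# Sketch: recreate the credentials secret and restart the controller.
# SECRET_NAME and both key names are assumptions; adjust them to your install.
SECRET_NAME="${SECRET_NAME:-karpenter-ibm-credentials}"
NAMESPACE="${NAMESPACE:-karpenter}"

update_credentials_secret() {
  # Refuse to write empty values into the secret.
  if [ -z "${IBMCLOUD_API_KEY:-}" ] || [ -z "${VPC_API_KEY:-}" ]; then
    echo "IBMCLOUD_API_KEY and VPC_API_KEY must both be set" >&2
    return 1
  fi
  # Piping a client-side dry run into apply works whether or not the secret exists.
  kubectl create secret generic "$SECRET_NAME" \
    --namespace "$NAMESPACE" \
    --from-literal=IBMCLOUD_API_KEY="$IBMCLOUD_API_KEY" \
    --from-literal=VPC_API_KEY="$VPC_API_KEY" \
    --dry-run=client -o yaml | kubectl apply -f - || return 1
  # Restart the controller so its pods pick up the new values.
  kubectl rollout restart deployment/karpenter --namespace "$NAMESPACE"
}
```

Export the two variables in your shell, then call `update_credentials_secret`.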
Instance Provisioning Issues¶
No suitable subnets found
Diagnosis:
# Check available subnets
ibmcloud is subnets --output json
# Check subnet capacity
ibmcloud is subnet SUBNET_ID --output json
Solutions:
- Verify subnet exists in specified zone
- Ensure subnet has available IP addresses
- Consider using auto-subnet selection
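To check free addresses quickly, the count can be pulled out of the subnet JSON. This is a minimal sketch that assumes the output contains the VPC API field `available_ipv4_address_count`; the helper names and the default threshold of 8 are illustrative.

```shell
#!/usr/bin/env bash
# Read one subnet's JSON on stdin and print its free IPv4 address count.
# Assumes the field is named available_ipv4_address_count.
available_ips() {
  grep -o '"available_ipv4_address_count"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | grep -o '[0-9][0-9]*$'
}

# Succeed if the subnet JSON on stdin reports at least MIN free addresses.
subnet_has_capacity() {
  local min="${1:-8}" count
  count="$(available_ips | head -n 1)"
  [ -n "$count" ] && [ "$count" -ge "$min" ]
}
```

For example: `ibmcloud is subnet SUBNET_ID --output json | subnet_has_capacity 8 && echo "capacity ok"`.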
Node Registration Issues¶
Node failed to join cluster
Most Common Issue - Wrong API Server Endpoint:
# Symptoms: kubelet timeouts, nodes never register
# Error: "dial tcp 10.243.65.4:6443: i/o timeout"
# 1. Check what endpoint kubelet is trying to reach
ssh ubuntu@INSTANCE_IP "cat /var/lib/kubelet/bootstrap-kubeconfig | grep server"
# 2. Find correct internal API endpoint
kubectl get endpointslice -n default -l kubernetes.io/service-name=kubernetes
# 3. Update NodeClass with correct INTERNAL endpoint
kubectl patch ibmnodeclass YOUR-NODECLASS --type='merge' \
-p='{"spec":{"apiServerEndpoint":"https://INTERNAL-IP:6443"}}'
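The comparison between the endpoint kubelet uses and the endpoint you expect can be done mechanically. A minimal sketch, assuming the bootstrap kubeconfig contains a plain `server:` line; the helper names `kubeconfig_server` and `endpoint_matches` are illustrative.

```shell
#!/usr/bin/env bash
# Print the URL of the first "server:" entry in a kubeconfig read from stdin.
kubeconfig_server() {
  awk '$1 == "server:" { print $2; exit }'
}

# Succeed if the kubeconfig on stdin points at the expected endpoint;
# otherwise report both values on stderr and fail.
endpoint_matches() {
  local expected="$1" actual
  actual="$(kubeconfig_server)"
  if [ "$actual" != "$expected" ]; then
    echo "kubelet uses '$actual', expected '$expected'" >&2
    return 1
  fi
}
```

For example: `ssh ubuntu@INSTANCE_IP "cat /var/lib/kubelet/bootstrap-kubeconfig" | endpoint_matches "https://INTERNAL-IP:6443"`.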
Other Common Causes:
- VNI (Virtual Network Interface) not configured properly (v0.3.53+ required)
- Bootstrap token expiration
- Network connectivity problems
Debug steps:
# Check bootstrap logs on instance
ssh ubuntu@INSTANCE_IP "sudo journalctl -u cloud-final"
# Check kubelet status and errors
ssh ubuntu@INSTANCE_IP "sudo systemctl status kubelet"
ssh ubuntu@INSTANCE_IP "sudo journalctl -u kubelet --no-pager -n 50"
# Test API server connectivity from node
ssh ubuntu@INSTANCE_IP "curl -k -m 10 https://API-SERVER-IP:6443/healthz"
Security Group Configuration¶
Kubernetes API Server Access
Common Issue: Security groups blocking API server communication (TCP 6443)
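Symptoms: kubelet requests to the API server time out (for example, "dial tcp 10.243.65.4:6443: i/o timeout") and nodes never register.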
Required Security Group Rules:
Worker Node Security Group:
# Allow outbound to API server
ibmcloud is security-group-rule-add WORKER_SG_ID \
outbound tcp --port-min 6443 --port-max 6443 \
--remote CONTROL_PLANE_SUBNET_CIDR
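# Allow inbound from the control plane on 6443
# (IBM Cloud security groups are stateful, so replies to the outbound rule above
# are already allowed; this rule matters only for connections the control plane initiates)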
ibmcloud is security-group-rule-add WORKER_SG_ID \
inbound tcp --port-min 6443 --port-max 6443 \
--remote CONTROL_PLANE_SUBNET_CIDR
Control Plane Security Group:
# Allow inbound from workers
ibmcloud is security-group-rule-add CONTROL_PLANE_SG_ID \
inbound tcp --port-min 6443 --port-max 6443 \
--remote WORKER_SUBNET_CIDR
Debug connectivity:
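A minimal sketch for testing whether TCP 6443 is reachable from a worker node. It relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command; the function name `check_tcp` is illustrative.

```shell
#!/usr/bin/env bash
# Succeed if a TCP connection to HOST:PORT opens within SECS seconds (default 5).
check_tcp() {
  local host="$1" port="$2" secs="${3:-5}"
  # The inner bash receives host and port as $0 and $1 to avoid quoting issues.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

Run it on the node, for example `check_tcp API-SERVER-IP 6443 && echo reachable || echo "blocked or timed out"`. A hang until the timeout usually suggests that packets are being dropped, which is consistent with a missing security group rule, while an immediate failure suggests that the connection was refused.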
Debug Mode¶
Enable debug logging for detailed information:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter
  namespace: karpenter
spec:
  template:
    spec:
      containers:
      - name: controller
        env:
        - name: KARPENTER_LOG_LEVEL
          value: debug
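The same change can be made without editing the manifest. This sketch uses `kubectl set env` with the variable, Deployment, and namespace names from the manifest above; the wrapper function name is illustrative.

```shell
#!/usr/bin/env bash
# Set the controller log level in place and wait for the rollout to finish.
set_karpenter_log_level() {
  local level="${1:-}"
  if [ -z "$level" ]; then
    echo "usage: set_karpenter_log_level LEVEL" >&2
    return 1
  fi
  kubectl set env deployment/karpenter -n karpenter \
    "KARPENTER_LOG_LEVEL=${level}" || return 1
  kubectl rollout status deployment/karpenter -n karpenter --timeout=120s
}
```

Call `set_karpenter_log_level debug` while investigating, then set the variable back to its previous value once you are done, since debug output can be verbose.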