This guide covers common issues and their solutions.
Symptoms: "No matching escalation found" or "Not authorized"

Solutions:

- Verify the escalation exists:

```bash
kubectl get breakglassescalation -A
```

- Verify user group membership and check token claims:

```bash
kubectl auth can-i get pods --as=user@example.com
```

- Verify the cluster name and ensure the session uses the correct cluster name:

```bash
kubectl get clusterconfig
```

- Verify the escalation scope and check that `allowed.clusters` contains your cluster:

```bash
kubectl get breakglassescalation <name> -o yaml | grep -A 5 allowed
```

Symptoms: Request not being approved or rejected after hours
Solutions:

- Verify approvers exist:

```bash
kubectl get breakglassescalation <name> -o yaml | grep -A 5 approvers
```

- Check OIDC group prefix stripping to see whether groups are being mapped correctly:

```bash
# Review the oidcPrefixes setting in config.yaml
cat config.yaml | grep -A 5 "kubernetes:"
```

- Verify the approver can access the API:

```bash
curl -H "Authorization: Bearer $APPROVER_TOKEN" \
  https://breakglass.example.com/api/breakglass/status
```

- Check the notification service and verify email is configured for dev/test:

```bash
# For dev: curl http://breakglass-dev:30084
# For prod: check the configured email service
```

Symptoms: "Session expired" error within the expected duration
Solutions:

- Verify the maxValidFor setting:

```bash
kubectl get breakglasssession <name> -o yaml | grep -E "maxValidFor|expiresAt"
```

- Check system clock synchronization:

```bash
# On both hub and tenant clusters
date
```

Symptoms:
```
Error: failed to call webhook: connection refused
```

Solutions:

- Test network connectivity from the tenant cluster:

```bash
ping breakglass.example.com
telnet breakglass.example.com 443
```

- Check DNS resolution:

```bash
nslookup breakglass.example.com
```

- Verify firewall rules and ensure egress from the tenant cluster to the hub is allowed.

- Test the webhook endpoint:

```bash
curl -k https://breakglass.example.com/api/breakglass/webhook/authorize/my-cluster
```

Symptoms:
```
Error: Unauthorized (401)
```

Solutions:

- Verify the webhook token in the kubeconfig is valid:

```bash
echo $TOKEN | cut -d'.' -f2 | base64 -d | jq .
```

- Check token expiration:

```bash
echo $TOKEN | cut -d'.' -f2 | base64 -d | jq '.exp' | xargs -I{} date -d @{}
```

- Validate the kubeconfig format:

```bash
kubectl --kubeconfig=/etc/kubernetes/breakglass-webhook-kubeconfig.yaml cluster-info
```

Symptoms:
```
Error: context deadline exceeded
```

Solutions:

- Increase the timeout in the authorization config:

```yaml
webhook:
  timeout: 5s
  unauthorizedTTL: 30s
```

- Check network latency:

```bash
ping -c 5 breakglass.example.com
```

- Check for recursive webhook calls (multi-cluster OIDC setups only).

If you're using OIDC authentication for spoke clusters, the breakglass manager's OIDC identity may be triggering recursive webhook calls. See Preventing Recursive Webhook Calls for the full explanation.

Quick fix: add the OIDC identity to the webhook's matchConditions exclusion:

```yaml
matchConditions:
  # ... existing conditions ...
  - expression: "request.user != 'breakglass-group-sync@service.local'"
```

Then grant RBAC permissions to the OIDC identity on the spoke clusters. See RBAC Requirements for OIDC Authentication.
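For context, both the timeout settings and the matchConditions exclusion can live in the same place when the cluster uses the kube-apiserver's structured authorization configuration. A hedged sketch of how the pieces fit together — the authorizer name, kubeconfig path, and TTL values are assumptions; check your distribution's documentation for the exact file it expects:

```yaml
# Sketch of a structured authorization config for the kube-apiserver.
# Field names follow the upstream AuthorizationConfiguration schema;
# concrete values here are illustrative.
apiVersion: apiserver.config.k8s.io/v1
kind: AuthorizationConfiguration
authorizers:
  - type: Webhook
    name: breakglass
    webhook:
      timeout: 5s
      authorizedTTL: 30s
      unauthorizedTTL: 30s
      subjectAccessReviewVersion: v1
      matchConditionSubjectAccessReviewVersion: v1
      failurePolicy: NoOpinion
      connectionInfo:
        type: KubeConfigFile
        kubeConfigFile: /etc/kubernetes/breakglass-webhook-kubeconfig.yaml
      matchConditions:
        # Exclude the breakglass manager's own identity to avoid recursion
        - expression: "request.user != 'breakglass-group-sync@service.local'"
```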
Symptoms: Requests denied despite active session

Solutions:

- Verify the session is approved:

```bash
kubectl get breakglasssession <name> -o yaml | grep -E "conditions|approved"
```

- Check DenyPolicy restrictions:

```bash
kubectl get denypolicy
kubectl get denypolicy <name> -o yaml
```

- Verify the cluster name matches the webhook URL path.

- Test the webhook directly:

```bash
curl -X POST https://breakglass.example.com/api/breakglass/webhook/authorize/prod-cluster-1 \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "apiVersion": "authorization.k8s.io/v1",
    "kind": "SubjectAccessReview",
    "spec": {
      "user": "test@example.com",
      "resourceAttributes": {
        "verb": "get",
        "resource": "pods"
      }
    }
  }'
```

Symptoms: ClusterConfig phase shows "Failed"
Solutions:

- Verify the secret exists:

```bash
kubectl get secret <secret-name> -n <namespace>
```

- Verify the kubeconfig is valid (replace `value` with the key specified in `kubeconfigSecretRef.key`):

```bash
kubectl get secret <secret-name> -n <namespace> \
  -o jsonpath='{.data.value}' | base64 -d > /tmp/test.kubeconfig
kubectl --kubeconfig=/tmp/test.kubeconfig cluster-info
```

- Test cluster access permissions:

```bash
kubectl --kubeconfig=/tmp/test.kubeconfig auth can-i '*' '*'
```

Symptoms: Internal server error on API calls
Solutions:

- Review controller logs:

```bash
kubectl logs -n breakglass-system deployment/breakglass-controller -f
```

- Verify OIDC connectivity:

```bash
curl https://keycloak.example.com/realms/master/.well-known/openid-configuration
```

Symptoms: "Unauthorized" or "Invalid token"
Solutions:

- Verify the token comes from the correct OIDC provider:

```bash
echo $TOKEN | cut -d'.' -f2 | base64 -d | jq .
```

- Check that the token issuer matches the configured URL.

- Verify the token groups and check OIDC prefix stripping:

```bash
echo $TOKEN | cut -d'.' -f2 | base64 -d | jq '.groups'
```

Symptoms: API returns empty escalations
Solutions:

- Verify user groups match the escalation's allowed groups:

```bash
echo $TOKEN | cut -d'.' -f2 | base64 -d | jq '.groups'
kubectl get breakglassescalation -o yaml | grep -A 5 "allowed:"
```

- Check that group prefix stripping is working.
Symptoms: "Failed to fetch OIDC configuration"

Solutions:

- Verify the OIDC authority URL is accessible:

```bash
curl https://keycloak.example.com/realms/master/.well-known/openid-configuration
```

- Check TLS certificates:

```bash
openssl s_client -connect keycloak.example.com:443
```

- Test from the breakglass pod:

```bash
kubectl exec -it -n breakglass-system deployment/breakglass-controller -- \
  curl https://keycloak.example.com/realms/master/.well-known/openid-configuration
```

Symptoms: Users have no groups in token
Solutions:
-
Verify OIDC client has group mapper configured
-
Check Keycloak group mapper settings
-
Decode token to verify groups are included
echo $TOKEN | cut -d'.' -f2 | base64 -d | jq '.'This section covers issues specific to ClusterConfig resources using authType: OIDC for authenticating to managed clusters.
Symptoms: ClusterConfig status shows Ready=False with reason OIDCDiscoveryFailed

Solutions:

- Verify the issuer URL is correct and accessible:

```bash
# Check the ClusterConfig's OIDC issuer URL
kubectl get clusterconfig <name> -o yaml | grep issuerURL

# Test the OIDC discovery endpoint
curl -s https://<issuer>/.well-known/openid-configuration | jq .
```

- If using a private CA, ensure `certificateAuthority` is set in the OIDC config:

```yaml
spec:
  oidcAuth:
    issuerURL: https://keycloak.internal.example.com/realms/kubernetes
    certificateAuthority: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
```

- Check controller logs for the detailed error:

```bash
kubectl logs -n breakglass-system deployment/breakglass-controller | grep -i oidc
```

Symptoms: ClusterConfig status shows Ready=False with reason OIDCTokenFetchFailed
Solutions:

- Verify client credentials are correct:

```bash
# Check the client secret exists
kubectl get secret <client-secret-name> -n <namespace>

# Test the client credentials flow manually
curl -X POST https://<issuer>/protocol/openid-connect/token \
  -d "grant_type=client_credentials" \
  -d "client_id=<client-id>" \
  -d "client_secret=<client-secret>"
```

- Verify the OIDC client has:
  - Service accounts enabled
  - Client credentials grant enabled
  - Correct permissions/roles

- Check that the client is confidential (has a secret).
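The token endpoint above returns a JWT access token whose claims you can inspect offline. One pitfall: JWT segments are base64url-encoded without padding, so a plain `base64 -d` can fail on some payload lengths. A minimal sketch that translates the alphabet and re-pads first — the helper name and sample token are illustrative:

```shell
# decode_jwt_payload: print the JSON payload of a JWT.
# JWT segments use the base64url alphabet without padding, so map
# '_' -> '/' and '-' -> '+' and pad to a multiple of 4 before decoding.
decode_jwt_payload() {
  payload=$(printf '%s' "$1" | cut -d'.' -f2 | tr '_-' '/+')
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
  printf '%s' "$payload" | base64 -d
}

# Illustrative unsigned token: header {"alg":"none"}, payload with an iss claim
TOKEN='eyJhbGciOiJub25lIn0.eyJpc3MiOiJodHRwczovL2V4YW1wbGUuY29tIn0.sig'
decode_jwt_payload "$TOKEN"   # → {"iss":"https://example.com"}
```

The output can then be piped into `jq` (as the other commands in this guide do) to pull out individual claims such as `.iss`, `.exp`, or `.groups`.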
Symptoms: ClusterConfig initially works but later shows OIDCRefreshFailed

Solutions:

- Verify refresh tokens are enabled in the OIDC provider.
- Check that token lifetimes are reasonable (not too short).
- The controller automatically falls back to full re-authentication when refresh fails.
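To judge whether token lifetimes are the problem, compare the token's `exp` and `iat` claims against the current time. A small sketch with illustrative values — in practice the payload comes from decoding the token as shown elsewhere in this guide:

```shell
# Illustrative decoded JWT payload; in practice produce it with:
#   echo $TOKEN | cut -d'.' -f2 | base64 -d
PAYLOAD='{"exp":1300,"iat":1000}'
NOW=1200                       # in practice: NOW=$(date +%s)

EXP=$(printf '%s' "$PAYLOAD" | jq '.exp')
IAT=$(printf '%s' "$PAYLOAD" | jq '.iat')

# Total configured lifetime and how much of it remains right now
echo "lifetime: $(( EXP - IAT ))s, remaining: $(( EXP - NOW ))s"
```

A very short lifetime (or a remaining value that hits zero between refresh attempts) points at the provider's token settings rather than at the controller.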
Symptoms: ClusterConfig status shows Ready=False mentioning OIDC client secret missing

Solutions:

- Verify the secret exists in the correct namespace:

```bash
kubectl get secret <name> -n <namespace>
```

- Verify the secret has the correct key:

```bash
kubectl get secret <name> -n <namespace> -o jsonpath='{.data}' | jq
```

- The common key name is `client-secret` (the default); otherwise check your `clientSecretRef.key` setting.
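The jsonpath output above can be compared against `clientSecretRef.key` directly. A sketch using a hand-made sample of `kubectl get secret -o json` output (the key name and stored value here are illustrative):

```shell
# Illustrative, trimmed output of: kubectl get secret <name> -o json
SECRET_JSON='{"data":{"client-secret":"c3VwZXItc2VjcmV0"}}'

# List the keys stored in the secret; one of them must match
# the clientSecretRef.key setting
printf '%s' "$SECRET_JSON" | jq -r '.data | keys[]'   # → client-secret

# Decode a specific key to sanity-check the stored value
printf '%s' "$SECRET_JSON" | jq -r '.data["client-secret"]' | base64 -d
```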
Symptoms: ClusterConfig referencing an IdentityProvider fails with "not found" error

Solutions:

- Verify the IdentityProvider exists:

```bash
kubectl get identityprovider <name>
```

- Verify the name matches exactly (it is case-sensitive):

```bash
# Check the reference in the ClusterConfig
kubectl get clusterconfig <name> -o yaml | grep -A 3 oidcFromIdentityProvider
```

Symptoms: ClusterConfig fails because the referenced IdentityProvider is disabled
Solutions:

- Enable the IdentityProvider:

```bash
kubectl patch identityprovider <name> --type=merge -p '{"spec":{"disabled":false}}'
```

- Or use a different IdentityProvider that is enabled.
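Hand-quoting JSON patch bodies inline is easy to get wrong. As an alternative sketch, the patch can be generated with jq and passed through a variable, which keeps the quoting mechanical:

```shell
# Build the merge patch with jq instead of hand-written quoting
# (-n: no input, -c: compact single-line output)
PATCH=$(jq -cn '{spec: {disabled: false}}')
echo "$PATCH"   # → {"spec":{"disabled":false}}

# Then apply it (requires cluster access, shown for context only):
# kubectl patch identityprovider <name> --type=merge -p "$PATCH"
```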
Symptoms: "x509: certificate signed by unknown authority" or similar TLS errors

Solutions:

- Provide the cluster CA certificate:

```yaml
spec:
  oidcAuth:
    server: https://api.my-cluster.example.com:6443
    caSecretRef:
      name: cluster-ca-secret
      namespace: breakglass-system
      key: ca.crt
```

- Or enable TOFU (Trust On First Use): the controller will automatically trust the first CA it sees.

- As a last resort (NOT for production), use `insecureSkipTLSVerify: true`.
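The `caSecretRef` above points at an ordinary Secret holding the PEM-encoded CA. A hedged sketch of what that Secret could look like — the names match the example above, and the certificate body is elided:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cluster-ca-secret
  namespace: breakglass-system
type: Opaque
stringData:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
```

Equivalently, it can be created from a CA file with `kubectl create secret generic cluster-ca-secret -n breakglass-system --from-file=ca.crt=<path-to-ca.pem>`.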
Steps to debug:

- Check ClusterConfig status and conditions:

```bash
kubectl get clusterconfig <name> -o yaml | grep -A 20 status
```

- Look at controller logs:

```bash
kubectl logs -n breakglass-system deployment/breakglass-controller --since=5m | grep -i "oidc\|token"
```

- Test token acquisition manually using the configured credentials.

- Verify the target cluster accepts OIDC tokens:

```bash
# Pass the OIDC token to kubectl directly to test
kubectl --server=https://api.cluster.example.com:6443 \
  --token=<oidc-token> \
  auth can-i get pods
```

Symptoms: Pod in CrashLoopBackOff
Solutions:

- Review pod logs:

```bash
kubectl logs -n breakglass-system deployment/breakglass-controller
```

- Check resource availability:

```bash
kubectl top nodes
kubectl describe nodes
```

- Verify the ConfigMap and Secret exist:

```bash
kubectl get configmap,secret -n breakglass-system
```

Symptoms: Breakglass consuming excessive resources
Solutions:

- Review metrics:

```bash
kubectl top pod -n breakglass-system
```

- Check for stuck reconciliation loops:

```bash
kubectl logs -n breakglass-system deployment/breakglass-controller | grep "error"
```

Symptoms: Approval endpoint takes > 5 seconds
Solutions:

- Monitor cluster connectivity latency.

- Check webhook authorization latency:

```bash
kubectl logs -n breakglass-system deployment/breakglass-controller | \
  grep "authorization duration"
```

- Optimize the ClusterConfig QPS/Burst settings:

```yaml
spec:
  qps: 200
  burst: 400
```

Symptoms: "Token issued by unknown issuer, please authenticate using one of the configured identity providers"
Causes:

- The IdentityProvider's `issuer` field doesn't match the token's `iss` claim
- The token comes from an unconfigured provider
- The issuer URL has a trailing-slash mismatch

Solutions:

- Extract the actual issuer from the token:

```bash
TOKEN=<your-token-here>
echo $TOKEN | cut -d'.' -f2 | base64 -d | jq .iss
```

- Compare with the configured IdentityProviders:

```bash
kubectl get identityprovider -o yaml | grep -E "name:|issuer:"
```

- Update the issuer if it doesn't match:

```bash
kubectl patch identityprovider <name> --type=merge -p '{"spec":{"issuer":"https://correct-issuer.example.com"}}'
```

Symptoms: Only showing the direct login page, not the IDP selector screen
Causes:

- Fewer than 2 IdentityProviders are configured
- Only 1 provider has `disabled: false`

Solutions:

- Check how many providers are enabled:

```bash
kubectl get identityprovider -o yaml | grep -E "name:|disabled:"
```

- Create or enable a second provider if needed:

```bash
kubectl patch identityprovider <name> --type=merge -p '{"spec":{"disabled":false}}'
```

Symptoms: "Access denied" even though the user has the escalation
Causes:

- The escalation is restricted to different IDPs
- The user's token comes from a different IDP than expected

Solutions:

- Check the escalation's allowed IDPs:

```bash
kubectl get breakglassescalation <name> -o yaml | grep -A 5 allowedIdentityProviders
```

- Verify the user's token IDP:

```bash
TOKEN=<your-token-here>
echo $TOKEN | cut -d'.' -f2 | base64 -d | jq .iss
```

- Update the escalation to allow the user's IDP:

```bash
kubectl patch breakglassescalation <name> --type=merge -p '{"spec":{"allowedIdentityProvidersForRequests":["corp-oidc","keycloak-idp"]}}'
```

Symptoms: GroupSyncStatus shows "PartialFailure" or "Failed"
Causes:

- IDP connection timeout
- Invalid Keycloak credentials
- Network issues

Solutions:

- Check sync status and errors:

```bash
kubectl get breakglassescalation <name> -o yaml | grep -A 10 "groupSync"
```

- Check IdentityProvider events:

```bash
kubectl describe identityprovider <name> | grep -A 5 "Events:"
```

- Verify the IDP is reachable:

```bash
kubectl run -it debug --image=curlimages/curl --restart=Never -- \
  curl https://keycloak.example.com/health
```

- Check credentials and permissions:

```bash
kubectl get secret <secret-name> -o yaml
# Verify it has the right clientID and clientSecret
```

View all resources:
```bash
kubectl get breakglassescalation,breakglasssession,clusterconfig,denypolicy -A
```

Check recent events:

```bash
kubectl describe deployment -n breakglass-system breakglass-controller
```

Stream logs with filtering:

```bash
kubectl logs -n breakglass-system deployment/breakglass-controller -f | grep -i error
```

Test an OIDC token:

```bash
TOKEN=$(kubectl create token breakglass-webhook-sa -n breakglass-system)
echo $TOKEN | cut -d'.' -f2 | base64 -d | jq .
```

Check API health:

```bash
curl -k https://breakglass.example.com/api/breakglass/health
```

If issues persist:
- Collect debug information:

```bash
kubectl get all -n breakglass-system -o yaml > debug.yaml
kubectl logs -n breakglass-system deployment/breakglass-controller > logs.txt
```

- Search GitHub issues for similar problems.
- Review the controller logs carefully; most issues are evident there.
- Test connectivity directly with curl.
- Verify all configuration files match the requirements.
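Once logs.txt has been collected, a quick frequency count often surfaces the dominant failure before reading line by line. A small sketch on sample log lines — the log format shown here is illustrative, not the controller's actual format:

```shell
# Sample error lines standing in for the collected logs.txt
cat > /tmp/logs.txt <<'EOF'
E0101 oidc discovery failed
E0101 oidc discovery failed
E0101 secret not found
EOF

# Count occurrences of each error line, most frequent first
grep '^E' /tmp/logs.txt | sort | uniq -c | sort -rn
```

The top entry is usually the place to start; the same pipeline works on the live log stream as well.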
- Webhook Setup: webhook configuration
- Cluster Config: cluster connection details
- API Reference: API endpoints