What this usually means
The Helm chart template rendered correctly (syntax), but the resulting Kubernetes objects conflict with existing resources, fail schema validation, or are blocked by admission webhooks. In Helm 3, the release metadata is stored as Secrets in the namespace; if those Secrets are missing or corrupted, Helm can't track state. Another common class: Helm hooks (pre/post-install, pre/post-upgrade) that fail cause the entire release to be marked as failed even if the main workload deployed successfully.
The first ten minutes — establish facts before touching code.
1. Run `helm history <release> -n <ns> --max 10` to see the exact revision that failed and any previous state.
2. Run `helm get manifest <release> -n <ns> --revision <N> > manifest.yaml` to inspect the manifests that Helm attempted to apply.
3. Run `kubectl get events -n <ns> --sort-by='.lastTimestamp' | tail -20` to catch admission webhook rejections or resource conflicts.
4. Run `helm template <release> <chart> -n <ns> | diff - <(helm get manifest <release> -n <ns>)` to compare the intended manifests with the ones stored for the release. Pass the same values files and `--set` flags you deploy with, otherwise the diff is mostly noise.
5. Check the Helm secret: `kubectl get secret -n <ns> -l 'owner=helm,name=<release>' -o yaml` to ensure the release Secret exists and is not malformed.
6. If hooks are involved, run `kubectl get job -n <ns> -l 'app.kubernetes.io/managed-by=Helm'` and check the logs of the hook pods. The label is a chart convention that Helm does not add itself, so drop the selector if nothing is returned.
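The checks above can be bundled so the evidence is saved before anyone changes the cluster. This is a minimal sketch: the function name `collect_release_facts` and the output file names are illustrative and not part of Helm, and it assumes `helm` and `kubectl` are on the PATH and already point at the right cluster.

```shell
# Hypothetical helper: snapshot the state of a failed release into a directory.
# Usage: collect_release_facts <release> <namespace> [output-dir]
collect_release_facts() {
  local release="$1" ns="$2" out="${3:-helm-facts}"
  mkdir -p "$out"
  # Each command may fail (a missing release is itself a finding), so errors
  # are captured in the output file instead of aborting the run.
  helm history "$release" -n "$ns" --max 10 > "$out/history.txt" 2>&1 || true
  helm get manifest "$release" -n "$ns" > "$out/manifest.yaml" 2>&1 || true
  kubectl get events -n "$ns" --sort-by='.lastTimestamp' > "$out/events.txt" 2>&1 || true
  kubectl get secret -n "$ns" -l "owner=helm,name=$release" > "$out/release-secrets.txt" 2>&1 || true
  echo "facts written to $out"
}
```

Calling `collect_release_facts payment-service payments` leaves four files in `helm-facts/` that can be attached to the incident ticket.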
The specific files, logs, configs, and dashboards that usually own this bug.
- `helm history <release> -n <ns>`: release state and revision history
- `kubectl get secrets -n <ns> -l 'owner=helm,name=<release>'`: Helm release storage secrets
- `kubectl get events -n <ns> --sort-by='.lastTimestamp'`: Kubernetes events including admission webhook denials
- `helm get notes <release> -n <ns>`: chart notes that may hint at post-deployment steps
- Kubernetes API server audit logs (if available): for RBAC failures on resource creation
- Chart's `templates/` directory and `values.yaml`: especially for any hardcoded namespaces or forbidden annotations
Practical causes, not theory. These are the things you will actually find.
- Resource already exists: a Kubernetes object (e.g., Namespace, ClusterRole) with the same name already exists but is not managed by Helm.
- Admission webhook rejection: a ValidatingWebhookConfiguration or MutatingWebhookConfiguration blocks the resource (e.g., Istio, OPA Gatekeeper).
- Hook failure: a pre-install or pre-upgrade hook job times out or fails, causing Helm to abort the release.
- Missing dependencies: chart dependencies (`dependencies` in Chart.yaml) are not updated or not present in the local cache.
- RBAC insufficiency: the identity Helm runs as lacks permissions to create/update resources (e.g., CRDs, cluster-scoped resources). Helm 3 has no Tiller, so this is your kubeconfig user or the CI/CD tool's service account.
- Corrupted Helm secret: the release Secret was manually deleted or tampered with, leaving Helm in a broken state.
- Resource quota exceeded: the namespace has resource quotas that prevent the deployment from being created.
Concrete fix directions. Pick the one that matches your root cause.
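- For resource conflicts: adopt the existing resource by adding the Helm ownership label and annotations (`app.kubernetes.io/managed-by`, `meta.helm.sh/release-name`, `meta.helm.sh/release-namespace`), which Helm 3.2+ honors on the next install or upgrade. On Helm 3.17+ you can instead run `helm upgrade --install --take-ownership`.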
- For admission webhooks: review the webhook configuration and either adjust the webhook rules to exclude Helm-managed resources or add required annotations to the chart templates.
- For hook failures: inspect hook job logs with `kubectl logs job/<hook-job-name> -n <ns>` and fix the underlying issue (e.g., missing configmap, wrong command).
- For missing dependencies: run `helm dependency update <chart-dir>` and re-run the release.
- For corrupted Secrets: save a copy of the problematic Helm secret, delete it (`kubectl delete secret -n <ns> sh.helm.release.v1.<release>.v<N>`), and roll back to a previous revision with `helm rollback <release> <revision> -n <ns>`.
- For RBAC issues: grant the necessary ClusterRole or Role to the service account used by Helm, or use a higher-privileged account (temporarily).
A fix you cannot prove is a guess. Close the loop.
- Run `helm upgrade --dry-run --debug <release> <chart> -n <ns>` and confirm it outputs a valid manifest without errors.
- Run `helm test <release> -n <ns>` if the chart includes test hooks to verify the deployment is functional.
- Check that `helm status <release> -n <ns>` shows `STATUS: deployed` and that the revision number has incremented.
- Manually verify key resources: `kubectl get deployment,service,configmap -n <ns> -l 'app.kubernetes.io/managed-by=Helm'` and ensure they match the chart. The selector only works if the chart sets that label.
- Run a curl or port-forward to the service to confirm the application responds as expected.
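The status check is easy to script as a gate in a pipeline. The sketch below assumes the human-readable output of `helm status`, which contains a `STATUS: <state>` line; the function name `release_is_deployed` is illustrative.

```shell
# Hypothetical helper: succeed only if the release reports STATUS: deployed.
# Usage: release_is_deployed <release> <namespace>
release_is_deployed() {
  local release="$1" ns="$2" status
  # Pull the state out of the "STATUS: <state>" line; empty if helm fails.
  status=$(helm status "$release" -n "$ns" 2>/dev/null | awk '/^STATUS:/ {print $2}')
  [ "$status" = "deployed" ]
}
```

For example, `release_is_deployed payment-service payments || exit 1` stops a pipeline when the release is in `failed` or `pending-upgrade`.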
Things that make this bug worse or harder to find.
- Deleting the Helm release Secret directly without understanding that it destroys the release state and can lead to orphaned resources.
- Running `helm rollback` on a failed release without first fixing the underlying cause: the rollback will likely fail with the same error.
- Ignoring admission webhook logs: they often contain the exact rejection reason but require `kubectl logs` on the webhook pod.
- Using the `--force` flag blindly: in Helm 3 it replaces resources instead of patching them, which can fail on immutable fields and can cause downtime or data loss.
- Not running `helm dependency update` after modifying Chart.yaml dependencies.
- Assuming `helm template` output is exactly what gets applied: it doesn't account for mutations by admission webhooks.
The Ghost Release That Wouldn't Upgrade
Timeline
- 09:15 ArgoCD sync triggers helm upgrade for 'payment-service' chart v3.2.1
- 09:16 Helm returns UPGRADE FAILED: 'failed to create resource: admission webhook "validate.istio.io" denied the request'
- 09:17 I run `helm history payment-service -n payments`; shows revision 42 (failed) and 41 (deployed)
- 09:19 I run `helm get manifest payment-service -n payments --revision 41 > old.yaml` and compare with `helm template .`
- 09:22 Diff shows new VirtualService has a host that conflicts with an existing Istio rule
- 09:25 I check Istio webhook logs: `kubectl logs -n istio-system deployment/istiod | grep denied`
- 09:28 Webhook log: 'VirtualService payment-service: host conflict with existing VirtualService legacy-payment'
- 09:30 I update the chart values to use a different host, re-run helm upgrade, success at revision 43
The pager went off at 9:15 AM because ArgoCD reported a sync failure for the payment-service. I opened the ArgoCD UI and saw the Helm upgrade had failed with an admission webhook error from Istio. This was a new sprint release that added a canary route via a VirtualService.
I've been burned before by assuming the error message tells the whole story. So I grabbed the existing release manifest with `helm get manifest` and did a diff against the dry-run output. Sure enough, the new VirtualService had a host that overlapped with an old VirtualService that was no longer managed by Helm but still existed in the cluster.
The Istio webhook was doing its job: rejecting the duplicate host. I updated the chart values to use a unique host prefix, committed the change, and the ArgoCD sync succeeded. The lesson: admission webhooks are your friends, but you need to check their logs — and never assume `helm template` output is what actually lands in the cluster.
Root cause
Admission webhook (Istio) rejected a VirtualService due to host conflict with an existing resource not managed by Helm.
The fix
Changed the host in the chart values to a unique value, verified with `helm template --dry-run`, then ran `helm upgrade`.
The lesson
Always diff current manifests against dry-run output when an admission webhook is in play. The webhook logs are the ultimate source of truth.
In Helm 3, release metadata is stored by default in Secrets in the same namespace as the release. Each revision gets a Secret named `sh.helm.release.v1.<release>.v<revision>`. The Secret contains the full release object (manifest, values, hooks, etc.) as gzip-compressed JSON, base64-encoded under the `release` key.
If any of these Secrets are missing or corrupted, Helm will fail with errors like `release "<name>" not found` or `failed to decode release`. Kubernetes base64-encodes Secret data on top of Helm's own encoding, so two decode passes are needed: `kubectl get secret -n <ns> sh.helm.release.v1.<release>.v<N> -o jsonpath='{.data.release}' | base64 -d | base64 -d | gunzip` prints the raw JSON. This is especially useful when `helm get manifest` returns nothing.
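The decoding chain can be wrapped in a small filter so it is not retyped under pressure. This is a sketch assuming GNU `base64` and `gunzip`; the function name `decode_helm_release` is illustrative. It performs two base64 passes because the Secret value is encoded once by Kubernetes and once by Helm.

```shell
# Decode a Helm 3 release payload read from stdin.
# Input: the raw value of .data.release from the release Secret.
decode_helm_release() {
  # 1st base64 -d: undo the Kubernetes Secret encoding.
  # 2nd base64 -d: undo Helm's own encoding.
  # gunzip:        Helm gzip-compresses the JSON release object.
  base64 -d | base64 -d | gunzip
}

# Usage (placeholders as in the text above):
# kubectl get secret -n <ns> sh.helm.release.v1.<release>.v<N> \
#   -o jsonpath='{.data.release}' | decode_helm_release
```

If the pipeline fails at the `gunzip` stage, the payload itself is damaged, which is a strong hint that the Secret was edited by hand.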
Helm hooks (pre/post-install, pre/post-upgrade, etc.) are ordinary Kubernetes resources, most often Jobs. If a hook fails or times out, Helm marks the entire release as failed. The error message might not name the hook clearly; sometimes it is only `UPGRADE FAILED` with a timeout.
To debug, first list hook jobs: `kubectl get jobs -n <ns> -l 'app.kubernetes.io/managed-by=Helm'`. Helm does not add that label itself, so if the chart does not set it on its hooks, list all Jobs with `kubectl get jobs -n <ns>` instead. Then check the logs of the pod that ran: `kubectl logs job/<job-name> -n <ns>`. Common hook failures include missing ConfigMaps, incorrect image tags, or network timeouts. You can also inspect the job's conditions: `kubectl get job <job-name> -n <ns> -o jsonpath='{.status.conditions}'`. If the hook carries a `helm.sh/hook-delete-policy` annotation, the Job may already have been deleted by the time you look.
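The list, conditions, and logs steps can be looped over every matching Job. A minimal sketch: `dump_hook_jobs` is a hypothetical name, and the default selector is the chart convention mentioned above, which your chart may or may not follow.

```shell
# Hypothetical helper: print conditions and recent logs for every Job in a
# namespace that matches a label selector.
# Usage: dump_hook_jobs <namespace> [label-selector]
dump_hook_jobs() {
  local ns="$1" selector="${2:-app.kubernetes.io/managed-by=Helm}" job
  # `-o name` prints one "job.batch/<name>" per line.
  for job in $(kubectl get jobs -n "$ns" -l "$selector" -o name); do
    echo "== $job"
    kubectl get "$job" -n "$ns" -o jsonpath='{.status.conditions}'
    echo
    # Logs can be missing if the hook pod was already cleaned up.
    kubectl logs "$job" -n "$ns" --tail=50 || true
  done
}
```

Pass a chart-specific selector as the second argument when the default matches nothing, for example `dump_hook_jobs payments 'app=db-migrate'` (the label value here is made up).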
Helm 3 refuses to take over an object it does not already own. If a resource with the same name already exists (e.g., created manually or by another tool), the install or upgrade fails, usually with `rendered manifests contain a resource that already exists` and a note about `invalid ownership metadata`, or with a generic `AlreadyExists`.
The fix is either to delete the existing resource (if safe) or to let Helm adopt it by adding the ownership metadata: the label `app.kubernetes.io/managed-by: Helm` and the annotations `meta.helm.sh/release-name: <release>` and `meta.helm.sh/release-namespace: <ns>`. Helm 3.2 and later adopt resources that carry this metadata on the next install or upgrade. Helm 3.17 and later also accept `--take-ownership` on `helm install` and `helm upgrade`, which skips the ownership check. On older versions, patch the resource by hand or with a script.
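Adding the ownership metadata takes two `kubectl` calls, sketched below. The function name `adopt_resource` is illustrative, and the sketch assumes a namespaced object that lives in the release namespace; for cluster-scoped kinds `kubectl` ignores the `-n` flag.

```shell
# Hypothetical helper: mark an existing object as owned by a Helm release so
# the next install/upgrade can adopt it instead of failing.
# Usage: adopt_resource <kind> <name> <release> <release-namespace>
adopt_resource() {
  local kind="$1" name="$2" release="$3" release_ns="$4"
  # Label that tells Helm the object is Helm-managed.
  kubectl label "$kind" "$name" -n "$release_ns" --overwrite \
    app.kubernetes.io/managed-by=Helm
  # Annotations that tie the object to one specific release.
  kubectl annotate "$kind" "$name" -n "$release_ns" --overwrite \
    meta.helm.sh/release-name="$release" \
    meta.helm.sh/release-namespace="$release_ns"
}
```

After `adopt_resource configmap app-config payment-service payments`, re-run the upgrade. From then on Helm manages the object, including deleting it on `helm uninstall`, so adopt only what the chart is really meant to own.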
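When an admission webhook blocks a Helm resource, the error message includes the webhook name and usually a short reason (e.g., `admission webhook "validate.istio.io" denied the request: ...`). When the reason is missing or truncated, the webhook's own logs have the detail.
For Istio: `kubectl logs -n istio-system deployment/istiod | grep -i denied`. For OPA Gatekeeper: `kubectl logs -n gatekeeper-system deployment/gatekeeper-controller-manager`, which is the component that serves admission requests (`gatekeeper-audit` only reports violations on objects that already exist). For custom webhooks, find the pod with `kubectl get pods -n <ns> -l 'app=<webhook-name>'`. The logs will usually name the constraint that was violated.
If you can't access the logs, a last resort is to take the webhook out of the request path temporarily. Save the current configuration with `kubectl get validatingwebhookconfiguration <name> -o yaml > webhook-backup.yaml`, then use `kubectl edit validatingwebhookconfiguration <name>` to narrow the `rules` or `namespaceSelector` of the `<webhook-name>` entry so it no longer matches the release. This weakens policy enforcement for everything the webhook used to cover, and some controllers overwrite manual edits, so revert immediately after the upgrade.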
Sometimes a `helm upgrade` is interrupted (the client crashes, a CI job is killed, or a timeout fires) after the Kubernetes API has accepted some or all of the resources but before Helm records the final status. The release stays in `pending-upgrade`, and later operations fail with `another operation (install/upgrade/rollback) is in progress`. The release is effectively stuck.
To recover, first check what Helm last recorded: `helm history <release> -n <ns>` and `helm get manifest <release> -n <ns>`. Then roll back to the last revision marked `deployed`: `helm rollback <release> <revision> -n <ns>`. If the rollback is refused, save a copy of the Secret for the pending revision (`sh.helm.release.v1.<release>.v<N>`) and delete it; the previous revision becomes the latest again and `helm upgrade` can proceed. Avoid editing the Secret's `status` label by hand: the status is also stored inside the encoded release payload, so the two would disagree.
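A lower-risk way out of a stuck release is to roll back to the last revision Helm itself recorded as deployed. Helm labels each release Secret with `name`, `owner`, `status`, and `version`, so that revision can be looked up without decoding anything. The sketch below relies on those labels; the function name `last_deployed_revision` is illustrative.

```shell
# Hypothetical helper: print the newest revision whose release Secret is
# labelled status=deployed (empty output if there is none).
# Usage: last_deployed_revision <release> <namespace>
last_deployed_revision() {
  local release="$1" ns="$2"
  kubectl get secret -n "$ns" \
    -l "owner=helm,name=$release,status=deployed" \
    -o jsonpath='{.items[*].metadata.labels.version}' \
    | tr ' ' '\n' | sort -n | tail -n 1
}

# Usage:
# rev=$(last_deployed_revision <release> <ns>)
# [ -n "$rev" ] && helm rollback <release> "$rev" -n <ns>
```

An empty result means no revision was ever deployed, which is the first-install case and is better handled with `helm uninstall` followed by a fresh install.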
Frequently asked questions
What does 'UPGRADE FAILED: failed to replace resource' mean?
This error usually means the Kubernetes API rejected an update that changes an immutable field (e.g., `spec.selector` on a Deployment, or `clusterIP` on a Service). It typically appears with `--force`, where Helm replaces the object instead of patching it. Solution: delete the resource manually and let Helm recreate it, or change the chart so it does not modify immutable fields.
How do I fix 'pending-install' or 'pending-upgrade' state?
These states indicate the previous operation didn't complete. First, check whether the resources were actually created (`kubectl get all -n <ns>`). If they exist, roll back: `helm rollback <release> <previous-revision> -n <ns>`. If that fails, delete the Helm secret for the pending revision: `kubectl delete secret sh.helm.release.v1.<release>.v<failed-revision> -n <ns>`. Then run `helm upgrade` again. A first install stuck in `pending-install` has no earlier revision to roll back to, so run `helm uninstall <release> -n <ns>` and install again.
Why does `helm template` succeed but `helm install` fail?
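`helm template` renders the templates locally and, by default, never contacts the cluster. It won't catch admission webhook rejections, namespace quota issues, or resource conflicts. `helm install --dry-run --debug` goes further: it connects to the cluster and validates the rendered manifests against the cluster's API schema, but it still does not submit the objects, so admission webhooks are not exercised. To run them without persisting anything, pipe the rendered output through a server-side dry run: `helm template <release> <chart> -n <ns> | kubectl apply --dry-run=server -f -`.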
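To get admission webhooks to evaluate the manifests without creating anything, the rendered output can be submitted as a server-side dry run with `kubectl`. This is a sketch, not a replacement for a real upgrade: `server_dry_run` is a hypothetical name, webhooks that do not declare dry-run support can make the request fail, and the rendered output includes hook manifests.

```shell
# Hypothetical helper: render a chart and submit it as a server-side dry run.
# Nothing is persisted, but the API server runs schema validation and the
# admission chain for each object.
# Usage: server_dry_run <release> <chart> <namespace> [helm template args...]
server_dry_run() {
  local release="$1" chart="$2" ns="$3"
  shift 3   # remaining arguments (e.g. -f values.yaml) go to helm template
  helm template "$release" "$chart" -n "$ns" "$@" \
    | kubectl apply --dry-run=server -n "$ns" -f -
}
```

For example, `server_dry_run payment-service ./chart payments -f values-prod.yaml` (file name made up) prints one `(server dry run)` line per object or the webhook's denial message.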
How do I debug RBAC errors during Helm release?
RBAC errors typically show as `forbidden: User "system:serviceaccount:..." cannot create resource ...`. Helm 3 has no Tiller; it acts with the credentials of whoever runs it, which in practice is your kubeconfig user or the CI/CD tool's service account. Grant the missing permissions to that identity via a Role or ClusterRole. Use `kubectl auth can-i create deployments -n <ns> --as system:serviceaccount:<ns>:<sa>` to test. Helm also needs to read and write Secrets in the release namespace, because that is where it stores release state.
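Checking one verb at a time is slow, so the `kubectl auth can-i` test can be looped over the verbs and resource types a release needs. A minimal sketch: `check_helm_rbac` is a hypothetical name, and the resource list is only an example that should be replaced with the kinds your chart creates.

```shell
# Hypothetical helper: report which verbs an identity may use on a set of
# resource types in one namespace.
# Usage: check_helm_rbac <namespace> <subject>
check_helm_rbac() {
  local ns="$1" subject="$2" verb resource answer
  # Secrets are included because Helm stores release state in them.
  for resource in deployments services configmaps secrets; do
    for verb in get create update patch delete; do
      # `kubectl auth can-i` prints "yes" or "no" and exits non-zero on "no".
      answer=$(kubectl auth can-i "$verb" "$resource" -n "$ns" --as "$subject" 2>/dev/null) || true
      echo "$verb $resource: ${answer:-unknown}"
    done
  done
}
```

Running `check_helm_rbac payments system:serviceaccount:payments:ci` (the account name is made up) and filtering for `: no` gives the list of permissions to add to the Role.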
What is the best way to compare current vs intended state?
Use `helm template <release> <chart> -n <ns> > intended.yaml`, with the same values you deploy with, to get the rendered templates. Then `helm get manifest <release> -n <ns> > current.yaml`. Diff them with `diff -u current.yaml intended.yaml`. This shows what an upgrade would change relative to the last manifest Helm recorded; it does not reveal manual edits made directly in the cluster, because both sides come from Helm. For advanced diffing, use the `helm-diff` plugin (`helm diff upgrade <release> <chart> -n <ns>`), which handles hooks and metadata.
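The two-file comparison can be wrapped so the temporary files are cleaned up and the exit status is usable in scripts. This is a sketch with a hypothetical function name, `diff_release`. The rendered side includes hook manifests, which `helm get manifest` omits, so expect hooks to show up as additions.

```shell
# Hypothetical helper: unified diff of the manifest stored for a release
# (current) against what the chart renders now (intended).
# Exit status follows diff: 0 = identical, 1 = differences.
# Usage: diff_release <release> <chart> <namespace> [helm template args...]
diff_release() {
  local release="$1" chart="$2" ns="$3" tmp status=0
  shift 3   # remaining arguments (e.g. -f values.yaml) go to helm template
  tmp=$(mktemp -d)
  helm get manifest "$release" -n "$ns" > "$tmp/current.yaml"
  helm template "$release" "$chart" -n "$ns" "$@" > "$tmp/intended.yaml"
  # Capture diff's status without tripping `set -e` in the caller.
  diff -u "$tmp/current.yaml" "$tmp/intended.yaml" || status=$?
  rm -rf "$tmp"
  return "$status"
}
```

Lines prefixed with `-` exist only in the stored release and lines prefixed with `+` only in the new render.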