ArgoCD Sync Failed Debugging Guide

What this usually means

ArgoCD sync failures usually stem from a mismatch between the desired state in Git and the live state in the Kubernetes cluster. The surface cause, however, can be anything from a misconfigured repo server, a broken Helm chart, or a missing CRD to a cluster API rate limit or a stale cache. The key is to isolate whether the failure occurs during manifest generation (e.g., a Helm template error) or during the apply phase (e.g., a resource conflict or validation error). Non-obvious causes include ArgoCD's own RBAC, webhook interference, and race conditions with other controllers.

( 01 ) Fast diagnosis

The first ten minutes — establish facts before touching code.

1. Run `argocd app get <app-name> --hard-refresh` to force a fresh comparison and see the detailed error.
2. Check the sync status events: `kubectl get events -n argocd --field-selector involvedObject.kind=Application,involvedObject.name=<app-name>`
3. Inspect the ArgoCD application controller logs: `kubectl logs -n argocd statefulset/argocd-application-controller --tail=100 | grep <app-name>` (the controller runs as a StatefulSet in default installs since ArgoCD 1.8; use `deployment/argocd-application-controller` if your install still uses a Deployment).
4. Verify that ArgoCD can reach the Git repo: `argocd repo list` shows the connection status of every registered repository.
5. Check that the target cluster is reachable: `kubectl get secret -n argocd <cluster-secret> -o jsonpath='{.data.server}' | base64 -d` prints the API server URL; curl it from a network location comparable to the controller's.
6. If using Helm, inspect the manifests ArgoCD renders: `argocd app get <app-name> --hard-refresh && argocd app manifests <app-name>` and look for errors.
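The first three steps can be bundled so the evidence lands in files before anything changes. The following is a minimal sketch, not an official tool: the function name `collect_sync_evidence` and the output directory are hypothetical, it assumes ArgoCD runs in the `argocd` namespace, and it assumes the application controller is a StatefulSet (swap in `deployment/...` if yours is not).

```shell
# Capture the app status, its events, and matching controller log lines.
# collect_sync_evidence is a hypothetical helper name used for illustration.
collect_sync_evidence() {
  app=$1
  out=${2:-./sync-debug}   # assumed default output directory
  mkdir -p "$out" || return 1

  # Step 1: fresh comparison with the detailed error.
  argocd app get "$app" --hard-refresh > "$out/app.txt" 2>&1

  # Step 2: Kubernetes events recorded against the Application object.
  kubectl get events -n argocd \
    --field-selector "involvedObject.kind=Application,involvedObject.name=$app" \
    > "$out/events.txt" 2>&1

  # Step 3: controller log lines that mention the application.
  kubectl logs -n argocd statefulset/argocd-application-controller --tail=500 2>&1 |
    grep -F -- "$app" > "$out/controller.log"

  printf '%s\n' "$out"
}
```

Calling `collect_sync_evidence <app-name>` prints the directory it wrote to; an empty `controller.log` simply means none of the last 500 log lines mentioned the app.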

( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

- ArgoCD application controller logs: `kubectl logs -n argocd statefulset/argocd-application-controller` (a StatefulSet in default installs since ArgoCD 1.8)
- Repo server logs: `kubectl logs -n argocd deployment/argocd-repo-server`
- Application CR status: `kubectl describe application <app-name> -n argocd`
- Git repository access test: `kubectl exec -n argocd deployment/argocd-repo-server -- git ls-remote <repo-url>` (this tests network reachability; private repos also need the credentials ArgoCD stores separately)
- Cluster API server logs (if using managed Kubernetes): cloud provider logs or kube-apiserver audit logs
- ArgoCD configmap `argocd-cm` for repository and resource customization settings
- Resource manifest generation output: `argocd app manifests <app-name>`

( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

- Git repository inaccessible: expired SSH keys, wrong credentials, or network restrictions blocking the repo server.
- Helm chart issues: missing dependencies, invalid values, or chart version not found in repository.
- Resource conflicts: a resource already exists and is not managed by ArgoCD (e.g., created manually or by another controller).
- Stale cache: ArgoCD caches manifests and cluster state; a hard refresh may resolve transient failures.
- Cluster API rate limiting: excessive sync requests causing the API server to throttle or drop requests.
- RBAC misconfiguration: ArgoCD's service account lacks permissions to create/update resources in the target namespace.
- CRD missing: custom resources defined in manifests require the corresponding CRD installed in the cluster.

( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

- Force a hard refresh, then retry the sync: `argocd app get <app-name> --hard-refresh && argocd app sync <app-name>`. Add `--prune` only after reviewing what would be deleted.
- Update Git repository credentials: edit the repository secret in the `argocd` namespace and trigger a sync.
- Fix Helm dependencies: run `helm dependency update` locally and commit the updated `Chart.lock`.
- Delete conflicting resources (`kubectl delete <resource> <name>`) or adopt them into ArgoCD by adding the `app.kubernetes.io/instance` label set to the application name (this applies to the default label-based tracking).
- Reduce pressure on the API server: raise `timeout.reconciliation` in `argocd-cm` so applications reconcile less often, or lower the controller's `--status-processors` and `--operation-processors` settings.
- Grant necessary RBAC to ArgoCD: add a ClusterRole with required verbs and bind it to ArgoCD's service account.
- Install missing CRDs manually or via a separate ArgoCD Application that is configured to sync first.

( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

- Run `argocd app sync <app-name> --prune` and confirm the sync status becomes 'Synced' and health is 'Healthy'.
- Check the application events for any new errors: `kubectl get events -n argocd --field-selector involvedObject.kind=Application,involvedObject.name=<app-name>`.
- Use `kubectl get application <app-name> -n argocd -o yaml` and inspect `status.conditions` for warnings.
- Perform a diff: `argocd app diff <app-name> --hard-refresh` to ensure no unexpected deviations.
- Monitor the application controller logs for 'sync succeeded' message for the application.
- Test a second sync immediately after the first to confirm stability.

( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

- Don't blindly delete resources without checking if they are managed by ArgoCD or other controllers.
- Avoid using `--force` sync unless you understand the consequences: it deletes and recreates resources that cannot be updated in place, which can cause downtime.
- Don't ignore stale cache; always try a hard refresh before deep-diving into logs.
- Don't assume a Helm chart error is a chart bug; verify that the repo server has network access to the Helm repository.
- Avoid making simultaneous changes to Git and the cluster state while debugging; it confuses the comparison.
- Don't set `syncPolicy.automated.prune: true` without careful testing; it can lead to data loss.

( 07 ) War story

The Phantom OutOfSync: A Tale of Stale Cache and Missing CRDs

Senior Platform Engineer | ArgoCD 2.4, Helm 3.8, AWS EKS, GitHub Enterprise

Timeline

09:15 Alert: 'Application payment-service sync failed' in #argocd-alerts Slack channel.
09:18 Check ArgoCD UI: payment-service shows 'OutOfSync' with error 'Failed to sync: one or more objects failed to apply'.
09:22 Run `argocd app get payment-service --hard-refresh`; error persists. Check repo server logs: no Git errors.
09:30 Run `argocd app manifests payment-service` and see Helm template output includes a custom resource `RedisCluster`.
09:35 Check if `RedisCluster` CRD exists: `kubectl get crd redisclusters.redis.redis.com` returns not found.
09:40 Install missing CRD from operator Helm chart and wait for CRD to be established.
09:45 Run `argocd app get payment-service --hard-refresh`, then `argocd app sync payment-service`: sync succeeds.
09:50 Post-mortem: root cause was an upstream dependency update that added a new CRD without documenting the prerequisite.

When the alert hit our Slack, I assumed it was a network issue — GitHub Enterprise had been flaky the week before. I did the usual: hard refresh, check repo server logs, all clean. The error message was generic: 'one or more objects failed to apply'. I wasted 10 minutes thinking it was a conflict with a manual change.

Then I ran `argocd app manifests payment-service` and spotted `RedisCluster` in the output. The manifest looked fine, but a quick `kubectl get crd` confirmed the CRD didn't exist. Our Helm chart had recently added a Redis operator dependency, and the CRD was supposed to be installed via a separate chart — but that chart wasn't synced yet.

I installed the CRD manually and triggered a hard refresh sync. It worked. The lesson: always inspect the full manifest output when sync fails, especially after dependency updates. Stale cache wasn't the culprit here — it was a missing prerequisite. We now have a policy to sync CRD applications before dependent applications.

Root cause

Missing CustomResourceDefinition (CRD) for `RedisCluster` resource introduced by a new Helm dependency.

The fix

Installed the missing CRD by deploying the Redis operator Helm chart, then performed a hard refresh sync.

The lesson

When sync fails with a generic apply error, inspect the generated manifests for custom resources and verify all CRDs are installed. Automate CRD installation order with sync waves or an app-of-apps layout in which the CRD application syncs first.
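The check from this story can be sketched as a small filter: read the rendered manifests, collect every top-level `apiVersion`, and report the ones the cluster does not serve according to `kubectl api-versions`. The function name `unserved_api_versions` is hypothetical, and the sketch assumes each manifest has `apiVersion:` at the start of a line, as `argocd app manifests` output normally does.

```shell
# Print each apiVersion found on stdin that the cluster does not serve.
# unserved_api_versions is a hypothetical helper name used for illustration.
unserved_api_versions() {
  # Group/version pairs the API server currently serves, one per line.
  served=$(kubectl api-versions) || return 1

  # Take the value of every top-level apiVersion line, drop quotes, dedupe.
  sed -n 's/^apiVersion:[[:space:]]*\([^[:space:]#]*\).*/\1/p' |
    tr -d "\"'" |
    sort -u |
    while IFS= read -r api_version; do
      # Exact, whole-line match against the served list.
      printf '%s\n' "$served" | grep -Fxq -- "$api_version" ||
        printf '%s\n' "$api_version"
    done
}
```

Usage would look like `argocd app manifests payment-service | unserved_api_versions`. Any line it prints points at an API group whose CRD (or aggregated API) is missing from the target cluster; no output means every referenced API is served.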

( 08 ) Understanding the Sync Pipeline: Where Failures Occur

ArgoCD's sync process has three stages: manifest generation, comparison, and apply. Failures can occur at any stage. Manifest generation happens in the repo server: it runs `helm template`, `kustomize build`, or plain YAML reading. If the repo server can't fetch the repo, the chart, or the dependencies, generation fails. The error is usually logged in the repo server pod.

Comparison happens in the application controller: it compares the generated manifests against the live cluster state using cached data. If the cache is stale or the cluster API returns errors, you'll see 'comparison failed' or 'cache error'. The apply stage performs the equivalent of `kubectl apply`; errors here include validation failures, conflicts, or RBAC denials. Knowing which stage failed tells you which logs to read first: the repo server for generation, the application controller for comparison and apply.
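One rough way to place a failure in a stage is to read the Application's `status.conditions` and map each condition type to a stage. The sketch below treats `SyncError` as an apply failure and splits `ComparisonError` by message text. The function names and the message patterns are assumptions made for illustration; real messages vary by version, so treat the result as a hint and not a verdict.

```shell
# Map one condition (type + message) to a pipeline stage.
# The message patterns are heuristics assumed for this sketch.
classify_stage() {
  cond_type=$1
  message=$2
  case "$cond_type" in
    SyncError)
      echo "apply" ;;
    ComparisonError)
      case "$message" in
        *"helm template"*|*"kustomize build"*|*"rpc error"*)
          echo "generation" ;;
        *)
          echo "comparison" ;;
      esac ;;
    *)
      echo "unknown" ;;
  esac
}

# Print "<stage> <type> <message>" (tab-separated) for each condition.
# Assumes condition messages are single-line.
app_stages() {
  kubectl get application "$1" -n argocd \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.message}{"\n"}{end}' |
    while IFS="$(printf '\t')" read -r cond_type message; do
      [ -n "$cond_type" ] || continue
      printf '%s\t%s\t%s\n' \
        "$(classify_stage "$cond_type" "$message")" "$cond_type" "$message"
    done
}
```

Running `app_stages <app-name>` gives one line per condition; a `generation` result suggests starting with the repo server logs, while `apply` points at the controller logs and the per-resource sync results.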

( 09 ) Stale Cache: The Silent Saboteur

ArgoCD caches both the generated manifests and the cluster state to reduce load. The live cluster state is cached in the application controller's memory, while generated manifests are cached in ArgoCD's Redis instance, not in etcd. When you see a sync fail, the first thing to do is force a hard refresh (`argocd app get <app> --hard-refresh`). This clears the manifest cache and re-fetches from Git. I've seen cases where a simple cache invalidation fixed a week-old 'OutOfSync' that everyone assumed was a code problem.

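How often ArgoCD polls Git and re-reconciles is configurable via the `argocd-cm` key `timeout.reconciliation`, which defaults to 180s (3 minutes). This is a polling interval, not the lifetime of cached manifests, which is governed separately by the repo server's cache expiration. If you have frequent Git pushes, set the interval lower. But beware: too low an interval increases load on the repo server and cluster API. A hard refresh bypasses the manifest cache entirely; use it as a diagnostic tool, not a permanent solution.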
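Before tuning the value it helps to see what is currently in effect. A minimal sketch, assuming ArgoCD lives in the `argocd` namespace; the function name `reconciliation_timeout` is hypothetical, and the fallback text reflects the 3-minute default mentioned above.

```shell
# Print the configured timeout.reconciliation, or note that the default applies.
# reconciliation_timeout is a hypothetical helper name used for illustration.
reconciliation_timeout() {
  # The dot inside the key name must be escaped in kubectl's jsonpath.
  value=$(kubectl get configmap argocd-cm -n argocd \
    -o jsonpath='{.data.timeout\.reconciliation}')
  printf '%s\n' "${value:-180s (default, key not set)}"
}
```

If the key is absent from `argocd-cm`, the function reports the default instead of an empty line, which avoids mistaking "not set" for "set to nothing".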

( 10 ) Helm Chart Troubles: Beyond Missing Dependencies

Helm is the most common cause of manifest generation failures. Beyond missing dependencies, I've seen issues with: Helm repository authentication (the repo server needs the repo credentials in the `argocd` namespace secret), invalid values overrides (especially YAML indentation errors), and chart versions that don't exist. Always test locally: `helm template mychart --values values.yaml` and compare with the ArgoCD output.

Another non-obvious gotcha: ArgoCD renders charts with the Helm binary bundled in the repo server image, not the one on your laptop. If your chart requires a newer Helm feature, the repo server's Helm binary might be too old. Check it with `kubectl exec -n argocd deployment/argocd-repo-server -- helm version`, then upgrade ArgoCD or build a custom repo server image if needed. Also, remember that Helm repository indexes are cached; if you suspect a stale index, trigger a hard refresh on the application so the repo server fetches it again.
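To compare the two Helm binaries side by side, something like the following can help. It is a sketch under assumptions: the repo server is the `argocd-repo-server` Deployment in the `argocd` namespace, `helm` is on the PATH in both places, and the function name `helm_versions` is hypothetical.

```shell
# Print the Helm version inside the repo server next to the local one.
# helm_versions is a hypothetical helper name used for illustration.
helm_versions() {
  printf 'repo-server: %s\n' \
    "$(kubectl exec -n argocd deployment/argocd-repo-server -- helm version --short 2>/dev/null)"
  printf 'local:       %s\n' \
    "$(helm version --short 2>/dev/null)"
}
```

A chart that renders locally but fails in ArgoCD while the two versions differ is a strong hint that the difference lies in the Helm binary and not in the chart's values.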

( 11 ) RBAC: The Invisible Gatekeeper

ArgoCD deploys resources using its own service account: `argocd-application-controller` for the cluster it runs in, and the `argocd-manager` account that `argocd cluster add` creates on external clusters. If you're using a managed cluster (EKS, GKE, AKS) with IAM integration, the identity ArgoCD authenticates as needs proper role bindings. But even in-cluster, a namespace-scoped or hardened install might not cover all resource types. For example, if your manifest includes a `VolumeSnapshot` or a custom resource, ArgoCD may lack permissions to create it.

To debug RBAC, check the application controller logs for 'forbidden' or 'unauthorized'. Then identify the missing verb: `kubectl auth can-i create <resource> --as=system:serviceaccount:argocd:argocd-application-controller -n <target-ns>`. If it returns 'no', create a ClusterRole and binding. Also note that ArgoCD's `argocd-server` (the UI) has its own RBAC for user access — that's different and rarely affects sync.
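Checking one verb at a time gets tedious, so the `can-i` probe can be wrapped in a loop. This is a minimal sketch: the function name `denied_verbs` and the verb list are choices made for illustration, the service account is the in-cluster one named above, and your own user needs permission to impersonate it for `--as` to work.

```shell
# Service account the in-cluster application controller runs as.
ARGOCD_SA="system:serviceaccount:argocd:argocd-application-controller"

# Print every verb the controller is NOT allowed for a resource in a namespace.
# denied_verbs is a hypothetical helper name; the verb list is an assumption
# about what a sync typically needs.
denied_verbs() {
  resource=$1
  namespace=$2
  for verb in get list watch create update patch delete; do
    # `kubectl auth can-i` prints "yes" or "no".
    answer=$(kubectl auth can-i "$verb" "$resource" \
      --as="$ARGOCD_SA" -n "$namespace" 2>/dev/null)
    [ "$answer" = "yes" ] || printf '%s\n' "$verb"
  done
}
```

For example, `denied_verbs volumesnapshots <target-ns>` prints nothing when the controller has full access and otherwise lists exactly the verbs a new Role or ClusterRole has to grant.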

( 12 ) Resource Conflicts and Pruning Pitfalls

When ArgoCD tries to apply a resource that already exists in the cluster but is not managed by ArgoCD, the sync can fail with 'already exists' or 'conflict'. You have three options: adopt the resource by adding the `app.kubernetes.io/instance` label with the application name, delete the existing resource (if safe), or exclude it from ArgoCD's management using `resource.exclusions` in `argocd-cm`. Note that `ignoreDifferences` only hides field-level drift; it does not stop ArgoCD from managing the resource.

Pruning (deleting resources that are in the cluster but not in Git) is a common source of accidental data loss. Always review what will be pruned: `argocd app sync <app> --dry-run --prune`. If you have auto-sync with prune enabled, make sure you understand the consequences. I recommend leaving `syncPolicy.automated.prune` set to `false` initially and manually pruning after review.
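The Application's status also records which live resources ArgoCD considers prune candidates, in the `requiresPruning` field of `status.resources`. A minimal sketch for listing them before any sync runs; the function name `prune_candidates` is hypothetical and the `argocd` namespace is assumed.

```shell
# List resources that ArgoCD has flagged as requiring pruning.
# prune_candidates is a hypothetical helper name used for illustration.
prune_candidates() {
  kubectl get application "$1" -n argocd \
    -o jsonpath='{range .status.resources[*]}{.requiresPruning}{"\t"}{.kind}{"\t"}{.namespace}{"\t"}{.name}{"\n"}{end}' |
    # Keep only rows flagged true; print kind/name and the namespace.
    awk -F'\t' '$1 == "true" { print $2 "/" $4 "\t" $3 }'
}
```

An empty result from `prune_candidates <app>` means a sync with `--prune` should not delete anything; otherwise each line names a resource to review first.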

Frequently asked questions

Why does my ArgoCD app show 'OutOfSync' even after a successful sync?

This usually means there is a controller or operator in the cluster that mutates resources after ArgoCD applies them. For example, a service mesh injector adds sidecar containers, or a mutating webhook modifies labels. To fix, use `ignoreDifferences` in the application spec to ignore fields that are mutated, or use `syncOptions: RespectIgnoreDifferences=true`. Also check that the live state matches the desired state — sometimes the comparison is confused by order of fields.

How do I debug a sync that hangs indefinitely?

A hanging sync often indicates a network issue or a resource that never becomes healthy. First, check the application controller logs for the application: `kubectl logs -n argocd statefulset/argocd-application-controller | grep <app-name>`. Look for 'sync operation timed out' or 'context deadline exceeded'. Then, check if the target cluster is healthy: `argocd cluster list` and ensure the cluster is reachable. If using a private repo, verify the repo server can reach it. To bound how long the CLI waits, pass `--timeout <seconds>` to `argocd app sync`; to abort an operation that is stuck, run `argocd app terminate-op <app-name>`. Note that the sync option `Validate=false` only disables manifest validation and has no effect on timeouts.
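Whether an operation is still in flight can be read straight from the Application's `status.operationState`. The sketch below prints the phase, start time, and message, and offers a yes/no check for a running operation. The function names are hypothetical, and the `argocd` namespace is assumed.

```shell
# Print "<phase> <startedAt> <message>" (tab-separated) for the last operation.
# operation_summary and is_sync_running are hypothetical helper names.
operation_summary() {
  kubectl get application "$1" -n argocd \
    -o jsonpath='{.status.operationState.phase}{"\t"}{.status.operationState.startedAt}{"\t"}{.status.operationState.message}{"\n"}'
}

# Succeeds (exit 0) only while the operation phase is Running.
is_sync_running() {
  phase=$(operation_summary "$1" | cut -f1)
  [ "$phase" = "Running" ]
}
```

If `is_sync_running <app-name>` keeps succeeding long after `startedAt`, the message column usually names the resource the operation is waiting on, which is the place to look next.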

What does 'Failed to sync: one or more objects failed to apply' mean exactly?

This generic message means the apply phase failed for at least one resource. The detailed error is usually in the application events: `kubectl describe application <app-name> -n argocd` and look under `status.conditions` or `status.operationState.syncResult`. Common specifics include 'already exists', 'forbidden', 'invalid', or a validation error. Always inspect the events and the manifest output to find the exact resource and error.
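To pull out just the resources that failed, the per-resource results under `status.operationState.syncResult.resources` can be filtered on their `status` field. A minimal sketch, assuming the `argocd` namespace and single-line messages; the function name `failed_resources` is hypothetical.

```shell
# Print "<kind>/<name>: <message>" for every resource whose sync failed.
# failed_resources is a hypothetical helper name used for illustration.
failed_resources() {
  kubectl get application "$1" -n argocd \
    -o jsonpath='{range .status.operationState.syncResult.resources[*]}{.status}{"\t"}{.kind}{"/"}{.name}{"\t"}{.message}{"\n"}{end}' |
    # Each row is: status, kind/name, message. Keep the failed ones.
    awk -F'\t' '$1 == "SyncFailed" { print $2 ": " $3 }'
}
```

Running `failed_resources <app-name>` after a failed sync turns the generic top-level error into the specific resource and the API server's own message for it.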

How can I prevent sync failures due to missing CRDs?

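Use ArgoCD's sync waves together with an app-of-apps layout. Define a separate Application for the CRD operator, set `syncPolicy.automated.prune: false` on it, and give it a lower `argocd.argoproj.io/sync-wave` annotation than the Applications that depend on it, so the parent application syncs it first (for the parent to wait on it, ArgoCD must be able to assess the health of Application resources, which may need to be configured in recent versions). When the CRD cannot be part of the same sync, the sync option `SkipDryRunOnMissingResource=true` keeps the dry run from failing on custom resources whose CRD is not installed yet. Alternatively, install CRDs via a Helm chart, but be aware that Helm 3 ignores the old `crd-install` hook and installs CRDs from the chart's `crds/` directory instead.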

Why does my sync fail only on certain nodes or namespaces?

This suggests a node-level or namespace-level restriction. Check if the target namespace has a ResourceQuota or LimitRange that prevents resource creation. Also check for Pod Security admission settings (or PodSecurityPolicies on clusters older than Kubernetes 1.25) and OPA/Gatekeeper constraints that block certain resource types. Use `kubectl describe quota -n <namespace>` and `kubectl get constraints` to check. Additionally, verify that ArgoCD's service account has role bindings in that specific namespace.
