GitHub Actions Workflow Failing Debug

What this usually means

GitHub Actions failures rarely come from a single cause. The most common pattern is a mismatch between local environment and the runner environment: missing dependencies, incorrect Node.js version, or environment variables not set. But the real trap is the error message itself—GitHub often masks the root cause with a generic 'Process completed with exit code 1'. You need to look past that to the last few lines of the step output. Another frequent issue is workflow syntax errors that only surface when GitHub parses the file, especially with nested expressions or multi-line strings. And if a job hangs in 'queued', it's almost always a runner capacity issue or a self-hosted runner that's gone offline.

( 01 ) Fast diagnosis

The first ten minutes: establish facts before touching code.

1. Open the failed workflow run, then expand the failed job and step to see the log output. Read the last 20 lines first.
2. Check the 'Annotations' section on the run summary page. GitHub often adds a warning or error annotation that links to the file and line that triggered it.
3. Re-run the workflow with debug logging: tick 'Enable debug logging' in the re-run dialog, or set ACTIONS_RUNNER_DEBUG=true and ACTIONS_STEP_DEBUG=true as repository secrets or variables.
4. Check the workflow YAML for syntax errors. The workflow editor on GitHub flags problems inline, and you can run `action-validator` locally.
5. Verify that all required secrets and variables are set in the repository settings. A missing secret expands to an empty string without raising an error, so the failure shows up later and looks unrelated.
6. Check the GitHub status page (https://www.githubstatus.com/) for ongoing incidents affecting Actions.

( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

- Workflow run log page: https://github.com/<owner>/<repo>/actions/runs/<run_id>
- Repository Settings > Secrets and variables > Actions (for missing secrets)
- Self-hosted runner logs: the `_diag/` directory inside the runner's installation directory
- Local YAML validation: run `action-validator` or `yamllint` on the .github/workflows/*.yml files
- GitHub API: `GET /repos/{owner}/{repo}/actions/runs/{run_id}/jobs` for the status of each job and step
- Workflow file itself: .github/workflows/<workflow-name>.yml
- GitHub status page: https://www.githubstatus.com/

( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

- Missing or incorrect secrets: secret not set, or workflow not authorized to access it
- Workflow YAML syntax error: invisible characters, tabs vs spaces, or invalid expression syntax
- Runner environment mismatch: different OS, missing package, or wrong Node.js version
- Caching issues: stale cache causing build failures (e.g., node_modules with conflicting dependencies)
- GitHub Actions service degradation: rate limiting, runner unavailability, or network issues
- Third-party action version: action uses deprecated API or has breaking changes (e.g., actions/checkout@v2 vs v3)
- Permissions: workflow lacks permissions to push, create releases, or access required resources

( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

- If missing secret: set it in the repository settings. If the job calls a reusable workflow, pass the secret from the caller with `secrets: inherit` or an explicit secret mapping.
- If YAML error: re-indent the file with spaces (2-space indentation is the convention), never tabs, and quote strings that contain special characters.
- If runner mismatch: pin an exact runner image (for example ubuntu-22.04 or windows-2022) rather than a `-latest` label, and use `setup-node` with an explicit version.
- If cache stale: use a cache key that includes `hashFiles('**/package-lock.json')`, and delete the stale cache manually from the GitHub UI.
- If permission denied: grant the specific scope the job needs under `permissions` at the job level, such as `contents: write` to push or create releases, or `id-token: write` for OpenID Connect. `permissions: read-all` only grants read access to every scope, so it does not fix a denied write.
- If action broken: pin the action to a full-length commit SHA (`actions/checkout@<full-commit-sha>`) instead of a version tag. Abbreviated SHAs are not accepted.
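
For the missing-secret case with reusable workflows, the two options might look like this in the caller. This is a minimal sketch, not a drop-in file: the workflow file names `build.yml` and `deploy.yml` and the secret name `DEPLOY_TOKEN` are hypothetical, and both called workflows are assumed to declare an `on: workflow_call` trigger.

```yaml
# Hypothetical caller workflow, e.g. .github/workflows/ci.yml
name: CI
on: [push]

jobs:
  build:
    # Calls a reusable workflow in the same repository (file name is an assumption)
    uses: ./.github/workflows/build.yml
    # Passes every secret the caller can see to the called workflow
    secrets: inherit

  deploy:
    needs: build
    uses: ./.github/workflows/deploy.yml
    # Explicit mapping: only the named secret is passed (secret name is an assumption)
    secrets:
      DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}
```

With the explicit mapping, the called workflow also has to declare the secret under `on.workflow_call.secrets`, otherwise the call is rejected.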

( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

- After the fix, re-run the workflow and confirm every step passes.
- Add a temporary `run: env` step to check that environment variables are set. Secrets appear there only if the step maps them through `env:`, and GitHub masks their values as `***` in the log.
- Test the workflow on a pull request from a fork to confirm permissions work for external contributors.
- Use `act` (nektos/act) to run the workflow locally and compare its behavior with the GitHub runner.
- Check the workflow run for warning annotations that might indicate residual issues.
- Monitor the workflow for a week to confirm no recurrence across multiple runs.

( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

- Don't blindly re-run the workflow without reading the logs. A re-run that happens to pass hides the real error.
- Don't reference actions by a moving ref such as `@main` or `@master`; it changes unexpectedly and can break your workflow.
- Don't ignore warnings about deprecated actions or Node.js versions. They become errors eventually.
- Don't store secrets in workflow YAML files. Use GitHub Secrets or OpenID Connect.
- Don't assume local workflow tests with `act` perfectly match GitHub runners. There are differences in environment and permissions.
- Don't forget to check the GitHub status page when workflows are stuck in the queue. It's often an outage.

( 07 ) War story

The mysterious 'Process completed with exit code 1' on PR builds

Senior DevOps Engineer | Node.js 16, GitHub Actions, npm, self-hosted runner on Ubuntu 20.04

Timeline

09:00 Team reports that PR builds are failing with 'Process completed with exit code 1' on the 'Test' step.
09:05 I check the log. The last line is 'Error: no test specified', but we have Jest configured.
09:10 I re-run the workflow with debug logging enabled.
09:15 Debug log shows npm install succeeded but the test script is missing from package.json.
09:20 I compare package.json from the main branch vs the PR branch. The PR branch has an older version without scripts.
09:25 Check git history: developer force-pushed an outdated branch that overwrote the latest package.json.
09:30 Ask developer to rebase on main and re-push.
09:35 Workflow passes.

Monday morning, the team's Slack started blowing up. Every pull request opened in the last hour was failing on the 'Test' step with that generic 'exit code 1'. The log showed 'Error: no test specified', which made no sense: we had Jest, we had a test script. I've been burned by this before: the exit code line is GitHub's way of saying 'something failed', not the actual reason.

I clicked the failed job and expanded the step. The last line said 'Error: no test specified', but the npm install step above it looked fine. I re-ran with ACTIONS_RUNNER_DEBUG=true. The debug log revealed that npm install ran, but then it output the help text for 'npm test' because the test script was missing. How could it be missing? I checked the PR branch's package.json. It was an old version that didn't have the scripts section. The developer had force-pushed an outdated branch, overwriting the latest changes.

The fix was simple: rebase on main and push again. But the lesson was nasty: GitHub Actions doesn't show you the full context. The error was misleading because nothing in it pointed at a stale package.json. We have since added a step that validates package.json before running tests, and we enabled branch protection to block force-pushes to shared branches. We also pinned our runner image to ubuntu-22.04 to avoid environment surprises.

Root cause

Developer force-pushed an outdated branch that replaced the package.json with a version missing the test script, causing npm test to fail.

The fix

Rebase the PR branch on main to restore the correct package.json, then re-push. Added a pre-test validation step to catch missing scripts early.
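
A pre-test validation step of the kind described could look roughly like the following. This is a sketch of one possible check, not the team's actual step; it assumes the repository has already been checked out and that Node.js is available on the runner.

```yaml
# Step fragment: place under `steps:` before the step that runs the tests
- name: Check that package.json defines a test script
  run: |
    node -e '
      // Load package.json from the repository root and read its scripts section
      const scripts = require("./package.json").scripts || {};
      if (!scripts.test) {
        console.error("package.json has no \"test\" script");
        process.exit(1);
      }
    '
```

Failing here gives a message that names the real problem, instead of leaving it to whatever `npm test` prints.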

The lesson

Generic exit code errors in GitHub Actions often mask the real issue. Enable debug logging early and always compare the current state of key files (like package.json) with the base branch. Also, avoid force-pushing to branches used by multiple developers.

( 08 ) Reading GitHub Actions Logs Like a Pro

The first thing to do when a workflow fails is to look at the raw log output. Click the failed step and scroll to the very bottom. The last 20 lines usually contain the actual error. For very large logs, the web view may truncate the output. In that case, open the gear menu in the log view and choose 'View raw logs' to see the full plain-text output.

Another trick: use the 'Search logs' box in the log view to find keywords like 'error', 'exception', 'fail', or 'exit code'. This often jumps you straight to the problem. Also, look for annotations: GitHub marks lines that triggered warnings or errors with small red or yellow icons. Hover over them for details.

( 09 ) Why Your Workflow Passes Locally but Fails on GitHub

This is the most frustrating category of failures. The root cause is almost always an environment difference. Your local machine might have Node.js 18, but the runner uses Node.js 16. Or you have a global npm package installed locally that the runner doesn't have. The fix is to explicitly specify the runner OS and tools using actions like `actions/setup-node@v3` with a specific version.

Another common culprit is file path casing. File systems on Windows runners are case-insensitive, but on Linux runners they are case-sensitive. A require('./File') that works on Windows might fail on Ubuntu because the actual file is named 'file.js'. Use a linter to catch casing issues. Also, check line endings: a shell script committed or checked out with CRLF endings can fail on a Linux runner. Add a `.gitattributes` file to enforce consistent line endings.
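
As an illustration of pinning the environment, a minimal workflow might look like this. The Node.js version `18` is an assumed value taken from the example above, and the `@v4` action tags are an assumption about the current major versions; adjust both to your project.

```yaml
name: CI
on: [push, pull_request]

jobs:
  test:
    # Pin the image instead of using ubuntu-latest, which moves over time
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '18'   # assumed version; match what you run locally
      - name: Print tool versions for comparison with local output
        run: |
          node --version
          npm --version
      - run: npm ci
      - run: npm test
```

Printing the versions in the log makes it easy to compare the runner with the output of the same two commands on your own machine.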

( 10 ) Debugging Self-Hosted Runner Issues

If your workflow hangs in 'queued' or fails with 'runner is offline', your self-hosted runner might be down. First, check the runner service on the machine: `sudo systemctl status 'actions.runner.*'`, or `sudo ./svc.sh status` from the runner's installation directory. The runner's own logs are in the `_diag/` directory under that installation directory. Common issues include a full disk, network proxy misconfiguration, or the runner process being killed after running out of memory.

To check registration, query the API with a token that is allowed to manage the repository's runners: `curl -s -H "Authorization: Bearer <token>" https://api.github.com/repos/<owner>/<repo>/actions/runners | jq '.runners[] | {name, status}'`. This shows whether the runner is registered and online. If the runner is stuck in 'offline', restart the runner service. If it fails to start, re-register it: run `./config.sh remove --token <token>` and then `./config.sh` again with a new registration token from the repository settings.

( 11 ) Caching Pitfalls in GitHub Actions

Caching can speed up workflows but also introduce hard-to-debug failures. The most common issue is a cache key that's too broad, so an old cache is reused after dependencies change. For npm, use `key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}`. This ensures the cache is invalidated when the lock file changes.

If you suspect a cache problem, the simplest fix is to clear the cache manually. Go to your repository's 'Actions' tab, then 'Caches' (under Management), and delete the relevant cache entry. Re-run the workflow to see if it passes with a fresh cache. Also, check that the `actions/cache` action is using the correct `path`. A common mistake is caching `node_modules` directly, which can break across Node.js versions; caching npm's download cache (`~/.npm`) and running `npm ci` is the safer choice.
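
Put together, a cache step using that key might look like the sketch below. The `restore-keys` fallback is an optional addition not discussed above, and the `@v4` tag is an assumption about the current major version.

```yaml
# Step fragment: place under `steps:` after checkout and setup-node
- name: Cache the npm download cache
  uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    # Fall back to the most recent cache for this OS when the lock file changed
    restore-keys: |
      ${{ runner.os }}-node-
- name: Install dependencies from the lock file
  run: npm ci
```

If a stale cache is the suspect, removing `restore-keys` temporarily forces an exact key match, which helps rule out a partial restore.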

( 12 ) Workflow Permissions: The Silent Killer

Many workflows fail because they lack the required permissions. Since 2023, new repositories and organizations default the GITHUB_TOKEN to read-only access. If your workflow needs to push to a branch, create a release, or access a container registry, you must explicitly set permissions at the workflow level or per job. For example, setting `contents: write` under `permissions` allows pushing commits.
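
One way to lay this out is a read-only default for the whole workflow with a write override on the single job that needs it. The sketch below assumes a tag-triggered release job; the workflow name, the tag pattern, and the use of the GitHub CLI to create the release are illustrative choices, not requirements.

```yaml
name: Release
on:
  push:
    tags: ['v*']

# Workflow-level default: read-only access to repository contents
permissions:
  contents: read

jobs:
  release:
    runs-on: ubuntu-22.04
    # Job-level override: this job may push commits and create releases
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v4
      - name: Create a release with the GitHub CLI
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh release create "$GITHUB_REF_NAME" --generate-notes
```

Keeping the write scope on one job means any other job added to the workflow later stays read-only by default.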

Another common issue is accessing secrets in workflows triggered by pull requests from forks. By default, secrets are not available to fork PRs for security reasons. If you need them, you can use the `pull_request_target` event, but that comes with its own security risks. Alternatively, use OpenID Connect to authenticate without secrets. To debug a secret that seems to be missing, add a step that checks whether it is empty. Never echo the secret itself; the check only needs to know whether a value is present.
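
A presence check along those lines could be sketched as follows. `API_TOKEN` is a hypothetical secret name; the step maps it into an environment variable and tests only whether it is empty.

```yaml
# Step fragment: place under `steps:`. API_TOKEN is a hypothetical secret name.
- name: Check that API_TOKEN is available
  env:
    API_TOKEN: ${{ secrets.API_TOKEN }}
  run: |
    if [ -z "$API_TOKEN" ]; then
      # Emits an error annotation on the run, then fails the step
      echo "::error::API_TOKEN is empty or not available to this run"
      exit 1
    fi
    echo "API_TOKEN is set (value not printed)"
```

On a pull request from a fork this step is expected to fail, which confirms that the secret is being withheld rather than misconfigured.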

Frequently asked questions

How do I re-run a failed GitHub Actions workflow with debug logging?

Open the failed run, click 'Re-run jobs' (all jobs or only the failed ones), and tick 'Enable debug logging' in the dialog. To get debug output on every run instead, set ACTIONS_RUNNER_DEBUG and ACTIONS_STEP_DEBUG to 'true' as repository secrets or variables. After debugging, remember to delete them to avoid excessive logging.
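
Your own steps can also react to debug mode. The sketch below assumes the `runner.debug` context value, which is set to `1` only when debug logging is enabled; the specific diagnostics printed are just examples.

```yaml
# Step fragment: place under `steps:`
- name: Extra diagnostics (only when debug logging is enabled)
  if: runner.debug == '1'
  run: |
    echo "Runner OS: $RUNNER_OS, arch: $RUNNER_ARCH"
    # Tool versions; `|| true` keeps the step green if a tool is absent
    node --version || true
    npm --version || true
    # Free disk space in the workspace
    df -h .
```

On normal runs the step is skipped, so it adds no noise to the log.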

Why does my workflow say 'Stuck in queue' for hours?

This usually means no runner is free to pick up the job. GitHub-hosted runners have a per-plan limit on concurrent jobs (20 on the Free plan, higher on paid plans). For self-hosted runners, check whether the runner is offline, busy with another job, or missing a label that the job's `runs-on` asks for. You can also check the GitHub status page for any ongoing incidents. To mitigate, add more self-hosted runners or move to a larger plan.

How do I test my GitHub Actions workflow locally?

Use the `act` tool by nektos. Install it via brew (macOS) or download the binary. Run `act -l` to list workflows, then `act -j <job-name>` to run a specific job. Note that `act` uses Docker to simulate the runner environment, but it may differ from GitHub's actual runners. Always validate on GitHub after local testing.

What does 'Error: Process completed with exit code 1' mean?

It means a step in your workflow returned a non-zero exit code, indicating failure. But it doesn't tell you which command failed. Look at the lines immediately preceding this error in the log, which usually contain the actual error message. If the log is truncated, choose 'View raw logs' from the gear menu in the log view to see the full output.

How can I see which line of my YAML file has a syntax error?

When you edit the workflow file in the GitHub web editor, syntax errors are flagged inline on the offending line. If an invalid file has already been pushed, the failed run in the Actions tab reports the error with a line number. Alternatively, use a YAML linter like `yamllint` or the `action-validator` tool. Run `action-validator .github/workflows/your-workflow.yml` to get precise error locations.
