What this usually means
The Cloud Run container runtime imposes a hard 4-minute startup window, measured from the moment the container is scheduled to the moment it must be listening on the configured port. Most timeouts are not caused by slow application code but by resource contention (limited CPU during cold start), health check misconfiguration (a startup probe hitting a slow endpoint), or silent init failures (missing env vars, unresponsive dependencies). The error surfaces as 'Container failed to start' or 'Startup probe failed' because the platform kills the container if it doesn't pass the startup probe within the deadline. The catch is that the 4-minute limit includes image pull time, so a large image or a slow network can eat into your budget.
The first ten minutes — establish facts before touching code.
1. Run `gcloud run revisions list --region=<region> --service=<service> --format='table(name, status, image)'` to see revision status. If status is 'Unknown', the startup failed.
2. Check logs: `gcloud logging read 'resource.type=cloud_run_revision AND resource.labels.service_name=<service> AND severity>=ERROR' --limit=50 --format='table(timestamp, textPayload)'` for startup probe failures or container exit codes.
3. Review container startup command: `gcloud run services describe <service> --region=<region> --format='value(spec.template.spec.containers[0].command)'`. If empty, Cloud Run uses the Dockerfile's ENTRYPOINT.
4. Test locally with Docker: `docker run -p 8080:8080 <image>` and time how long before the app responds on port 8080. If > 240s, you'll timeout in Cloud Run.
5. Enable Cloud Run startup CPU boost: `gcloud run services update <service> --region=<region> --cpu-boost` to avoid CPU throttling during cold start.
6. Check container logs for startup messages: `gcloud logging read 'resource.type=cloud_run_revision AND resource.labels.service_name=<service> AND "Startup probe"' --limit=10`.
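The local timing test in step 4 can be made repeatable with a small script. This is a minimal sketch, assuming the container is already running locally with port 8080 published; `wait_for_port` is a hypothetical helper name, and it only measures the time until the port accepts a TCP connection, not image pull time.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 240.0, interval: float = 0.5) -> float:
    """Return the seconds elapsed until host:port accepts a TCP connection.

    Raises TimeoutError if nothing is listening before `timeout` seconds pass.
    """
    start = time.monotonic()
    deadline = start + timeout
    while True:
        try:
            # A successful connect means the server is listening.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{host}:{port} not listening after {timeout:.0f}s")
            time.sleep(interval)
```

Start the script right after `docker run` and call `wait_for_port("127.0.0.1", 8080)`. A result close to 240 seconds locally leaves no room for image pull on Cloud Run.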
The specific files, logs, configs, and dashboards that usually own this bug.
- Cloud Run UI → Revisions tab → click failed revision → 'Container' tab for exit code and failure reason
- Cloud Logging: filter by `resource.type=cloud_run_revision AND resource.labels.service_name=<service>` and look for 'Startup probe' or 'Container startup' entries
- Cloud Monitoring: Metrics Explorer → 'Cloud Run Revision' → 'Container Startup Latency' metric, look for values > 240s
- Dockerfile: check for heavy COPY layers (e.g., `COPY . .`), multi-stage build inefficiencies, or missing .dockerignore
- Source code: look for blocking operations in app init (database connection, external API call, file parsing) that delay HTTP server startup
- Startup probe configuration: `gcloud run services describe <service> --region=<region> --format='value(spec.template.spec.containers[0].startupProbe)'`. If the output is empty or shows only a `tcpSocket` probe, no custom probe is set and Cloud Run applies its default TCP startup probe on the container port
- Resource limits: `gcloud run services describe <service> --region=<region> --format='value(spec.template.spec.containers[0].resources.limits)'`. The defaults are 1 vCPU and 512 MiB of memory
Practical causes, not theory. These are the things you will actually find.
- CPU-starved cold start: without startup CPU boost, init code gets at most the configured CPU limit (1 vCPU by default), which is often far less than the multi-core machine where the container started quickly
- Large container image (>1GB) causing slow pull: Cloud Run's 4-minute limit includes image download time
- Startup probe hitting an endpoint that requires DB migration or cache warmup (e.g., /health that queries DB)
- Server not binding to 0.0.0.0:8080: common in frameworks that default to localhost only (e.g., Flask app.run())
- Container exits silently due to missing environment variables or misconfigured entrypoint
- Slow dependency loading: Python importing large libraries (TensorFlow, PyTorch) or Java classpath scanning
- Memory limit too low causing OOM during startup: the container gets killed before completing init
Concrete fix directions. Pick the one that matches your root cause.
- Enable CPU boost: `gcloud run services update <service> --region=<region> --cpu-boost` (gives full CPU during cold start)
- Optimize container image: use distroless base, multi-stage builds, `.dockerignore` to exclude node_modules, .git. Target image < 500MB.
- Adjust startup probe: set `initialDelaySeconds` to 30-60, use a lightweight endpoint (e.g., return 200 from memory, no DB call). Use `gcloud run services update --startup-probe`.
- Defer heavy initialization: load models or DB connections after the server starts listening (e.g., in background goroutine or async init).
- Bind to all interfaces: in Go use `http.ListenAndServe(":8080", nil)`, in Python Flask `app.run(host='0.0.0.0', port=8080)`.
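- Increase memory limit: `gcloud run services update <service> --memory=1Gi` (the default is 512 MiB), or more if your app needs it during init.
- Don't reach for `--timeout` to buy startup time: `gcloud run services update <service> --timeout=300` (max 3600s) sets the request timeout, not the startup deadline. If init needs more than 4 minutes, shrink the image, defer the work, or keep instances warm with min-instances.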
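The "defer heavy initialization" and "bind to all interfaces" fixes can be combined in one pattern: listen first, load later. The following is a minimal sketch using only the Python standard library rather than Flask; `load_model`, `model_ready`, and `make_server` are hypothetical names, and the `time.sleep` call stands in for real model loading.

```python
import os
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

model_ready = threading.Event()  # set once the slow initialization has finished


def load_model() -> None:
    """Placeholder for slow init work (model load, cache warmup)."""
    time.sleep(0.2)  # stands in for the real, much slower, work
    model_ready.set()


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/startup":
            # Probe target: answers from memory, no I/O.
            self._reply(200, b"OK")
        elif not model_ready.is_set():
            # Real traffic is refused until init completes.
            self._reply(503, b"warming up")
        else:
            self._reply(200, b"ready")

    def _reply(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # silence per-request logging
        pass


def make_server(port=None) -> ThreadingHTTPServer:
    """Bind to all interfaces on PORT first, then start slow init in the background."""
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    server = ThreadingHTTPServer(("0.0.0.0", port), Handler)
    threading.Thread(target=load_model, daemon=True).start()
    return server
```

An entrypoint would call `make_server().serve_forever()`. One trade-off to keep in mind: with request-based billing, CPU can be throttled when no request is in flight, so background loading that continues after the probe passes may itself run slowly.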
A fix you cannot prove is a guess. Close the loop.
- Deploy a new revision: `gcloud run deploy <service> --image=<image> --region=<region>` and check status becomes 'Ready' within 4 minutes
- Monitor startup latency: `gcloud logging read 'resource.type=cloud_run_revision AND "startup"' --limit=5` and verify latency < 200s
- Run a health check endpoint test: `curl -I https://<service>.run.app/health` should return 200 within 5 seconds of cold start
- Check revision logs for 'Container started' or 'Listening on port 8080' messages before any probe failures
- Use Cloud Monitoring alerting: set a metric alert on 'Container Startup Latency' > 200s for 1 minute to catch regressions
- Simulate cold start: scale to zero by waiting 15 minutes of inactivity, then hit the URL. Time the first response with `curl -w '%{time_total}'`.
Things that make this bug worse or harder to find.
- Don't assume the 4-minute timeout is about your app logic: check image pull time first by looking at 'Container Runtime Init' logs
- Don't set startup probe to hit a complex endpoint (e.g., /health that connects to DB); use a simple /startup endpoint that returns 200 immediately
- Don't ignore CPU boost: without it, your container gets CPU throttled during init, making everything slower
- Don't overlook the port binding: Cloud Run expects your app to listen on PORT env var (default 8080). If you hardcode 3000, it will fail
- Don't put database migrations in the startup probe endpoint: they can take minutes; run them in a separate job or during deploy
- Don't use a huge container image because you think Cloud Run will 'just scale': large images directly increase startup time
- Don't forget to check environment variables: missing env vars can cause the app to crash silently during init
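The last pitfall is easier to avoid if the entrypoint validates its configuration before doing anything else and fails with a visible message. A minimal sketch, assuming the required variable names are known up front; `require_env` and `load_config_or_exit` are hypothetical helpers, not part of any Cloud Run library.

```python
import os
import sys


def require_env(names, environ=None):
    """Return the requested variables as a dict, or raise listing every missing one."""
    environ = os.environ if environ is None else environ
    # Treat unset and empty variables the same way.
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


def load_config_or_exit(names, environ=None):
    """Validate configuration at the top of the entrypoint and exit loudly on failure."""
    try:
        return require_env(names, environ)
    except RuntimeError as err:
        # stderr ends up in the container logs; flush so the line is not lost on exit.
        print(f"startup aborted: {err}", file=sys.stderr, flush=True)
        sys.exit(1)
```

Calling something like `load_config_or_exit(["DATABASE_URL", "MODEL_PATH"])` (both names are placeholders) as the first statement turns a silent crash into one searchable log line and a non-zero exit code.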
The TensorFlow Cold Start Nightmare
Timeline
- 09:15 Deployed new revision with TensorFlow model inference service
- 09:17 Revision status shows 'Unknown' in Cloud Run UI
- 09:18 Checked logs: 'Startup probe failed' and 'Container startup cancelled' after 240s
- 09:20 Checked container logs: no output, app never started
- 09:22 Ran docker run locally: app starts in 3 seconds, port 8080 responds
- 09:25 Noticed local docker run used full CPU; Cloud Run throttles CPU without boost
- 09:28 Enabled CPU boost: `gcloud run services update --cpu-boost`
- 09:30 Deployed again: still fails after 240s
- 09:35 Inspected Dockerfile: COPY . . included models directory (2GB)
- 09:40 Optimized Dockerfile: multi-stage build, .dockerignore, distroless base (image shrunk to 300MB)
- 09:45 Deployed: revision becomes 'Ready' in 90 seconds
I had just deployed a new Cloud Run service for real-time image classification using a TensorFlow model. The app was a simple Flask server that loaded the model at startup and exposed a predict endpoint. I built the image with a naive Dockerfile — just COPY . . and pip install from requirements.txt. The image was 2.3GB because it included the full models directory and unnecessary Python packages.
The first deploy failed silently. Revision status was 'Unknown', and the logs showed only 'Startup probe failed' after 4 minutes. There was no stack trace, no Python error — just the platform killing the container. I wasted 20 minutes thinking it was a code bug, running the container locally (which worked instantly) and adding debug prints.
The real issue was two-fold: first, the 2.3GB image took over 2 minutes to pull, eating half the startup budget. Second, even after the image was pulled, the TensorFlow initialization was slow because Cloud Run throttles CPU during cold start unless CPU boost is enabled. After I enabled CPU boost and optimized the Dockerfile to 300MB with a distroless base, the revision started in under 90 seconds. The lesson: always check image size first, and enable CPU boost for any service that does heavy initialization.
Root cause
Large container image (2.3GB) caused slow pull, combined with CPU throttling during cold start, preventing the Flask app from starting within the 4-minute timeout.
The fix
Enabled CPU boost and optimized Dockerfile using multi-stage build with distroless base, reducing image to 300MB. Also added a lightweight /startup endpoint for the startup probe.
The lesson
Cloud Run's 4-minute startup limit includes image pull time. Always optimize image size (<500MB) and enable CPU boost for services with heavy initialization. Test with a simple health endpoint that does no I/O.
Cloud Run gives your container exactly 240 seconds from the moment the infrastructure schedules the revision to the moment the startup probe receives a successful HTTP response. This window includes image pull, filesystem setup, container startup, and your application's initialization code. Many engineers assume the timeout only applies to code execution, but image pull time can consume 30-60% of the budget for large images.
You can monitor this breakdown in Cloud Logging by filtering for 'Container Runtime Init' events. These logs show the time spent pulling the image, setting up the container, and running the entrypoint. If you see a long gap between 'Pulling image' and 'Container started', your image is too large. If the gap is between 'Container started' and the first probe success, your init code is slow.
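The gap analysis described above is simple arithmetic once the three timestamps are in hand. Here is a minimal sketch, assuming you have copied the timestamps of the pull start, the container start, and the first probe success from the logs; `startup_breakdown` and the key names in its result are hypothetical, and the sketch expects timestamps with at most microsecond precision.

```python
from datetime import datetime

STARTUP_BUDGET_S = 240  # the startup window described above


def _parse(ts: str) -> datetime:
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def startup_breakdown(pull_started: str, container_started: str, probe_passed: str) -> dict:
    """Split the startup window into image/setup time and application init time."""
    t0, t1, t2 = _parse(pull_started), _parse(container_started), _parse(probe_passed)
    pull_s = (t1 - t0).total_seconds()  # image pull and container setup
    init_s = (t2 - t1).total_seconds()  # application initialization
    total_s = pull_s + init_s
    return {
        "pull_s": pull_s,
        "init_s": init_s,
        "total_s": total_s,
        "pull_share_of_budget": pull_s / STARTUP_BUDGET_S,
        "headroom_s": STARTUP_BUDGET_S - total_s,
    }
```

For example, a pull that starts at 09:15:00, a container start at 09:17:10, and a probe success at 09:18:30 give 130 seconds of pull, 80 seconds of init, and only 30 seconds of headroom, which points at the image rather than the code.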
With request-based billing, Cloud Run throttles CPU outside of request processing, but an instance does receive CPU while it starts up. What it receives is capped at the configured limit (1 vCPU by default), which is usually far less than the multi-core machine where the same container started in seconds. That gap is a common cause of startup timeouts for compute-heavy init code like loading ML models, parsing large configs, or establishing database connections.
The fix is to enable CPU boost via `gcloud run services update --cpu-boost`. This temporarily gives the instance additional CPU while it starts up. It costs a small amount extra (charged for the duration of cold start), but it's almost always worth it. Without it, even a simple Node.js app that does npm package resolution at startup can time out if the image is large.
The startup probe is the gatekeeper. By default, Cloud Run uses a TCP probe that only checks whether the container port accepts connections. If you configure an HTTP startup probe, Cloud Run sends GET requests to the configured path at the configured interval until it gets a 200-399 response or the deadline expires. If your probe endpoint is too slow (e.g., it queries a database or waits for migrations), the probe will fail even if your server is technically running. The probe's purpose is to check whether the server is ready to serve traffic, but during startup it should be a simple liveness check.
Best practice: create a dedicated /startup endpoint that returns 200 immediately (e.g., `return 'OK'` in Flask, `res.send('OK')` in Express). Set the startup probe to use this path with an `initialDelaySeconds` of 10-30 to give the server a head start. Avoid using the same endpoint for startup and readiness probes if the readiness one does heavy checks. You can configure it via `gcloud run services update --startup-probe`.
Every megabyte matters in Cloud Run's startup window. I've seen teams push 2GB images because they `COPY . .` without a `.dockerignore`. Use a `.dockerignore` to exclude node_modules, .git, large data files, and any build artifacts. For Python, use `pip install --no-cache-dir` and avoid installing dev dependencies. For Node.js, use `npm ci --omit=dev` (`--only=production` on older npm versions).
Multi-stage builds are critical: compile your code in a builder image with full SDKs, then copy only the runtime artifacts to a slim base like `distroless` or `alpine`. For example, a Go app can be built in a `golang:1.20` image and copied to `scratch` — resulting in a 10MB image. For Python, use `python:3.9-slim` and remove `__pycache__`. Every second saved on image pull is a second for your init code.
Cloud Run's container logs are your first line of defense, but they can be silent if the app crashes before emitting anything. Always add a startup log message (e.g., `print('Starting server...')`) at the very beginning of your entrypoint. Then, in Cloud Logging, filter by `resource.type=cloud_run_revision` and search for that message. If you don't see it, the container never started — likely an image pull or entrypoint issue.
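A startup log line is most useful when it is flushed immediately and easy to filter. The sketch below is one way to do that, assuming JSON lines on stdout are picked up as structured logs with `severity` and `message` fields; `log_startup` and the `elapsed_s` field are hypothetical names chosen for this example.

```python
import json
import sys
import time

_PROCESS_START = time.monotonic()  # captured at import, as early as possible


def log_startup(message: str, severity: str = "INFO", stream=None) -> str:
    """Write one JSON log line with seconds elapsed since process start, and flush it."""
    stream = sys.stdout if stream is None else stream
    line = json.dumps({
        "severity": severity,
        "message": message,
        "elapsed_s": round(time.monotonic() - _PROCESS_START, 3),
    })
    # flush=True matters: buffered output is lost if the container is killed.
    print(line, file=stream, flush=True)
    return line
```

Calling `log_startup("Starting server...")` as the first statement of the entrypoint and `log_startup("Listening on port 8080")` right after the server binds brackets the init phase, so a missing second line shows that the process started but never reached the listen call.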
For monitoring, set up a dashboard with the 'Container Startup Latency' metric. If you see p99 latency creeping above 200 seconds, you're at risk of hitting the 240-second limit. Also monitor 'Container CPU Utilization' during cold start: if it's below 0.5 vCPU, you're likely throttled without CPU boost. Use Cloud Monitoring alerts to notify you when startup latency exceeds a threshold (e.g., 180 seconds) so you can investigate before a full timeout.
Frequently asked questions
What is the exact timeout for Cloud Run startup?
Cloud Run has a hard 4-minute (240-second) limit for container startup. This includes image pull time, filesystem setup, and your application's initialization until the startup probe succeeds. After that, the revision is marked as failed and traffic is not routed to it. You can increase the request timeout (up to 60 minutes) but not the startup timeout.
Does enabling CPU boost increase cost?
Yes, but only during cold start. Cloud Run bills for CPU boost at the same rate as allocated CPU (the vCPU count you set). Since cold starts are typically short (seconds to minutes) and infrequent (if you have traffic), the cost impact is minimal. For most services, it's a few cents per month. Enable it via `gcloud run services update --cpu-boost`.
My container starts fine locally but times out on Cloud Run. Why?
Local Docker runs typically have full CPU and no image pull time. On Cloud Run, the image is pulled from Artifact Registry (or the legacy Container Registry), which can take 30 seconds to 2 minutes for large images. Additionally, Cloud Run throttles CPU during cold start unless CPU boost is enabled. Check image size and enable CPU boost. Also ensure your app listens on `0.0.0.0:${PORT}` (default 8080).
How do I check if my startup probe is configured correctly?
Run `gcloud run services describe <service> --region=<region> --format='value(spec.template.spec.containers[0].startupProbe)'`. If the output is empty or shows only a `tcpSocket` probe, no custom probe is configured and Cloud Run applies its default TCP startup probe, which only checks that the container port accepts connections. You can set a custom probe with `gcloud run services update --startup-probe`. An HTTP probe must return a 200-399 status within 240 seconds. Use a lightweight endpoint that does no I/O.
Can I increase the 4-minute startup timeout?
No, Cloud Run does not allow increasing the startup timeout. It is fixed at 240 seconds. However, you can reduce startup time by optimizing your container image, enabling CPU boost, and deferring heavy initialization. If you need longer startup, consider using Cloud Run's 'min-instances' feature to keep instances warm and avoid cold starts altogether.