Playbook

Debugging Observability Bugs

How to fix telemetry paths that hide failures, crash jobs, or create new production risk.

Pattern

Logging, metrics, or request context code becomes part of the failure path.

warningSymptoms

  • arrow_rightLogger throws without request context
  • arrow_rightMetrics backend slows under high cardinality
  • arrow_rightSensitive headers appear in error logs

searchWhere to look

  • arrow_rightLogger context builders
  • arrow_rightMetrics label construction
  • arrow_rightError serializers
  • arrow_rightRequest and job middleware

buildCommon fixes

  • arrow_rightMake telemetry non-fatal
  • arrow_rightBound metric labels
  • arrow_rightRedact before every serialization path
  • arrow_rightKeep request ids in logs/traces, not labels

Practice challenges

No ready practice incident is mapped yet.