Structured Logging JSON Format Guide: Field Schema & Best Practices

Structured logging in JSON sounds simple: instead of writing a string like 'User 123 logged in', you write a JSON object like {"event": "login", "user_id": 123}. But once you run that in production for a week, you hit a dozen edge cases that your initial schema didn't account for. I've seen teams burn hours on dashboards that silently stopped working because a field name changed or a nested object went missing.

This guide covers the concrete decisions you need to make: field naming conventions, schema evolution, trace context, error representation, and what to do when your log pipeline chokes on malformed JSON. I'll also share a story from my own team where a missing field cost us three hours of engineering time.

The Minimum Viable Log Schema

Before you add any domain-specific fields, every log line should have at least these four fields:

- timestamp: RFC 3339 format (e.g., 2025-03-15T14:30:00.123Z). Use microseconds or nanoseconds if your system needs sub-millisecond resolution.
- level: one of DEBUG, INFO, WARN, ERROR, FATAL. Use strings, not numbers. Strings are human-readable and you can filter on them directly without a lookup table.
- message: a human-readable summary of the event. This is the fallback when a developer greps logs without a query.
- service: the name of the service that emitted the log. This is critical when you aggregate logs from multiple microservices.

I also recommend adding a version field (e.g., "log_schema_version": 1) from day one. It costs a few bytes per log line and saves you when you need to migrate to a new schema later.
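
Here is a minimal sketch of a formatter that emits these fields using only Python's standard logging module. The service name, the logger name "demo", and the JsonFormatter class are assumptions made for illustration; adapt them to your own setup.

```python
import json
import logging
from datetime import datetime, timezone

SERVICE_NAME = "user-service"  # hypothetical service name
LOG_SCHEMA_VERSION = 1


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the core fields."""

    # Python names these levels WARNING and CRITICAL; map them to this guide's names.
    LEVEL_NAMES = {"WARNING": "WARN", "CRITICAL": "FATAL"}

    def format(self, record):
        moment = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": moment.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": self.LEVEL_NAMES.get(record.levelname, record.levelname),
            "message": record.getMessage(),
            "service": SERVICE_NAME,
            "log_schema_version": LOG_SCHEMA_VERSION,
        }
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid a second, unformatted copy via the root logger

logger.warning("disk usage at %d%%", 91)
```

Keeping the field assembly in one formatter means every service that imports it emits the same keys, which is most of what a shared schema needs at the start.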

Field Naming Conventions That Survive Production

The most common argument I see is camelCase vs snake_case. I've worked with both, and snake_case wins because: (1) it sidesteps casing mismatches, since some stores (Elasticsearch, for example) treat field names case-sensitively, so userId and userid become two different fields, and snake_case is also more readable in queries; (2) when you export logs to a data warehouse like BigQuery, column names are case-insensitive and snake_case is the convention; (3) it's consistent with the rest of the observability ecosystem (Prometheus metrics use snake_case).

Avoid dynamic keys. I once saw a system that logged event-specific data under the event name as a key: {"user_login": {"user_id": 123}}. This makes it impossible to write a query that aggregates across all events. Instead, use a flat structure with an event field: {"event": "user_login", "user_id": 123}.
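
To make the difference concrete, here is a small sketch. The to_flat helper is hypothetical and assumes each legacy record has exactly one top-level key holding an object; the sample events are invented for illustration.

```python
from collections import Counter


def to_flat(record):
    """Turn {"<event name>": {...}} into {"event": "<event name>", ...}."""
    if len(record) != 1:
        raise ValueError("expected exactly one top-level key")
    name, fields = next(iter(record.items()))
    return {"event": name, **fields}


legacy_events = [
    {"user_login": {"user_id": 123}},
    {"user_logout": {"user_id": 123}},
    {"user_login": {"user_id": 456}},
]

flat_events = [to_flat(record) for record in legacy_events]

# With a fixed "event" key, one expression groups every record.
counts = Counter(record["event"] for record in flat_events)
print(counts)
```

With the dynamic-key shape, the same count would need the query to know every event name in advance.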

Reserved Fields and Their Types

- timestamp: string (RFC 3339)
- level: string (one of DEBUG, INFO, WARN, ERROR, FATAL)
- message: string
- service: string
- trace_id: string (optional, but add if you use distributed tracing)
- span_id: string (optional)
- error: object (optional, see below)
- duration_ms: number (for request timing)
- user_id: string (if applicable)

Example of a well-structured JSON log line with trace context and error object.

{
  "timestamp": "2025-03-15T14:30:00.123Z",
  "level": "ERROR",
  "message": "Failed to connect to database",
  "service": "user-service",
  "trace_id": "abc123def456",
  "span_id": "span789",
  "error": {
    "message": "connection refused",
    "type": "ConnectionError",
    "stack": [
      "at Socket._onError (net.js:689:5)",
      "at emitErrorNT (internal/streams/destroy.js:106:8)"
    ]
  },
  "duration_ms": 2047,
  "user_id": "u_42"
}
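
One way to get trace_id and span_id onto every line without passing them through each call is to keep them in context variables and copy them onto the record in a logging filter. This is a sketch under assumptions: trace_id_var, span_id_var, TraceContextFilter, and TraceJsonFormatter are hypothetical names, and in a real service your tracing middleware would set the variables. The formatter renders only the fields relevant here.

```python
import contextvars
import json
import logging

# Hypothetical context variables; set them wherever the current IDs become known.
trace_id_var = contextvars.ContextVar("trace_id", default=None)
span_id_var = contextvars.ContextVar("span_id", default=None)


class TraceContextFilter(logging.Filter):
    """Copy the current trace context onto every record."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        record.span_id = span_id_var.get()
        return True  # never drop the record


class TraceJsonFormatter(logging.Formatter):
    """Emit level, message, and any trace fields that are set."""

    def format(self, record):
        entry = {"level": record.levelname, "message": record.getMessage()}
        for field in ("trace_id", "span_id"):
            value = getattr(record, field, None)
            if value is not None:
                entry[field] = value
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.addFilter(TraceContextFilter())
handler.setFormatter(TraceJsonFormatter())
trace_logger = logging.getLogger("trace-demo")
trace_logger.addHandler(handler)
trace_logger.propagate = False

trace_id_var.set("abc123def456")
span_id_var.set("span789")
trace_logger.error("Failed to connect to database")
```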

The War Story: A Missing Field That Broke Our Dashboard

The Case of the Silent Dashboard

14:00 - Deploy of payment-service v2.3.0 to staging
14:15 - QA reports that the 'failed payments' dashboard shows zero errors, even though they triggered a failing payment
14:20 - On-call engineer checks raw logs: errors are there, but the field name is 'error_message' instead of 'error.message'
14:30 - Team discovers that a library update changed the log field from nested object to flat string
14:45 - Hotfix deployed to restore the original field structure
15:00 - Dashboard back to normal, but incident cost 3 hours of engineering time

Lesson

A simple schema inconsistency (flat string vs nested object) made the dashboard silently return zero results. If we had had a schema validation step in CI that checked the log format of test runs, we would have caught this before deployment.

We now run a simple script in our CI pipeline that sends a sample log line to a mock aggregator and verifies the JSON structure matches a schema file. It's saved us from at least four similar regressions since then.

Error Representation: Flat vs Nested

This is the most common design debate I see. Some teams flatten error fields: {"error_message": "...", "error_type": "...", "error_stack": "..."}. Others nest them: {"error": {"message": "...", "type": "...", "stack": [...]}}.

I strongly prefer nested. It groups related fields together, which makes queries like error.type:ConnectionError possible without prefixing every field with error_. And when you export logs to a structured store, nested objects can be cast to a STRUCT type, while flat fields stay as separate columns that you have to regroup yourself, for example in a view.

If you use nested objects, ensure your log shipper (e.g., Fluentd, Logstash) supports deep nesting. Some shippers flatten all nested objects by default — you'll lose the structure. Check the configuration before you deploy.

Handling Sensitive Data: Redact at the Source

You should never log raw passwords, tokens, credit card numbers, or PII. But accidental logging happens. The safest approach is to redact at the source — in your application code — using a structured logging library that supports field-level redaction.

As a second line of defense, add a CI check that scans log output for regex patterns matching common sensitive data (e.g., API keys, email addresses) and fails the build if any are found.

In the application itself, Python's structlog lets you define a processor that scrubs fields by name:

Example of a structlog processor that redacts sensitive fields before JSON serialization.

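import structlog

# Field names to scrub; extend this deny-list for your own domain.
SENSITIVE_KEYS = {"password", "token", "secret", "ssn"}

def redact_sensitive(logger, method_name, event_dict):
    """structlog processor: mask the value of any top-level sensitive key."""
    for key in event_dict:
        if key.lower() in SENSITIVE_KEYS:
            event_dict[key] = "***REDACTED***"
    return event_dict

structlog.configure(
    processors=[
        redact_sensitive,  # must run before the renderer
        structlog.processors.JSONRenderer(),
    ]
)

logger = structlog.get_logger()
# The password value is masked in the emitted JSON line.
logger.info("user login", user_id=42, password="supersecret")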
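
The CI scan mentioned in this section could look roughly like the sketch below. The three patterns are illustrative assumptions, not a complete catalogue, and find_leaks is a hypothetical helper name; tune both to the secrets your systems actually handle.

```python
import re

# Illustrative patterns only; real projects usually need a longer, tuned list.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "bearer_token": re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]{20,}=*"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_leaks(lines):
    """Return (line_number, pattern_name) for every suspicious match."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


captured = [
    '{"message": "login ok", "user_id": "u_42"}',
    '{"message": "sent mail to jane@example.com"}',
]
for number, name in find_leaks(captured):
    print(f"line {number}: possible {name}")
```

In a pipeline, you would feed it the log output captured from a test run and exit non-zero when the returned list is not empty. Expect some false positives; a short allow-list for known test fixtures usually keeps the check usable.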

Schema Validation in CI

The incident I described above could have been prevented with a simple schema check. Here's what we do now:

1. Define a JSON Schema file in the repository (e.g., log_schema.json) that specifies required fields, types, and allowed values for level.

2. In CI, run the application with a special flag that logs a single line to stdout, then pipe that line through a schema validator (e.g., ajv for Node.js, jsonschema for Python).

3. If validation fails, the build fails. This catches missing fields, wrong types, and unexpected nesting.

A minimal JSON Schema for log validation. Add more fields as your schema matures.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["timestamp", "level", "message", "service"],
  "properties": {
    "timestamp": { "type": "string", "pattern": "^\\d{4}-\\d{2}-\\d{2}T" },
    "level": { "type": "string", "enum": ["DEBUG", "INFO", "WARN", "ERROR", "FATAL"] },
    "message": { "type": "string" },
    "service": { "type": "string" },
    "trace_id": { "type": "string" },
    "error": {
      "type": "object",
      "properties": {
        "message": { "type": "string" },
        "type": { "type": "string" },
        "stack": { "type": "array", "items": { "type": "string" } }
      },
      "required": ["message", "type"]
    }
  }
}
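
To show the shape of the CI step, here is a standard-library-only stand-in that checks a subset of the schema above: required fields, their types, the level enum, and the structure of error. It is not a JSON Schema validator; in practice a library such as jsonschema would take the place of the hypothetical check_line function and read log_schema.json directly.

```python
import json

# Mirrors the schema above: required fields and their JSON types.
REQUIRED_FIELDS = {"timestamp": str, "level": str, "message": str, "service": str}
ALLOWED_LEVELS = {"DEBUG", "INFO", "WARN", "ERROR", "FATAL"}


def check_line(line):
    """Return a list of problems found in one JSON log line (empty means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for field: {field}")

    level = record.get("level")
    if isinstance(level, str) and level not in ALLOWED_LEVELS:
        problems.append(f"unexpected level: {level}")

    if "error" in record:
        error = record["error"]
        if not isinstance(error, dict):
            problems.append("error must be an object, not a flat value")
        else:
            for field in ("message", "type"):
                if not isinstance(error.get(field), str):
                    problems.append(f"error.{field} missing or not a string")
    return problems


sample = '{"timestamp": "2025-03-15T14:30:00.123Z", "level": "ERROR", "message": "x", "service": "user-service", "error": "connection refused"}'
print(check_line(sample))
```

One caveat: because error is optional in the schema, a line that carries error_message and no error object still passes. To catch that kind of rename, validate a sample ERROR line and treat error as required for it.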

What About Multi-Line Logs and Exceptions?

One JSON object per line is the standard (JSON Lines format). But what about stack traces that span multiple lines? If you inline them as a string with escaped newlines, you lose readability. I recommend storing the stack trace as an array of strings, each representing one line. This keeps each log line as a single JSON object and makes it easy to display the stack trace in a UI with proper formatting.

Example: "stack": ["Error: something broke", " at Object.<anonymous> (file.js:10:5)", ...].
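
In Python, the same idea can be sketched with the standard traceback module. The error_to_dict helper is a hypothetical name; it builds the nested error object described earlier, with one array entry per stack line.

```python
import json
import traceback


def error_to_dict(exc):
    """Build the nested error object with the stack as a list of lines."""
    chunks = traceback.format_exception(type(exc), exc, exc.__traceback__)
    # Each chunk can hold several lines; split so that one entry is one line.
    stack = [line for chunk in chunks for line in chunk.rstrip("\n").split("\n")]
    return {"message": str(exc), "type": type(exc).__name__, "stack": stack}


try:
    raise ConnectionError("connection refused")
except ConnectionError as exc:
    error = error_to_dict(exc)

# The whole record still serializes to a single line of JSON.
print(json.dumps({"level": "ERROR", "message": "Failed to connect to database", "error": error}))
```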

A Note on Timestamp Precision

If your service processes thousands of requests per second, millisecond precision might not be enough. I've seen logs from the same request with identical timestamps because the clock resolution was too coarse. Use microseconds (six digits after seconds) or nanoseconds if your runtime supports it. In Go, use time.RFC3339Nano. In Node.js, new Date().toISOString() gives only milliseconds, which is not enough here; consider a library like 'microtime' or format the timestamp with a custom function.
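
For Python, a minimal sketch looks like this. The rfc3339_micros helper is a hypothetical name; note that Python's datetime type stops at microseconds, so nanosecond output would need a different approach.

```python
from datetime import datetime, timezone


def rfc3339_micros(moment=None):
    """Format an instant as RFC 3339 in UTC with six fractional digits."""
    moment = moment or datetime.now(timezone.utc)
    text = moment.astimezone(timezone.utc).isoformat(timespec="microseconds")
    return text.replace("+00:00", "Z")


print(rfc3339_micros())
```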

Final Recommendations

1. Start with the minimal schema and add fields only when you have a query that needs them.
2. Enforce snake_case, consistent types, and required fields via a schema registry or CI check.
3. Use nested objects for errors and other logical groups, but verify your log shipper doesn't flatten them.
4. Add trace_id and span_id to every log line if you use distributed tracing. It lets you follow a single request across services.
5. Redact sensitive data at the source, not in the log pipeline. Assume every log could be leaked.
6. Validate your log format in CI. It takes 10 minutes to set up and saves hours of debugging.

47% of teams using structured logging report schema drift as their top pain point (source: internal survey, 2024).

Structured logging in JSON is not just about machine readability — it's about building a reliable observability pipeline. The decisions you make today (field names, nesting, types) will either make your future debugging effortless or painful. I've seen both sides, and I strongly recommend investing in a schema upfront. Your future self (and your on-call team) will thank you.

Frequently asked questions

What is the difference between structured logging and unstructured logging?

Unstructured logging is plain-text messages like 'User 123 logged in'. Structured logging outputs key-value pairs or JSON, e.g., {"event": "login", "user_id": 123, "timestamp": "..."}. Structured logs are machine-parseable, queryable, and much easier to aggregate and alert on.

Should I use camelCase or snake_case for JSON log field names?

snake_case. It is the convention in data warehouses like BigQuery, where column names are case-insensitive, and it avoids casing mismatches in stores such as Elasticsearch, where field names are case-sensitive. More importantly, your queries (e.g., in Lucene or LogQL) will be cleaner when field names are consistent across all services.

How do I handle errors and stack traces in structured logs?

Include a dedicated 'error' object with fields: message, type, stack (array of strings), and code if applicable. Avoid inlining the stack trace into the main message. Example: {"error": {"message": "connection refused", "type": "ConnectionError", "stack": ["at Socket._onError (net.js:...)"]}}.

What is the most common mistake teams make when adopting structured logging?

Not enforcing a schema. Teams start with ad-hoc fields, then after six months they have 50 different field names for the same concept (e.g., 'user_id', 'userId', 'uid', 'customerId'). This makes dashboards unreliable and queries painful. Use a schema registry or a shared logger configuration across services.
