LEARN · DEBUGGING GUIDE

OpenTelemetry Metrics Not Exporting: A Field Guide

When your OpenTelemetry metrics silently die before reaching the backend, it's usually a configuration or lifecycle issue. Here's exactly how to find the break.

Intermediate · Observability · 7 min read

What this usually means

The most common root cause is a mismatch between the SDK's metric export protocol and the collector's receiver configuration, or a lifecycle bug where the SDK shuts down before exporting its final batch. Other frequent causes include missing metric readers, incorrect temporality settings, or the collector dropping metrics due to attribute limits or unsupported data types. Silent failures are the norm—neither the SDK nor the collector raises obvious errors because the metric pipeline is designed to be best-effort.

( 01 ) Fast diagnosis

The first ten minutes — establish facts before touching code.

  • 1. Run `curl -s http://localhost:8888/metrics | grep otelcol_exporter_sent_metric_points` on the collector host to see whether the collector believes it is sending any metric points.
  • 2. Check the SDK's export interval and temporality: `OTEL_METRIC_EXPORT_INTERVAL` is given in milliseconds (default 60000, i.e. 60s). Compare it with the `timeout` setting on the collector's exporter.
  • 3. Enable verbose logging: set `OTEL_LOG_LEVEL=debug` in the SDK (not every SDK honors it) and `service: { telemetry: { logs: { level: debug } } }` in the collector config (current collector releases no longer accept a `--log-level` flag), then look for export attempts and export errors.
  • 4. Verify the collector's `otlp` receiver enables the protocol the SDK uses: in `config.yaml`, look under `receivers: otlp: protocols:` for `grpc:` (default port 4317) and `http:` (default port 4318). Note that `otlphttp` is an exporter name, not a receiver.
  • 5. Send a test metric with `telemetrygen` or a simple curl: `curl -X POST -H "Content-Type: application/x-protobuf" --data-binary @test.pb http://localhost:4318/v1/metrics` and check the collector logs.
  • 6. Inspect the SDK metric reader: for push-based exporters (OTLP), ensure a periodic exporting reader (`PeriodicExportingMetricReader` in Python, `PeriodicReader` in Go) is attached to the `MeterProvider`. A provider built in code without one has no reader at all.
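
Step 1 can be scripted so that sent, failed, and refused counts are read together. The sketch below is a minimal parser for the Prometheus text format that the collector serves on its self-metrics port. The function names and the `WATCHED` tuple are illustrative, and it assumes sample lines carry no trailing timestamp. Some collector builds expose these counters with a `_total` suffix, so the parser accepts both spellings.

```python
import urllib.request

# Counters named in this guide; adjust to what your collector build exposes.
WATCHED = (
    "otelcol_exporter_sent_metric_points",
    "otelcol_exporter_send_failed_metric_points",
    "otelcol_receiver_refused_metric_points",
)


def sum_counters(exposition_text, names=WATCHED):
    """Sum every sample of each watched counter across its label sets."""
    totals = {name: 0.0 for name in names}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like: name{labels} value  (labels are optional).
        series, _, value = line.rpartition(" ")
        metric_name = series.split("{", 1)[0]
        # Accept both the plain name and the _total-suffixed variant.
        base = metric_name.removesuffix("_total")
        if base in totals:
            totals[base] += float(value)
    return totals


def fetch_self_metrics(url="http://localhost:8888/metrics", timeout=5):
    """Download the collector's self-metrics page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `sum_counters(fetch_self_metrics())` twice, one export interval apart, shows whether the sent counter is moving. A flat sent counter next to a rising refused or failed counter points at the collector side rather than the SDK.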
( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

  • Collector config file: `/etc/otelcol/config.yaml`, specifically the `receivers`, `exporters`, and `service: pipelines` sections
  • Collector self-metrics endpoint: `http://localhost:8888/metrics`, where you check `otelcol_exporter_sent_metric_points` and `otelcol_receiver_refused_metric_points`
  • SDK environment variables: `OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_METRIC_EXPORT_INTERVAL`, `OTEL_EXPORTER_OTLP_COMPRESSION`
  • Application logs at debug/trace level: look for export attempts and export errors (the exact wording varies by SDK)
  • Collector logs (journalctl or file): `journalctl -u otelcol -f` or `/var/log/otelcol/otelcol.log`
  • Backend ingestion API logs (e.g., Datadog intake, New Relic): check for rejected payloads or invalid metric names
( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

  • Missing periodic reader in the SDK: the MeterProvider was configured without a reader, so metrics are never collected or exported.
  • Collector receiver protocol mismatch: the SDK sends OTLP via gRPC but the collector's `otlp` receiver only enables the `http` protocol (or vice versa).
  • Temporality conflict: the SDK or exporter is set to delta temporality but the collector's exporter or the backend expects cumulative (or the backend does not support delta for certain metric types). The OTLP exporter's default preference is cumulative, so look for an explicit override.
  • Attribute cardinality explosion: oversized or excessive attributes trip a limit in the SDK, the collector, or the backend, and the offending points are dropped.
  • SDK shutdown before export: the application exits without calling `MeterProvider.Shutdown()`, causing the final batch to be lost.
  • Network or TLS issues: endpoint unreachable, certificate mismatch, or a proxy blocking the gRPC/HTTP connection (often silent in the SDK).
( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

  • Add a periodic reader to the `MeterProvider` in the SDK: e.g., in Python: `MeterProvider(metric_readers=[PeriodicExportingMetricReader(OTLPMetricExporter())])`.
  • Align protocols: enable both `grpc` and `http` under the `otlp` receiver's `protocols`, or match the SDK exporter type to the protocol the receiver already enables.
  • Set `OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE='cumulative'` in the SDK env to match backend expectations.
  • Cut attribute count and value size at the source, then raise the limit that is actually being hit. On the `otlp` receiver that is the message size: `max_recv_msg_size_mib` for gRPC and `max_request_body_size` for HTTP.
  • Add a graceful shutdown hook: register `MeterProvider.Shutdown()` on SIGTERM/SIGINT (e.g., Python `atexit` or `signal` handler).
  • Test connectivity with a simple `grpcurl` or `curl` to the collector endpoint before assuming the SDK is at fault.
( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

  • After the fix, run a test app and check collector self-metrics: `otelcol_exporter_sent_metric_points` should increase.
  • Query the backend directly for a metric your service actually emits: e.g., for Prometheus, `curl 'http://prometheus:9090/api/v1/query?query=<your_metric_name>'`. The `up` series only reflects scrape targets, so it does not prove that pushed metrics arrived.
  • Enable debug logging temporarily and confirm that export attempts appear in the SDK logs.
  • Use a load generator such as `telemetrygen metrics --otlp-insecure --duration 5s` to verify the pipeline end to end without involving the application.
  • Add the `debug` exporter to the metrics pipeline temporarily and check the collector logs for batches with a non-zero metric and data point count.
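
When no load generator is at hand, a single hand-built point is enough to prove the path from the receiver onward. The sketch below posts one gauge point as OTLP/JSON to the HTTP endpoint. It assumes the collector's `otlp` receiver has the `http` protocol enabled on port 4318. The service name, scope name, and function names are made up for illustration.

```python
import json
import time
import urllib.request


def build_gauge_payload(name, value, service_name="pipeline-smoke-test"):
    """Build a minimal OTLP/JSON metrics export request with one gauge point."""
    return {
        "resourceMetrics": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeMetrics": [{
                "scope": {"name": "manual-smoke-test"},
                "metrics": [{
                    "name": name,
                    "gauge": {"dataPoints": [{
                        "asDouble": float(value),
                        # 64-bit integers travel as strings in OTLP/JSON.
                        "timeUnixNano": str(time.time_ns()),
                    }]},
                }],
            }],
        }],
    }


def send_test_metric(name, value, endpoint="http://localhost:4318/v1/metrics"):
    """POST one gauge point to the collector and return the HTTP status code."""
    body = json.dumps(build_gauge_payload(name, value)).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

If `send_test_metric("test_metric", 42)` returns 200 and `otelcol_exporter_sent_metric_points` rises afterwards, the collector and backend path works and the remaining suspect is the application's SDK setup.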
( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

  • Setting `OTEL_METRIC_EXPORT_INTERVAL` too low (e.g., below 1000 ms): this can cause backpressure and dropped batches.
  • Forgetting to restart the collector after editing its config: a newly added receiver does nothing until the service restarts and the receiver is listed in a pipeline.
  • Assuming the SDK exports automatically without a reader: a MeterProvider built in code needs an explicit periodic reader.
  • Ignoring collector self-metrics because they show zero: that's exactly the signal you need.
  • Using delta temporality with backends that only support cumulative (e.g., Prometheus): delta points are typically dropped or misread unless something converts them first.
  • Debugging without first isolating the pipeline: test with a simple curl or `telemetrygen` before blaming the application SDK.
( 07 ) War story

The Midnight Metric Blackout

Senior Platform Engineer · OpenTelemetry Collector v0.108.0, Python SDK v1.27.0, Prometheus v2.53.0

Timeline

  1. 00:15 PagerDuty alert: 'No data for service checkout-api for 10 minutes'.
  2. 00:20 Check Grafana: all checkout-api metrics show 'No data'. Other services fine.
  3. 00:25 SSH to checkout-api host; curl collector self-metrics: 'otelcol_exporter_sent_metric_points 0'.
  4. 00:30 Check collector logs: no errors, but 'Metrics exported' lines missing.
  5. 00:35 Check collector receivers config: the 'otlp' receiver enables only the 'http' protocol. Application uses gRPC exporter.
  6. 00:40 Enable the 'grpc' protocol on the 'otlp' receiver. Restart collector.
  7. 00:45 Self-metrics show sent_metric_points increasing. Grafana data returns.
  8. 00:50 Root cause: protocol mismatch. Application SDK was updated to use gRPC last week but collector config wasn't updated.

The first thing I noticed was that only one service was affected. That immediately ruled out a global collector or backend issue. I checked the collector self-metrics and saw zero exported metric points. The collector was alive but not forwarding anything from that service.

I tailed the collector logs with `journalctl -u otelcol -f` and saw no errors, but also no lines that said 'Metrics exported', just the periodic health checks. That's when I checked the receivers block in the collector config. The `otlp` receiver only had the `http` protocol enabled, listening on port 4318, but our Python SDK had been updated to use the default OTLP gRPC exporter (port 4317) a week earlier.

The fix took five minutes: I enabled `grpc:` under the `otlp` receiver's `protocols` block, confirmed the receiver was listed in the metrics pipeline, then restarted the collector. Metrics started flowing within 30 seconds. The lesson: always align SDK exporter protocol with collector receiver protocol, and treat the collector config as a living document that must be updated alongside SDK changes.

Root cause

Mismatch between SDK exporter (gRPC) and collector receiver (HTTP only).

The fix

Enabled the `grpc` protocol on the collector's `otlp` receiver and restarted the service.

The lesson

Never assume collector config is static — every SDK update that changes exporter type or protocol must be mirrored in the collector configuration.

( 08 ) SDK Metric Reader Lifecycle

The OpenTelemetry SDK does not export metrics on its own. You must attach a `MetricReader` to the `MeterProvider`. For OTLP exporters, use a periodic exporting reader (`PeriodicExportingMetricReader` in Python, `PeriodicReader` in Go, `PeriodicMetricReader` in Java) that calls the exporter at a fixed interval. Without a reader, instruments still accept measurements but nothing is ever collected or sent. This is one of the most common oversights behind metric export failures.

To verify, inspect the MeterProvider initialization in your application. In Python, look for `MeterProvider(metric_readers=[...])`. In Go, check that `NewMeterProvider` is called with `WithReader(...)`. Auto-instrumentation and autoconfiguration layers typically create a reader only when environment variables such as `OTEL_METRICS_EXPORTER=otlp` are set, and a MeterProvider built in code replaces that setup. Always log the MeterProvider configuration at startup.
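
As a reference point for that inspection, here is a minimal sketch of a Python setup that does have a reader attached. It assumes the `opentelemetry-sdk` and `opentelemetry-exporter-otlp-proto-grpc` packages are installed and that a collector accepts plaintext gRPC on `localhost:4317`. The function name and its defaults are illustrative, not part of the SDK.

```python
def build_meter_provider(endpoint="http://localhost:4317", interval_ms=60_000):
    """Return a MeterProvider that pushes to an OTLP/gRPC endpoint on a timer."""
    # Imported inside the function so this sketch loads even where the SDK is
    # absent; application code would normally import at module top level.
    from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import (
        OTLPMetricExporter,
    )
    from opentelemetry.sdk.metrics import MeterProvider
    from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

    # insecure=True means plaintext gRPC; drop it when the collector uses TLS.
    exporter = OTLPMetricExporter(endpoint=endpoint, insecure=True)
    reader = PeriodicExportingMetricReader(
        exporter, export_interval_millis=interval_ms
    )
    # Without metric_readers the provider accepts measurements but never exports.
    return MeterProvider(metric_readers=[reader])
```

A typical call site passes the result to `metrics.set_meter_provider(...)` from `opentelemetry`, then creates instruments through `metrics.get_meter(...)`. Shortening `interval_ms` while debugging makes the first export show up sooner.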

( 09 ) Temporality and Aggregation Mismatches

Temporality defines whether metric points are cumulative (reset only on restart) or delta (reset after each export). The OTLP exporter's default preference is cumulative, which is also what Prometheus expects. Delta shows up when the preference has been overridden in code, by environment variable, or by a vendor distribution. If your collector exporter or backend expects cumulative and receives delta, points are typically dropped or counters appear to reset on every export. The fix: set `OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=cumulative` in the SDK environment.

Additionally, check what the backend requires. Some backends reject delta temporality for certain metric types (e.g., histograms), while others accept only delta. The collector's `otlp` receiver has no temporality setting of its own. If the SDK cannot be changed, convert inside the collector with the `deltatocumulative` or `cumulativetodelta` processor from the contrib distribution, or ensure your SDK matches the backend requirements.
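
The difference between the two temporalities is easiest to see on a small series. The sketch below converts one counter's per-interval values in both directions. It is a toy illustration that ignores counter resets and start timestamps, both of which real converters have to handle.

```python
from itertools import accumulate


def delta_to_cumulative(deltas):
    """Running total: what a cumulative stream reports at each export."""
    return list(accumulate(deltas))


def cumulative_to_delta(cumulatives):
    """Per-interval change: what a delta stream reports at each export."""
    previous = 0
    deltas = []
    for value in cumulatives:
        deltas.append(value - previous)
        previous = value
    return deltas


# Requests counted in four consecutive export intervals.
requests_per_interval = [5, 3, 0, 7]
print(delta_to_cumulative(requests_per_interval))  # [5, 8, 8, 15]
print(cumulative_to_delta([5, 8, 8, 15]))          # [5, 3, 0, 7]
```

A backend that expects the first series but is fed the second sees a counter that appears to go backwards, which is why delta points are so often dropped or misread.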

( 10 ) Collector Attribute and Data Limits

Oversized or high-cardinality data is often dropped without an obvious error. Limits can apply at three points: in the SDK (some SDKs cap the number of attribute sets per instrument and fold the excess into an overflow series), at the collector's `otlp` receiver (maximum message size), and at the backend's own ingestion limits. If your application attaches large or numerous attributes, the resulting points may be rejected with no log message unless debug logging is enabled. Check the `otelcol_receiver_refused_metric_points` metric: if it is greater than 0, the collector is refusing data.

First reduce attribute count and value size at the source. If payloads are still too large, raise the receiver's message size limit: `max_recv_msg_size_mib` under `protocols: grpc`, or `max_request_body_size` under `protocols: http`. These settings are per-receiver and require a collector restart.
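
Before raising any limit, it helps to measure how many distinct series the application produces and which attribute values are unusually large. The sketch below does both over a list of recorded measurements. The input shape (pairs of metric name and attribute dict), the function names, and the byte threshold are assumptions for illustration, not limits taken from any SDK or collector.

```python
from collections import defaultdict


def series_per_metric(measurements):
    """Count distinct attribute sets seen for each metric name."""
    seen = defaultdict(set)
    for metric_name, attributes in measurements:
        # A frozenset of items is hashable and ignores key order.
        seen[metric_name].add(frozenset(attributes.items()))
    return {name: len(attribute_sets) for name, attribute_sets in seen.items()}


def oversized_values(attributes, max_bytes):
    """Return keys whose string value exceeds max_bytes once UTF-8 encoded."""
    return [
        key
        for key, value in attributes.items()
        if isinstance(value, str) and len(value.encode("utf-8")) > max_bytes
    ]
```

A metric whose series count grows with traffic usually carries an identifier such as a user or request ID as an attribute. Removing that attribute tends to help more than raising a limit.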

( 11 ) Network and TLS Silent Failures

The OTLP exporter often fails silently on network issues. The SDK retries internally but may exhaust retries without logging an error. Use a network tool like `grpcurl` to test connectivity: `grpcurl -plaintext collector:4317 grpc.health.v1.Health/Check`. A gRPC-level error response still proves the port is reachable, whereas a connection error or timeout does not. If the connection fails, check firewalls, DNS, and TLS certificates.

For TLS, ensure the SDK trusts the collector's certificate. Set `OTEL_EXPORTER_OTLP_CERTIFICATE` to the CA cert path. In Kubernetes, verify that the service endpoint is correct and that network policies allow egress from the pod to the collector. With the collector's log level set to `debug` (`service: { telemetry: { logs: { level: debug } } }`), TLS handshake errors show up in the collector logs if present.
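
When `grpcurl` is not installed on the host or in the pod, a plain TCP probe answers the first question, which is whether the port is reachable at all. The sketch below checks the two default OTLP ports. The function names are illustrative, and a successful connection says nothing about TLS or about which protocol is listening.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def probe_otlp_ports(host, ports=(4317, 4318)):
    """Map each default OTLP port (gRPC 4317, HTTP 4318) to its reachability."""
    return {port: can_connect(host, port) for port in ports}
```

Run from the application's own host or pod, `probe_otlp_ports("collector")` returning `{4317: False, 4318: True}` would match the situation in the war story above: the HTTP port is open and the gRPC port is not.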

( 12 ) Shutdown Race Conditions

When your application exits, the SDK must flush and export the last batch of metrics. If you don't call `MeterProvider.Shutdown()`, the final data is lost. This is especially common in serverless functions or short-lived batch jobs. The fix: register a shutdown hook that calls `MeterProvider.Shutdown()` with a timeout (e.g., 5 seconds).

In Python, use `atexit.register(meter_provider.shutdown)` or a context manager, and pair it with a `signal` handler, because `atexit` hooks do not run when the process is killed by an unhandled SIGTERM. In Go, use `defer meterProvider.Shutdown(context.Background())`. In Java, add a shutdown hook via `Runtime.getRuntime().addShutdownHook()`. Test by sending a SIGTERM and verifying the final batch arrives.
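
For Python, the hook can be sketched with the standard library alone. The helper below takes any zero-argument callable (for example a provider's `shutdown` method) and makes sure it runs on normal exit and on SIGTERM. The helper name is made up, and it has to be called from the main thread because that is where Python allows signal handlers to be installed.

```python
import atexit
import signal
import sys


def install_shutdown_hooks(shutdown, signals=(signal.SIGTERM,)):
    """Run `shutdown` at interpreter exit, including exits caused by SIGTERM."""
    atexit.register(shutdown)

    def _exit_cleanly(signum, frame):
        # sys.exit raises SystemExit, which unwinds normally and lets atexit
        # handlers run; the default SIGTERM action would end the process first.
        sys.exit(0)

    for sig in signals:
        signal.signal(sig, _exit_cleanly)
```

SIGINT needs no extra handler here, because Python turns it into `KeyboardInterrupt` and `atexit` hooks still run when that exception ends the program.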

Frequently asked questions

Why does my collector show zero metrics sent even though my application logs say 'metrics exported'?

The SDK's 'metrics exported' log only means the exporter attempted to send, not that the collector received them. Check the collector's receiver logs (debug mode) to see if the payload arrived. Most often, the issue is a network problem or a protocol mismatch (gRPC vs HTTP). Also verify the collector's pipeline includes the correct receiver and exporter.

My metrics appear in the collector but not in the backend. What should I check?

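First, check the collector's exporter self-metrics: `otelcol_exporter_sent_metric_points`, `otelcol_exporter_send_failed_metric_points`, and `otelcol_exporter_enqueue_failed_metric_points`. Send failures usually mean the backend is rejecting the payload because of invalid data or authentication. Enqueue failures mean the exporter's sending queue is full, typically because the backend is slow or unreachable. To see the actual error, raise the collector's log level to `debug` and add a `debug` exporter to the metrics pipeline (e.g., `exporters: { debug: { verbosity: detailed } }`). Also verify the backend endpoint URL and API key.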

Do I need a MetricReader for every language SDK?

Yes. Every OTel SDK needs a MetricReader attached to the MeterProvider before push-based exporters (like OTLP) send anything. Some SDKs and auto-instrumentation agents create a reader for you when the right environment variables are set, but relying on that is risky. Always initialize the MeterProvider with a periodic reader in code. For pull-based exporters (like Prometheus), you need a different reader (e.g., `PrometheusMetricReader` in Python).

Why do my metrics stop exporting after the first batch?

A common cause is a temporality mismatch. If your SDK sends delta temporality and your backend expects cumulative, later batches are dropped or misread because each one carries only the change since the previous export rather than the running total. Set `OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=cumulative` in the SDK environment or convert temporality in the collector with a processor such as `deltatocumulative`.

The collector logs show no errors, but metrics aren't reaching the backend. What's happening?

Silent drops are common. Check the collector's self-metrics for `otelcol_exporter_send_failed_metric_points` and `otelcol_receiver_refused_metric_points`. Also inspect the collector's internal telemetry: `curl http://localhost:8888/metrics | grep otelcol_`. If all metrics are zero, the pipeline might not be correctly configured — verify that the receiver and exporter are connected in the pipelines section of the config.
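
The wiring check in that last step can be automated. The sketch below takes a collector config that has already been loaded into a dict (for instance with a YAML parser, which is not shown) and reports receivers and exporters that are defined but never used, plus pipeline references to components that are not defined. The function name is illustrative, and the check ignores processors and connectors for brevity.

```python
def unwired_components(config):
    """List receivers/exporters that are defined but unused, or used but undefined."""
    problems = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for kind in ("receivers", "exporters"):
        defined = set(config.get(kind) or {})
        used = set()
        for pipeline in pipelines.values():
            used.update(pipeline.get(kind) or [])
        problems += [
            f"{kind}: '{name}' is defined but not used in any pipeline"
            for name in sorted(defined - used)
        ]
        problems += [
            f"{kind}: '{name}' is referenced by a pipeline but not defined"
            for name in sorted(used - defined)
        ]
    return problems
```

An empty result does not prove the pipeline works, but a non-empty one explains self-metrics that stay at zero: a receiver that appears in no pipeline has nowhere to send what it receives.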