Daniel Cohen

Wired OpenTelemetry traces end to end and finally saw which DB call ate 800ms

Implements structured logging, distributed tracing, and metrics collection with OpenTelemetry integration.

Observability Stack Implementation

You are a platform engineer implementing observability for a production service. Build the complete observability stack.

## Service Context

- Language: {{programming_language}}
- Framework: {{framework}}
- Infrastructure: {{infrastructure}} (Kubernetes, VMs, serverless)
- Observability backend: {{observability_backend}} (Datadog, New Relic, Grafana, Honeycomb, Jaeger)

## Requirements

1. **Structured Logging**: JSON structured logs with:
   - Correlation ID propagation
   - Request/response logging (with PII redaction)
   - Log levels with environment-based filtering
   - Contextual fields (user_id, trace_id, span_id)
   - Performance timers for key operations
2. **Distributed Tracing**: OpenTelemetry instrumentation:
   - Automatic HTTP middleware tracing
   - Database query tracing
   - External API call tracing
   - Custom span creation for business operations
   - Baggage propagation across services
3. **Metrics Collection**: Key metrics:
   - Request rate, latency (histogram), error rate (RED method)
   - Business metrics (orders created, payments processed)
   - Resource metrics (memory, CPU, connection pool)
   - Custom counters and gauges
4. **Health Checks**: /health, /ready, /live endpoints
5. **Alerting Rules**: Thresholds for critical alerts
6. **Log Correlation**: Trace-ID injection in all logs
7. **Error Tracking**: Integration with Sentry/Rollbar
8. **Performance Profiling**: Memory and CPU profiling endpoints

## Output

1. Logger configuration and wrapper code
2. Tracing middleware and decorators
3. Metrics registry and middleware
4. Health check implementation
5. Docker Compose with observability stack
6. Dashboard JSON for {{observability_backend}}
7. Example showing a traced request through the system

Results

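Observability stack for a Go service on Kubernetes, exporting to Grafana (Tempo + Loki + Prometheus) via OpenTelemetry.

```go
// tracing.go: OTLP/gRPC exporter, global tracer provider, W3C propagation
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// InitTracer returns a shutdown func that flushes buffered spans on exit.
func InitTracer(ctx context.Context) (func(context.Context) error, error) {
	exp, err := otlptracegrpc.New(ctx, otlptracegrpc.WithInsecure())
	if err != nil {
		return nil, err
	}
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp),
		sdktrace.WithResource(resource.NewSchemaless(
			attribute.String("service.name", "checkout"))),
	)
	otel.SetTracerProvider(tp)
	// The global default propagator is a no-op; without this call, trace
	// context and baggage are not forwarded to downstream services.
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{}, propagation.Baggage{}))
	return tp.Shutdown, nil
}

// handler.go: custom span + structured log correlated by trace_id
// (also imports log/slog and net/http; userID is an app helper, not shown)
func (h *Handler) Checkout(w http.ResponseWriter, r *http.Request) {
	ctx, span := otel.Tracer("checkout").Start(r.Context(), "process-order")
	defer span.End()
	slog.InfoContext(ctx, "order received",
		slog.String("trace_id", span.SpanContext().TraceID().String()),
		slog.String("user_id", userID(ctx)))
}
```

Wrapping the router with `otelhttp.NewHandler(mux, "checkout")` adds a server span to every incoming request automatically.

**RED metrics:** the same `otelhttp` middleware records a request-duration histogram tagged with the HTTP status code, from which request rate, error rate, and latency are derived. This only happens once a MeterProvider is registered; the tracer setup above does not export metrics on its own.

**Health:** `/live` (process up), `/ready` (DB + broker reachable).

**Alerting rule:** page when more than 5% of requests over the last 5 minutes return a 5xx: `sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05`.

A `docker-compose.yml` brings up Tempo, Loki, Prometheus, and Grafana locally, so logs and traces can be cross-linked in Grafana through their shared `trace_id`.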

Model: Claude Sonnet 4


4 Comments

Marco Rossi

The error handling is the part most examples skip. Nice to see it done right.

Priya Nair

Adopted this pattern across three services this week. Zero regressions.

Ahmed Hassan

Running this code generation prompt on a real ticket right now.

Ryan Mitchell

Tried it against a gnarly legacy file and it untangled it cleanly.