Daniel Cohen·
Wired OpenTelemetry traces end to end and finally saw which DB call ate 800ms
Implements structured logging, distributed tracing, and metrics collection with OpenTelemetry integration.
Observability Stack Implementation
You are a platform engineer implementing observability for a production service. Build the complete observability stack.
## Service Context
- Language: {{programming_language}}
- Framework: {{framework}}
- Infrastructure: {{infrastructure}} (Kubernetes, VMs, serverless)
- Observability backend: {{observability_backend}} (Datadog, New Relic, Grafana, Honeycomb, Jaeger)
## Requirements
1. **Structured Logging** - JSON structured logs with:
   - Correlation ID propagation
   - Request/response logging (with PII redaction)
   - Log levels with environment-based filtering
   - Contextual fields (user_id, trace_id, span_id)
   - Performance timers for key operations
2. **Distributed Tracing** - OpenTelemetry instrumentation:
   - Automatic HTTP middleware tracing
   - Database query tracing
   - External API call tracing
   - Custom span creation for business operations
   - Baggage propagation across services
3. **Metrics Collection** - Key metrics:
   - Request rate, latency (histogram), error rate (RED method)
   - Business metrics (orders created, payments processed)
   - Resource metrics (memory, CPU, connection pool)
   - Custom counters and gauges
4. **Health Checks** - /health, /ready, /live endpoints
5. **Alerting Rules** - Thresholds for critical alerts
6. **Log Correlation** - Trace-ID injection in all logs
7. **Error Tracking** - Integration with Sentry/Rollbar
8. **Performance Profiling** - Memory and CPU profiling endpoints
## Output
1. Logger configuration and wrapper code
2. Tracing middleware and decorators
3. Metrics registry and middleware
4. Health check implementation
5. Docker Compose with observability stack
6. Dashboard JSON for {{observability_backend}}
7. Example showing a traced request through the system
Results
Observability stack for a Go service on Kubernetes, exporting to Grafana (Tempo + Loki + Prometheus) via OpenTelemetry.
```go
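// tracing.go: OTLP trace exporter and tracer provider setup.
// Assumed imports: context, net/http, log/slog, go.opentelemetry.io/otel,
// go.opentelemetry.io/otel/attribute, go.opentelemetry.io/otel/sdk/resource,
// go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc,
// and sdktrace "go.opentelemetry.io/otel/sdk/trace".
func InitTracer(ctx context.Context) (func(context.Context) error, error) {
	// WithInsecure disables TLS, which suits a local collector only.
	exp, err := otlptracegrpc.New(ctx, otlptracegrpc.WithInsecure())
	if err != nil {
		return nil, err
	}
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp),
		sdktrace.WithResource(resource.NewSchemaless(
			attribute.String("service.name", "checkout"))),
	)
	otel.SetTracerProvider(tp)
	// The caller invokes the returned function at exit to flush pending spans.
	return tp.Shutdown, nil
}

// handler.go: custom span + structured log correlated by trace_id.
// log is assumed to be a package-level *slog.Logger; userID is defined elsewhere.
func (h *Handler) Checkout(w http.ResponseWriter, r *http.Request) {
	ctx, span := otel.Tracer("checkout").Start(r.Context(), "process-order")
	defer span.End()
	log.Info("order received",
		slog.String("trace_id", span.SpanContext().TraceID().String()),
		slog.String("user_id", userID(ctx)))
}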
```
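The handler above attaches `trace_id` by hand on each log call. One way to get the same correlation on every log line is to wrap the `slog` handler so it reads the ID from the request context. The sketch below uses only the standard library and assumes a hypothetical context key (`traceKey`) and helper names (`WithTraceID`, `LogFields`) that are not part of the original result; with the real SDK the ID would instead be read from the OpenTelemetry span context.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log/slog"
)

// traceKey is a hypothetical context key standing in for the OTel span context.
type traceKey struct{}

// WithTraceID stores a trace ID on the context (illustrative helper).
func WithTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceKey{}, id)
}

// traceHandler wraps any slog.Handler and adds trace_id to each record
// whose context carries one.
type traceHandler struct{ slog.Handler }

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceKey{}).(string); ok && id != "" {
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the inner handler so that loggers
// derived via logger.With(...) keep the injection.
func (h traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return traceHandler{h.Handler.WithAttrs(attrs)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{h.Handler.WithGroup(name)}
}

// LogFields logs one record through the wrapper and returns the decoded
// JSON fields, which keeps the example easy to inspect.
func LogFields(traceID, msg string) map[string]any {
	ctx := context.Background()
	if traceID != "" {
		ctx = WithTraceID(ctx, traceID)
	}
	var buf bytes.Buffer
	logger := slog.New(traceHandler{slog.NewJSONHandler(&buf, nil)})
	logger.InfoContext(ctx, msg, slog.String("user_id", "u-123"))

	fields := map[string]any{}
	if err := json.Unmarshal(buf.Bytes(), &fields); err != nil {
		panic(err)
	}
	return fields
}

func main() {
	fields := LogFields("4bf92f3577b34da6a3ce929d0e0e4736", "order received")
	fmt.Println(fields["msg"], fields["trace_id"])
}
```

The wrapper only works when call sites use the context-aware methods (`InfoContext`, `ErrorContext`); a plain `logger.Info` passes a background context, so no `trace_id` is added.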
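**RED metrics:** request rate, error rate, and a latency histogram come from the OTel HTTP middleware once a meter provider is registered; the error rate is derived from the recorded status codes. **Health:** `/live` (process up), `/ready` (DB + broker reachable). **Alerting rule:** page when `rate(http_requests_total{status=~"5.."}[5m]) > 0.05`, that is, when a series averages more than 0.05 5xx responses per second over 5 minutes. This is an absolute rate, not a 5% error ratio; for a ratio, divide by the total request rate. A `docker-compose.yml` brings up Tempo, Loki, Prometheus, and Grafana locally so traces and logs can be joined on one `trace_id`, with metrics in the same Grafana instance.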
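The `/live` and `/ready` split can be sketched with the standard library alone. This is a minimal illustration, not the generated implementation: the `Check` type, `NewHealthMux`, `StatusOf`, and the two stub checks are hypothetical names, and the stubs stand in for real DB and broker pings. The 2-second readiness timeout is likewise an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Check reports whether one dependency (DB, broker, ...) is reachable.
type Check func(ctx context.Context) error

// NewHealthMux wires /live and /ready onto a fresh mux.
func NewHealthMux(checks map[string]Check) *http.ServeMux {
	mux := http.NewServeMux()

	// Liveness: the process is up and able to serve HTTP.
	mux.HandleFunc("/live", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Readiness: every dependency must answer within the timeout,
	// otherwise the endpoint returns 503 so traffic is held back.
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()
		for name, check := range checks {
			if err := check(ctx); err != nil {
				http.Error(w, name+": "+err.Error(), http.StatusServiceUnavailable)
				return
			}
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// StatusOf issues an in-process GET and returns the response status code.
func StatusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

// Stub checks standing in for real dependency pings.
func okCheck(context.Context) error   { return nil }
func downCheck(context.Context) error { return errors.New("connection refused") }

func main() {
	healthy := NewHealthMux(map[string]Check{"db": okCheck, "broker": okCheck})
	degraded := NewHealthMux(map[string]Check{"db": okCheck, "broker": downCheck})

	fmt.Println("healthy /ready:", StatusOf(healthy, "/ready"))
	fmt.Println("degraded /ready:", StatusOf(degraded, "/ready"))
	fmt.Println("degraded /live:", StatusOf(degraded, "/live"))
}
```

Keeping dependency checks out of `/live` is deliberate: on Kubernetes a failing liveness probe restarts the pod, which does not help when the database is the component that is down.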
Model: Claude Sonnet 4
4 Comments
Marco Rossi·
The error handling is the part most examples skip. Nice to see it done right.
Priya Nair·
Adopted this pattern across three services this week. Zero regressions.
Ahmed Hassan·
Running this code generation prompt on a real ticket right now.
Ryan Mitchell·
Tried it against a gnarly legacy file and it untangled it cleanly.