Marco Rossi·
Fed it 22 minutes of JSON logs and it traced a cache OOM cascading into our DB pool
Analyze application logs to detect error patterns, anomaly clusters, performance degradation signatures, and generate actionable insights.
Log Analysis & Pattern Detector
Act as an SRE performing deep log analysis. Analyze the following application logs and extract actionable insights.
**Log Sample:**
```
{{log_sample}}
```
**Log Format:**
{{log_format}}
**Application Context:**
{{application_context}}
**Time Window:**
{{time_window}}
**Known Issues (if any):**
{{known_issues}}
Perform comprehensive analysis:
1. **Error Pattern Detection**: Group similar errors, identify top error types and frequencies
2. **Anomaly Clustering**: Find temporal clusters where errors spike
3. **Correlation Analysis**: Correlate errors with warnings/info logs that preceded them
4. **Root Cause Hypotheses**: Top 3 most likely root causes with confidence scores
5. **Performance Signatures**: Identify latency spikes, throughput drops, resource exhaustion patterns
6. **Affected Components**: Which services/endpoints/users are impacted
7. **Severity Assessment**: Classify each issue by severity (P0 critical to P3 minor)
8. **Recommended Actions**: Immediate fixes and long-term preventive measures
9. **Log Quality Assessment**: Identify noisy logs, missing context, structured logging improvements
10. **Alert Recommendations**: Suggest log-based alerts to catch these patterns automatically
Output as a structured incident report with timelines and prioritized action items.
Ergebnisse
## Incident Report — `checkout-api` error spike, 14:00–14:22 UTC
**Log format:** JSON (pino). **Window:** 22 min. Below is the top-line analysis from the sample.
**1. Error clustering** — three error types, one dominates:
| Error | Count | Share |
|-------|-------|-------|
| `ECONNREFUSED redis:6379` | 1,847 | 71% |
| `TimeoutError: query exceeded 5000ms` | 612 | 24% |
| `ValidationError: missing sku` | 134 | 5% |
**2. Temporal correlation:** the Redis refusals begin at 14:00:11, exactly 4 seconds after a `redis client reconnecting` warning. The Postgres timeouts start 90 seconds later — a classic secondary failure: cache down → every request falls through to the DB → connection pool saturates.
**3. Root-cause hypothesis (confidence 0.85):** Redis was OOM-killed (the last healthy log shows `used_memory growing, maxmemory 256mb`). The app has no cache-miss circuit breaker, so it stampeded Postgres.
**4. Immediate actions:**
- Raise Redis `maxmemory` and set `maxmemory-policy allkeys-lru`.
- Add a short-circuit: if Redis is down, serve a degraded response instead of hammering the DB.
**5. Alert recommendation:** page when `rate({app="checkout"} |= "ECONNREFUSED" [2m]) > 5`. The `ValidationError` noise (5%) is a client bug, not an outage — route it to a daily digest, not the pager. **Log-quality note:** none of these lines carry a `trace_id`, so cross-service correlation was manual. Add request-id propagation.
Modell: Claude Sonnet 4
7 Likes2 SavesScore: 5
1 Kommentar
Priya Nair·
This is the kind of prompt that pays for the subscription by itself.