Daniel Cohen
The expand/contract migration pattern finally made our schema changes safe to roll back
Design robust rollback strategies with blue-green deployments, canary rollbacks, database backward compatibility, and data migration reversal.
Zero-Downtime Rollback Strategist
You are a deployment safety engineer specializing in zero-downtime rollback strategies. Design a bulletproof rollback plan.
**Application Architecture:**
{{app_architecture}}
**Current Deployment Method:**
{{deployment_method}}
**Database Type:**
{{database_type}}
**Infrastructure:**
{{infrastructure}}
**Previous Incidents:**
{{previous_incidents}}
**Risk Tolerance:**
{{risk_tolerance}}
Design:
1. **Rollback Strategy Matrix**: Choose between blue-green, canary, rolling, or feature-flag rollback based on context
2. **Blue-Green Deployment Config**: Complete setup with traffic switching, health verification
3. **Canary Rollback Rules**: Automated rollback triggers based on error rate, latency, custom business metrics
4. **Database Compatibility**: Forward-compatible schema changes that allow rollback
5. **Migration Reversal**: Safe database downgrade scripts for each migration
6. **Data Consistency**: How to handle data written by new version during rollback
7. **Feature Flag Integration**: Using feature flags as first-line defense for risky changes
8. **Circuit Breaker Setup**: Automatic traffic shifting away from failing instances
9. **Rollback Verification**: Health checks to confirm rollback success before declaring incident resolved
10. **Communication Plan**: Notify stakeholders during planned vs emergency rollbacks
11. **Decision Matrix**: When to rollback vs hotfix vs forward-patch
12. **Automation**: Scripts and CI pipeline stages for one-click rollback
Output architecture diagrams (in text), scripts, and configuration files.
Results
Designed a zero-downtime rollback strategy for a Kubernetes app on Postgres. The core principle: every deploy must be rollback-safe, which means database changes are backward-compatible by construction.
**Strategy matrix:**
| Change type | Rollback method |
|-------------|-----------------|
| Stateless code | Rolling update — `kubectl rollout undo` |
| Risky feature | Feature flag (instant off, no redeploy) |
| Schema change | Expand/contract migration (never a destructive single step) |
**Expand/contract is the key.** Never rename or drop a column in the same release that stops using it:
```sql
-- Release N (expand): add the new column, keep the old one
ALTER TABLE orders ADD COLUMN total_cents bigint;
-- backfill in the background; the app dual-writes total and total_cents
-- Release N+1: the app stops reading and writing total (the column stays in place)
-- Release N+2 (contract): only after N+1 is fully rolled out and stable
ALTER TABLE orders DROP COLUMN total; -- safe now; nothing reads or writes it
```
Because release N still writes both columns, rolling back to N-1 loses no data: N-1 reads only `total`, which N kept up to date, and it ignores the extra `total_cents` column.
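The background backfill mentioned in the expand step could look roughly like the sketch below. It assumes several things the original does not state: `orders` has an integer primary key `id`, `total` holds a decimal amount in major currency units, and a `DATABASE_URL` environment variable points at the Postgres instance. `backfill_batch_sql` and `run_backfill` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: batched backfill of orders.total_cents from orders.total.
# Assumptions (not from the original): integer primary key "id",
# "total" in major currency units, DATABASE_URL set for psql.

# Print the UPDATE statement for one id range [start, end).
backfill_batch_sql() {
  local start="$1" end="$2"
  printf 'UPDATE orders SET total_cents = round(total * 100)::bigint WHERE id >= %d AND id < %d AND total_cents IS NULL;\n' "$start" "$end"
}

# Run the backfill in small batches so each statement holds locks only briefly.
run_backfill() {
  local max_id="$1" batch="${2:-10000}" start=0
  while [ "$start" -le "$max_id" ]; do
    backfill_batch_sql "$start" "$((start + batch))" \
      | psql -v ON_ERROR_STOP=1 "$DATABASE_URL" || return 1
    start=$((start + batch))
  done
}

# Example (hypothetical max id): run_backfill 2500000 5000
```

Small id ranges keep each `UPDATE` short, and the `total_cents IS NULL` filter skips rows the application has already dual-written, so the script can be re-run after an interruption.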
**Canary auto-rollback rule:**
```bash
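# shift 10% traffic, watch error rate, abort if it crosses 1%
# METRICS_URL is a placeholder for an endpoint returning JSON such as {"error_rate": 0.004}
error_rate="$(curl -fsS "$METRICS_URL" | jq -r '.error_rate')"
# [ a \> b ] compares strings, so do the numeric comparison in awk instead
if awk -v r="$error_rate" 'BEGIN { exit !(r + 0 > 0.01) }'; then
  kubectl rollout undo deployment/checkout
fi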
```
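A slightly fuller version of the canary rule might sample the metrics several times and also look at latency. This is a sketch under assumptions: `METRICS_URL`, the `p99_latency_ms` field, the 500 ms latency ceiling, and the function names are illustrative and not from the original; only the 1% error-rate threshold is.

```bash
#!/usr/bin/env bash
# Sketch: canary watch loop with a pure, testable decision function.

# Exit 0 (true) when either metric breaches its threshold.
should_rollback() {
  local error_rate="$1" p99_ms="$2"
  local max_error_rate="${MAX_ERROR_RATE:-0.01}" max_p99_ms="${MAX_P99_MS:-500}"
  awk -v e="$error_rate" -v l="$p99_ms" -v me="$max_error_rate" -v ml="$max_p99_ms" \
    'BEGIN { exit !((e + 0 > me + 0) || (l + 0 > ml + 0)) }'
}

# Poll the metrics endpoint; undo the rollout on the first breach.
# Returns 0 if the canary stayed healthy, 1 after a rollback, 2 if metrics were unavailable.
watch_canary() {
  local deployment="$1" samples="${2:-10}" interval="${3:-30}" i=0 metrics
  while [ "$i" -lt "$samples" ]; do
    metrics="$(curl -fsS "$METRICS_URL")" || return 2
    if should_rollback \
        "$(printf '%s' "$metrics" | jq -r '.error_rate')" \
        "$(printf '%s' "$metrics" | jq -r '.p99_latency_ms')"; then
      kubectl rollout undo "deployment/$deployment"
      return 1
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 0
}

# Example: watch_canary checkout 10 30
```

Keeping the threshold logic in `should_rollback` separates the decision from the `curl` and `kubectl` side effects, which makes the rule easy to check in CI without a cluster.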
**Decision guide:** prefer turning the feature flag off for application-logic problems (takes effect in seconds) and `kubectl rollout undo` for code regressions (one command). Never roll a schema back; roll *forward* with a compensating migration instead.
**Verification:** the post-rollback readiness probe must pass before the incident is declared resolved.
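The verification step could be scripted along these lines. It is a minimal sketch, assuming the workload is a Kubernetes Deployment and that "readiness probe passes" can be approximated by every desired replica reporting ready; `replicas_ready` and `verify_rollback` are hypothetical names, and the 120 s timeout is an arbitrary default.

```bash
#!/usr/bin/env bash
# Sketch: confirm a rollback before declaring the incident resolved.

# Pure check: are all desired replicas ready? An empty ready count is treated as 0.
replicas_ready() {
  local ready="${1:-0}" desired="$2"
  [ "$desired" -gt 0 ] && [ "$ready" -eq "$desired" ]
}

verify_rollback() {
  local deployment="$1" timeout="${2:-120s}" ready desired
  # Blocks until the rollout finishes or the timeout expires.
  kubectl rollout status "deployment/$deployment" --timeout="$timeout" || return 1
  ready="$(kubectl get deployment "$deployment" -o jsonpath='{.status.readyReplicas}')"
  desired="$(kubectl get deployment "$deployment" -o jsonpath='{.spec.replicas}')"
  replicas_ready "$ready" "$desired"
}

# Example: verify_rollback checkout 180s && echo "rollback verified"
```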
Model: Claude Sonnet 4
2 Comments
Marco Rossi
The Big-O note at the end sold me.
Ahmed Hassan
Okay this debugging, testing output just saved me an afternoon.