Daniel Cohen·

The expand/contract migration pattern finally made our schema changes safe to roll back

Design robust rollback strategies with blue-green deployments, canary rollbacks, database backward compatibility, and data migration reversal.

Zero-Downtime Rollback Strategist

You are a deployment safety engineer specializing in zero-downtime rollback strategies. Design a bulletproof rollback plan.

**Application Architecture:** {{app_architecture}}
**Current Deployment Method:** {{deployment_method}}
**Database Type:** {{database_type}}
**Infrastructure:** {{infrastructure}}
**Previous Incidents:** {{previous_incidents}}
**Risk Tolerance:** {{risk_tolerance}}

Design:

1. **Rollback Strategy Matrix**: Choose between blue-green, canary, rolling, or feature-flag rollback based on context
2. **Blue-Green Deployment Config**: Complete setup with traffic switching, health verification
3. **Canary Rollback Rules**: Automated rollback triggers based on error rate, latency, custom business metrics
4. **Database Compatibility**: Forward-compatible schema changes that allow rollback
5. **Migration Reversal**: Safe database downgrade scripts for each migration
6. **Data Consistency**: How to handle data written by new version during rollback
7. **Feature Flag Integration**: Using feature flags as first-line defense for risky changes
8. **Circuit Breaker Setup**: Automatic traffic shifting away from failing instances
9. **Rollback Verification**: Health checks to confirm rollback success before declaring incident resolved
10. **Communication Plan**: Notify stakeholders during planned vs emergency rollbacks
11. **Decision Matrix**: When to roll back vs hotfix vs forward-patch
12. **Automation**: Scripts and CI pipeline stages for one-click rollback

Output architecture diagrams (in text), scripts, and configuration files.

Results

Designed a zero-downtime rollback strategy for a Kubernetes app on Postgres. The core principle: every deploy must be rollback-safe, which means database changes are backward-compatible by construction.

**Strategy matrix:**

| Change type | Rollback method |
|-------------|-----------------|
| Stateless code | Rolling update: `kubectl rollout undo` |
| Risky feature | Feature flag (instant off, no redeploy) |
| Schema change | Expand/contract migration (never a destructive single step) |

**Expand/contract is the key.** Never rename or drop a column in the same release that stops using it:

```sql
-- Release N (expand): add the new column, keep the old one
ALTER TABLE orders ADD COLUMN total_cents bigint;
-- backfill in the background, dual-write from the app

-- Release N+1 (contract): only after N is fully rolled out and stable
ALTER TABLE orders DROP COLUMN total;  -- safe now; nothing reads it
```

Because release N still writes both columns, rolling back to N-1 loses no data and the schema is still compatible: N-1 reads only `total`, which release N kept populated.

**Canary auto-rollback rule:**

```bash
# shift 10% traffic, watch error rate, abort if it crosses 1%
# "metrics" stands in for the metrics endpoint URL
error_rate="$(curl -s metrics | jq -r '.error_rate')"
# [ ... \> ... ] compares strings lexically; awk compares the values as numbers
if awk -v r="$error_rate" 'BEGIN { exit !(r + 0 > 0.01) }'; then
  kubectl rollout undo deployment/checkout
fi
```

**Decision guide:** prefer feature-flag-off for app logic (seconds), `rollout undo` for code regressions (one command), and never roll a schema back; roll *forward* with a compensating migration instead.

**Verification:** a post-rollback readiness probe must pass before the incident is declared resolved.
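The canary rule above acts on a single error-rate reading, so one noisy scrape could trigger a rollback. A minimal sketch of one way to harden it, assuming the caller has already collected a series of readings: roll back only after several consecutive samples exceed the threshold. The helper names `exceeds` and `should_rollback`, the argument order, and the consecutive-breach policy are illustrative assumptions, not part of the original result.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names and the consecutive-breach policy are assumptions.

# exceeds VALUE THRESHOLD
# Succeeds when VALUE > THRESHOLD. Adding 0 forces awk to compare numbers,
# so "9e-05" is treated as 0.00009 and non-numeric input (e.g. "null") as 0.
exceeds() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

# should_rollback THRESHOLD REQUIRED SAMPLE...
# Succeeds when at least REQUIRED consecutive samples exceed THRESHOLD.
should_rollback() {
  local threshold="$1" required="$2" streak=0 sample
  shift 2
  for sample in "$@"; do
    if exceeds "$sample" "$threshold"; then
      streak=$((streak + 1))
      if [ "$streak" -ge "$required" ]; then
        return 0
      fi
    else
      streak=0   # a healthy sample resets the streak
    fi
  done
  return 1
}
```

With readings gathered into an array, the gate would wrap the same command the rule already uses: `should_rollback 0.01 3 "${samples[@]}" && kubectl rollout undo deployment/checkout`. Taking the samples as arguments keeps the decision logic testable without a cluster or a metrics endpoint.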

Model: Claude Sonnet 4


2 Comments

Marco Rossi·

The Big-O note at the end sold me.

Ahmed Hassan·

Okay this debugging, testing output just saved me an afternoon.