Lena Fischer

Talked me out of one DB into Postgres plus ClickHouse with a sane pool-sizing formula and partitioning

Recommends the optimal database stack with sharding and partitioning strategies based on your data model, query patterns, and scale targets.

Database Selection & Sharding Strategy

You are a Database Architect specializing in high-scale distributed data systems. Help me select and design the database architecture for my system.

- **Data Profile**: {{data_profile}} (data types, sizes, growth rate, retention requirements)
- **Query Patterns**: {{query_patterns}} (read-heavy vs write-heavy, OLTP vs OLAP, latency SLAs)
- **Scale Targets**: {{scale_targets}} (records count, QPS, concurrent connections)
- **Current Pain Points**: {{current_pain_points}} (performance issues, bottlenecks, outages)

Provide a comprehensive analysis:

1. **Database Selection Matrix** - Compare at least 5 database options (SQL, NoSQL, NewSQL, Specialized) with scoring across: consistency, availability, partition tolerance, query flexibility, operational complexity, cost at scale
2. **Polyglot Persistence Strategy** - If multiple databases are recommended, define the data split and synchronization strategy
3. **Sharding Strategy** - Shard key selection, sharding algorithm (hash, range, geo), resharding plan for growth
4. **Partitioning Design** - Table partitioning scheme with partition keys and pruning strategy
5. **Replication Topology** - Master-slave, multi-master, or quorum-based replication with RPO/RTO targets
6. **Indexing Strategy** - Primary, secondary, composite, and specialized indexes (GIN, full-text, geospatial)
7. **Connection Pooling** - Pool sizing formula based on (core_count * 2) + effective_spindle_count, with middleware recommendations
8. **Backup & Recovery** - Snapshot strategy, point-in-time recovery, cross-region replication
9. **Migration Plan** - Zero-downtime migration from current database to new architecture using dual-write or CDC
10. **Capacity Planning** - Storage growth projection for 12, 24, and 36 months with hardware/node requirements

Include concrete configuration examples (e.g., PostgreSQL postgresql.conf tuning parameters).

Results

## Database Strategy: high-write analytics + transactional commerce

The workload is split: OLTP orders (strong consistency) and a firehose of clickstream events (write-heavy, eventual consistency is fine). One database can't be great at both, so the recommendation is polyglot persistence.

**Selection matrix:**

| Store | Use | Consistency | Why |
|-------|-----|-------------|-----|
| PostgreSQL 16 | orders, users, inventory | strong (ACID) | transactions + relational integrity for money |
| ClickHouse | clickstream, analytics | eventual | columnar, ingests millions of rows/s, fast aggregates |
| Redis | sessions, hot cache | n/a | sub-ms reads, offloads Postgres |

**Sharding (clickstream in ClickHouse):** shard by `hash(user_id)` so a single user's events colocate, and partition by `toYYYYMM(event_time)` so old partitions drop cheaply for retention.

```sql
-- ClickHouse events table: partitioned by month, ordered for time-range scans
CREATE TABLE events (
    user_id     UInt64,
    event_type  LowCardinality(String),
    event_time  DateTime,
    properties  String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time);
```

**Postgres replication:** one primary + two read replicas; route analytics-ish reads to replicas. RPO target ~5s via streaming replication, RTO ~30s via automated failover.

**Connection pool sizing:** with 8 cores, start at `(8 * 2) + effective_spindles ≈ 18–20` per app instance via PgBouncer in transaction mode, far below a naive "hundreds of connections" that would thrash Postgres.

**Migration (zero-downtime):** dual-write orders to old and new schema, backfill historical rows with a batched job, then flip reads via a feature flag.

**Capacity (12/24/36 mo):** at the stated growth, ClickHouse stays single-cluster well past 36 months; Postgres needs a read replica around month 12 and table partitioning on `orders` by `created_at` around month 24.
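
The connection pool sizing rule in the output above is simple arithmetic, so a small sketch may help when plugging in your own numbers. This is an illustration only: the function names `pool_size` and `fits_server_limit` are hypothetical, the spindle counts of 2 and 4 are assumed values chosen to reproduce the 18–20 range quoted above, and the instance count and connection budget in the last line are made-up inputs, not figures from the output.

```python
def pool_size(core_count: int, effective_spindles: int) -> int:
    """Starting pool size from the rule of thumb
    (core_count * 2) + effective_spindle_count."""
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    if effective_spindles < 0:
        raise ValueError("effective_spindles must be non-negative")
    return core_count * 2 + effective_spindles


def fits_server_limit(per_instance_pool: int,
                      app_instances: int,
                      max_server_connections: int) -> bool:
    """Check that all app instances together stay within a
    server-side connection budget (a hypothetical limit you choose)."""
    return per_instance_pool * app_instances <= max_server_connections


# 8 cores with an assumed 2 to 4 effective spindles gives the 18-20 range.
low = pool_size(core_count=8, effective_spindles=2)
high = pool_size(core_count=8, effective_spindles=4)
print(f"starting pool size: {low}-{high}")

# Hypothetical deployment: 4 app instances against a budget of 100 connections.
print(fits_server_limit(high, app_instances=4, max_server_connections=100))
```

The result is a starting point to tune under load rather than a fixed answer. The second helper is there because per-instance pools add up: the total across all app instances is what the database actually sees.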

Model: Claude Opus 4

36 Likes · 9 Saves · Score: 28

1 Comment

Priya Nair

Okay this system design output just saved me an afternoon.