Lena Fischer
Talked me out of one DB into Postgres plus ClickHouse with a sane pool-sizing formula and partitioning
Recommends the optimal database stack with sharding and partitioning strategies based on your data model, query patterns, and scale targets.
Database Selection & Sharding Strategy
You are a Database Architect specializing in high-scale distributed data systems. Help me select and design the database architecture for my system.
**Data Profile**: {{data_profile}} (data types, sizes, growth rate, retention requirements)
**Query Patterns**: {{query_patterns}} (read-heavy vs write-heavy, OLTP vs OLAP, latency SLAs)
**Scale Targets**: {{scale_targets}} (records count, QPS, concurrent connections)
**Current Pain Points**: {{current_pain_points}} (performance issues, bottlenecks, outages)
Provide a comprehensive analysis:
1. **Database Selection Matrix** - Compare at least 5 database options (SQL, NoSQL, NewSQL, Specialized) with scoring across: consistency, availability, partition tolerance, query flexibility, operational complexity, cost at scale
2. **Polyglot Persistence Strategy** - If multiple databases are recommended, define the data split and synchronization strategy
3. **Sharding Strategy** - Shard key selection, sharding algorithm (hash, range, geo), resharding plan for growth
4. **Partitioning Design** - Table partitioning scheme with partition keys and pruning strategy
5. **Replication Topology** - Master-slave, multi-master, or quorum-based replication with RPO/RTO targets
6. **Indexing Strategy** - Primary, secondary, composite, and specialized indexes (GIN, full-text, geospatial)
7. **Connection Pooling** - Pool sizing formula based on (core_count * 2) + effective_spindle_count, with middleware recommendations
8. **Backup & Recovery** - Snapshot strategy, point-in-time recovery, cross-region replication
9. **Migration Plan** - Zero-downtime migration from current database to new architecture using dual-write or CDC
10. **Capacity Planning** - Storage growth projection for 12, 24, and 36 months with hardware/node requirements
Include concrete configuration examples (e.g., PostgreSQL postgresql.conf tuning parameters).
Results
## Database Strategy — high-write analytics + transactional commerce
The workload is split: OLTP orders need strong consistency, while the clickstream firehose is write-heavy and tolerates eventual consistency. No single database is great at both, so the recommendation is polyglot persistence.
**Selection matrix:**
| Store | Use | Consistency | Why |
|-------|-----|-------------|-----|
| PostgreSQL 16 | orders, users, inventory | strong (ACID) | transactions + relational integrity for money |
| ClickHouse | clickstream, analytics | eventual | columnar, ingests millions of rows/s, fast aggregates |
| Redis | sessions, hot cache | n/a | sub-ms reads, offloads Postgres |
**Sharding (clickstream in ClickHouse):** shard by `hash(user_id)` so a single user's events colocate, and partition by `toYYYYMM(event_time)` so old partitions drop cheaply for retention.
```sql
-- ClickHouse events table: partitioned by month, sorted for per-user time-range scans
CREATE TABLE events (
    user_id    UInt64,
    event_type LowCardinality(String),
    event_time DateTime,
    properties String
) ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time);
```
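The table above is a per-node MergeTree table; on its own it does not spread rows across shards. A minimal sketch of the sharding and retention pieces follows, assuming a cluster named `events_cluster` and the `default` database (both hypothetical, not from the original design) and using `cityHash64` as one possible concrete choice for the `hash(user_id)` mentioned above.

```sql
-- Hypothetical names: `events_cluster`, `default`, and `events_dist`
-- are illustrative and would come from your own cluster configuration.
-- Rows written to events_dist are routed to a shard by the hash of user_id,
-- so one user's events land on the same shard.
CREATE TABLE events_dist AS events
ENGINE = Distributed(events_cluster, default, events, cityHash64(user_id));

-- Retention: drop an entire month at once instead of deleting row by row.
-- 202401 is the value toYYYYMM(event_time) produces for January 2024.
ALTER TABLE events DROP PARTITION 202401;
```

On a multi-shard cluster the local `events` table has to exist on every shard, and the `DROP PARTITION` has to run against each shard's local table (for example with `ON CLUSTER`), not against the Distributed table.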
**Postgres replication:** one primary plus two read replicas; route reporting-style reads that tolerate a few seconds of staleness to the replicas. Target an RPO of ~5s via streaming replication and an RTO of ~30s via automated failover.
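Whether the ~5s RPO target is being met can be checked from Postgres's built-in statistics views. The queries below are a rough monitoring sketch, not a full alerting setup; the 5s threshold comes from the target above and is not enforced by anything here.

```sql
-- Run on the primary: one row per connected standby.
SELECT application_name,
       client_addr,
       state,
       sync_state,
       replay_lag   -- interval; compare against the ~5s RPO target
FROM pg_stat_replication;

-- Run on a replica: time since the last replayed transaction.
SELECT now() - pg_last_xact_replay_timestamp() AS replica_staleness;
```

The second query overstates staleness when the primary is idle, because no new transactions arrive to be replayed, so it is best read alongside `replay_lag` on the primary.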
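**Connection pool sizing:** with 8 cores on the database server, start at `(8 * 2) + effective_spindles ≈ 18–20` server connections (that is, 2–4 effective spindles), enforced through PgBouncer in transaction mode. The formula sizes the connections Postgres actually services, so split that budget across app instances rather than granting it to each one. This is far below a naive "hundreds of connections" setup that would thrash Postgres.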
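One way to sanity-check the pool ceiling is to count client backends by state on the Postgres server and compare the totals with the 18–20 starting point. This is only an observation query under that assumption; it does not configure PgBouncer.

```sql
-- Count client connections by state (active, idle, idle in transaction, ...).
-- Totals well above the pool target suggest something is bypassing the pooler.
SELECT state, count(*) AS backends
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY backends DESC;

-- The hard server-side limit, for comparison.
SHOW max_connections;
```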
**Migration (zero-downtime):** dual-write orders to the old and new schemas, backfill historical rows with a batched job, then flip reads via a feature flag.
**Capacity (12/24/36 mo):** at the stated growth, ClickHouse stays single-cluster well past 36 months; Postgres needs an additional read replica around month 12 and table partitioning on `orders` by `created_at` around month 24.
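The month-24 partitioning step could look roughly like the sketch below, using Postgres declarative range partitioning. Only `orders` and `created_at` come from the plan; the other columns, the `orders_partitioned` name, and the 2025 dates are illustrative assumptions.

```sql
-- Column list is hypothetical; adapt it to the real orders schema.
CREATE TABLE orders_partitioned (
    order_id    bigint      NOT NULL,
    user_id     bigint      NOT NULL,
    total_cents bigint      NOT NULL,
    created_at  timestamptz NOT NULL,
    -- A primary key on a partitioned table must include the partition key.
    PRIMARY KEY (order_id, created_at)
) PARTITION BY RANGE (created_at);

-- One partition per month; the upper bound is exclusive.
CREATE TABLE orders_2025_01 PARTITION OF orders_partitioned
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- Catch-all so inserts outside the defined ranges do not fail.
CREATE TABLE orders_default PARTITION OF orders_partitioned DEFAULT;

-- Pruning check: the plan should touch only orders_2025_01.
EXPLAIN
SELECT * FROM orders_partitioned
WHERE created_at >= '2025-01-10' AND created_at < '2025-01-20';
```

An existing unpartitioned `orders` table cannot simply be switched to partitioned in place, so this step would likely reuse the same dual-write and backfill approach described for the migration.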
Model: Claude Opus 4
1 comment
Priya Nair
Okay this system design output just saved me an afternoon.