Julia Moser·
Designed a clean A/B test for extending our trial, with sample size, guardrails and decision rules upfront
Design rigorous business experiments to validate or invalidate strategic hypotheses with proper controls, sample sizing, and success criteria.
Business Hypothesis Testing & Experiment Designer
You are the head of experimentation at a leading tech company (like Booking.com or Netflix). Help me design a rigorous business experiment to test a strategic hypothesis.\n\nHYPOTHESIS TO TEST: {{hypothesis_statement — e.g., 'Increasing trial period from 7 to 14 days will increase paid conversion by 20%'}}\nBUSINESS CONTEXT: {{context — product, market, current metrics}}\nDECISION DEPENDENT ON RESULT: {{what_will_you_do_based_on_the_outcome}}\nEXPERIMENT TYPE PREFERENCE: {{type — e.g., 'A/B test', 'Multivariate', 'Cohort', 'Pilot', 'Natural experiment'}}\nRISK TOLERANCE: {{risk_level — e.g., 'Low risk only', 'Moderate', 'Aggressive'}}\n\nOUTPUT — Complete Experiment Design Document:\n\n## 1. HYPOTHESIS REFINEMENT\nConvert the business hypothesis into a rigorous, testable format:\n\n**Null Hypothesis (H₀)**: [State the 'no effect' version]\n**Alternative Hypothesis (H₁)**: [State the predicted effect version]\n**Directional vs. Non-directional**: [One-tailed or two-tailed test?]\n\n**SMART Hypothesis Check**:\n| Criterion | Assessment | Pass? |\n|-----------|-----------|-------|\n| Specific | Is the effect clearly defined? | |\n| Measurable | Can we measure the outcome precisely? | |\n| Actionable | Will the result drive a decision? | |\n| Realistic | Is the expected effect plausible? | |\n| Time-bound | Can we get results in a reasonable timeframe? | |\n\n## 2. EXPERIMENTAL DESIGN\n### Design Type: [A/B / Multivariate / Before-After / Interrupted Time Series / Synthetic Control]\n\n**Treatment Group**:\n- Definition: Who/what gets the intervention?\n- Sample size: [N] — with power calculation justification\n- Treatment details: Exactly what changes?\n- Duration: How long will treatment last?\n\n**Control Group**:\n- Definition: Who/what doesn't get the intervention?\n- Sample size: [N]\n- Control condition: What do they experience instead?\n- Why this control: [Hold-out / Placebo / Status quo / Historical]\n\n**Randomization Strategy**:\n- Unit of randomization: [User / Session / Geographic / Time-based / Cluster]\n- Method: [Simple / Stratified / Cluster / Matched pair]\n- Stratification variables (if any):\n\n## 3. SUCCESS METRICS FRAMEWORK\n\n| Metric Type | Metric Name | Formula | Current Baseline | Target | Measurement Method |\n|-------------|-------------|---------|-----------------|--------|-------------------|\n\n**Primary Metric** (1 only — the decision metric):\n- Why this metric: [Directly tied to business outcome]\n- Minimum Detectable Effect (MDE): [X%] — the smallest effect worth detecting\n\n**Guardrail Metrics** (2-4 — must not degrade significantly):\n- e.g., Revenue per user, Customer satisfaction, Churn rate, Page load time\n\n**Secondary Metrics** (2-4 — exploratory, generate follow-up hypotheses):\n\n## 4. SAMPLE SIZE & POWER CALCULATION\n- Significance level (α): 0.05 (standard) or {{custom}}\n- Statistical power (1-β): 0.80 (standard) or {{custom}}\n- Minimum Detectable Effect: {{X%}} or calculate based on baseline\n- Required sample size per variant: [N]\n- Total sample needed: [N]\n- Estimated experiment duration to reach sample: [Days]\n- If sample is insufficient: Recommendations (sequential testing? lower power? larger MDE?)\n\n## 5. STATISTICAL ANALYSIS PLAN\n- Primary test: [t-test / chi-square / Mann-Whitney U / OLS regression / Bayesian]\n- Multiple comparison correction (if testing multiple metrics): [Bonferroni / Holm-Bonferroni / False Discovery Rate]\n- Segmentation analysis: Pre-planned subgroup analyses (avoid data dredging)\n- Covariates to control for: [List]\n\n## 6. EXPERIMENT PROTOCOL\n### Timeline:\n| Phase | Activity | Duration | Owner | Deliverable |\n|-------|----------|----------|-------|-------------|\n\n### Pre-launch Checklist:\n- [ ] QA on treatment delivery\n- [ ] Data pipeline validation\n- [ ] Sample size validation\n- [ ] Stakeholder alignment on early stopping rules\n\n### During Experiment:\n- Monitoring cadence: [Daily / Weekly]\n- Interim peeking rules: [No peeking / Group sequential / Always valid p-values]\n- Early stopping criteria: [Success boundary / Harm boundary / Futility]\n\n## 7. RESULTS INTERPRETATION FRAMEWORK\nBefore seeing results, define:\n- **Win**: Primary metric improvement statistically significant AND guardrails OK → Full rollout\n- **Loss**: Statistically significant harm on primary or guardrail metrics → Stop and investigate\n- **Inconclusive**: Not statistically significant but directionally positive → Iterate and retest with changes\n- **Flat**: No meaningful direction → Consider different hypothesis\n\n## 8. POST-EXPERIMENT PLAN\n- Rollout strategy (if win): Phased vs. immediate, monitoring period\n- Rollback plan (if loss): How to reverse treatment\n- Follow-up experiments generated by results\n- Learning documentation template
Ergebnisse
# Experiment Design — Extending the Free Trial 7→14 Days
**Hypothesis:** doubling the trial from 7 to 14 days lifts paid conversion by 20%. **Decision:** roll out if it wins without hurting CAC payback. **Type:** A/B. **Risk:** moderate.
## 1. Hypothesis
- **H₀:** trial length has no effect on paid conversion.
- **H₁:** 14-day trial increases conversion (one-tailed).
## 2. Design
**Treatment:** new signups get 14 days. **Control:** 7 days (status quo). **Randomization:** by user, simple, hashed on user-id. Stratify by acquisition channel.
## 3. Metrics
| Type | Metric | Baseline | Target |
|------|--------|----------|--------|
| Primary | Trial→paid conversion | 9.0% | +20% rel (≥10.8%) |
| Guardrail | CAC payback months | 11 | no worse |
| Guardrail | Refund rate | 3% | no worse |
MDE: 1.8pp absolute. α = 0.05, power = 0.80.
## 4. Sample / Power
Required ≈ 8,400 per arm. At ~600 signups/day split 50/50 → ~28 days to power. Run a full 4 weeks to absorb weekly seasonality.
## 5. Analysis
Two-proportion z-test on the primary; check guardrails with non-inferiority margins. No peeking before day 14; group-sequential boundary if we must look early.
## 6. Decision Rules
- **Win:** primary significant + guardrails OK → full rollout.
- **Loss:** payback degrades materially → stop.
- **Inconclusive:** directionally up, not significant → extend or iterate copy.
## Pre-launch checklist
- [ ] Trial-timer logic QA'd
- [ ] Channel stratification verified
- [ ] Stop rules agreed with finance
Modell: Claude Sonnet 4
89 Likes25 SavesScore: 70
5 Kommentare
Chloe Adams·
This research, analysis prompt is going straight into my system.
Felix Bauer·
The buffer time between blocks is the detail I always skip and shouldn't.
Noah Steiner·
The frog-first ordering is doing real numbers for me.
Sofia Almeida·
Sending this research, analysis one to my team.
Ethan Reed·
Put this on a sticky note above my desk. That good.