Wer hat diesen Beitrag geschrieben?

Dieser Beitrag wurde von Julia Moser auf prompt2love veröffentlicht.

Julia Moser·27.5.2026

Designed a clean A/B test for extending our trial, with sample size, guardrails and decision rules upfront

Design rigorous business experiments to validate or invalidate strategic hypotheses with proper controls, sample sizing, and success criteria.

Business Hypothesis Testing & Experiment Designer

You are the head of experimentation at a leading tech company (like Booking.com or Netflix). Help me design a rigorous business experiment to test a strategic hypothesis.\n\nHYPOTHESIS TO TEST: {{hypothesis_statement — e.g., 'Increasing trial period from 7 to 14 days will increase paid conversion by 20%'}}\nBUSINESS CONTEXT: {{context — product, market, current metrics}}\nDECISION DEPENDENT ON RESULT: {{what_will_you_do_based_on_the_outcome}}\nEXPERIMENT TYPE PREFERENCE: {{type — e.g., 'A/B test', 'Multivariate', 'Cohort', 'Pilot', 'Natural experiment'}}\nRISK TOLERANCE: {{risk_level — e.g., 'Low risk only', 'Moderate', 'Aggressive'}}\n\nOUTPUT — Complete Experiment Design Document:\n\n## 1. HYPOTHESIS REFINEMENT\nConvert the business hypothesis into a rigorous, testable format:\n\n**Null Hypothesis (H₀)**: [State the 'no effect' version]\n**Alternative Hypothesis (H₁)**: [State the predicted effect version]\n**Directional vs. Non-directional**: [One-tailed or two-tailed test?]\n\n**SMART Hypothesis Check**:\n| Criterion | Assessment | Pass? |\n|-----------|-----------|-------|\n| Specific | Is the effect clearly defined? | |\n| Measurable | Can we measure the outcome precisely? | |\n| Actionable | Will the result drive a decision? | |\n| Realistic | Is the expected effect plausible? | |\n| Time-bound | Can we get results in a reasonable timeframe? | |\n\n## 2. EXPERIMENTAL DESIGN\n### Design Type: [A/B / Multivariate / Before-After / Interrupted Time Series / Synthetic Control]\n\n**Treatment Group**:\n- Definition: Who/what gets the intervention?\n- Sample size: [N] — with power calculation justification\n- Treatment details: Exactly what changes?\n- Duration: How long will treatment last?\n\n**Control Group**:\n- Definition: Who/what doesn't get the intervention?\n- Sample size: [N]\n- Control condition: What do they experience instead?\n- Why this control: [Hold-out / Placebo / Status quo / Historical]\n\n**Randomization Strategy**:\n- Unit of randomization: [User / Session / Geographic / Time-based / Cluster]\n- Method: [Simple / Stratified / Cluster / Matched pair]\n- Stratification variables (if any):\n\n## 3. SUCCESS METRICS FRAMEWORK\n\n| Metric Type | Metric Name | Formula | Current Baseline | Target | Measurement Method |\n|-------------|-------------|---------|-----------------|--------|-------------------|\n\n**Primary Metric** (1 only — the decision metric):\n- Why this metric: [Directly tied to business outcome]\n- Minimum Detectable Effect (MDE): [X%] — the smallest effect worth detecting\n\n**Guardrail Metrics** (2-4 — must not degrade significantly):\n- e.g., Revenue per user, Customer satisfaction, Churn rate, Page load time\n\n**Secondary Metrics** (2-4 — exploratory, generate follow-up hypotheses):\n\n## 4. SAMPLE SIZE & POWER CALCULATION\n- Significance level (α): 0.05 (standard) or {{custom}}\n- Statistical power (1-β): 0.80 (standard) or {{custom}}\n- Minimum Detectable Effect: {{X%}} or calculate based on baseline\n- Required sample size per variant: [N]\n- Total sample needed: [N]\n- Estimated experiment duration to reach sample: [Days]\n- If sample is insufficient: Recommendations (sequential testing? lower power? larger MDE?)\n\n## 5. STATISTICAL ANALYSIS PLAN\n- Primary test: [t-test / chi-square / Mann-Whitney U / OLS regression / Bayesian]\n- Multiple comparison correction (if testing multiple metrics): [Bonferroni / Holm-Bonferroni / False Discovery Rate]\n- Segmentation analysis: Pre-planned subgroup analyses (avoid data dredging)\n- Covariates to control for: [List]\n\n## 6. EXPERIMENT PROTOCOL\n### Timeline:\n| Phase | Activity | Duration | Owner | Deliverable |\n|-------|----------|----------|-------|-------------|\n\n### Pre-launch Checklist:\n- [ ] QA on treatment delivery\n- [ ] Data pipeline validation\n- [ ] Sample size validation\n- [ ] Stakeholder alignment on early stopping rules\n\n### During Experiment:\n- Monitoring cadence: [Daily / Weekly]\n- Interim peeking rules: [No peeking / Group sequential / Always valid p-values]\n- Early stopping criteria: [Success boundary / Harm boundary / Futility]\n\n## 7. RESULTS INTERPRETATION FRAMEWORK\nBefore seeing results, define:\n- **Win**: Primary metric improvement statistically significant AND guardrails OK → Full rollout\n- **Loss**: Statistically significant harm on primary or guardrail metrics → Stop and investigate\n- **Inconclusive**: Not statistically significant but directionally positive → Iterate and retest with changes\n- **Flat**: No meaningful direction → Consider different hypothesis\n\n## 8. POST-EXPERIMENT PLAN\n- Rollout strategy (if win): Phased vs. immediate, monitoring period\n- Rollback plan (if loss): How to reverse treatment\n- Follow-up experiments generated by results\n- Learning documentation template

Ergebnisse

# Experiment Design — Extending the Free Trial 7→14 Days **Hypothesis:** doubling the trial from 7 to 14 days lifts paid conversion by 20%. **Decision:** roll out if it wins without hurting CAC payback. **Type:** A/B. **Risk:** moderate. ## 1. Hypothesis - **H₀:** trial length has no effect on paid conversion. - **H₁:** 14-day trial increases conversion (one-tailed). ## 2. Design **Treatment:** new signups get 14 days. **Control:** 7 days (status quo). **Randomization:** by user, simple, hashed on user-id. Stratify by acquisition channel. ## 3. Metrics | Type | Metric | Baseline | Target | |------|--------|----------|--------| | Primary | Trial→paid conversion | 9.0% | +20% rel (≥10.8%) | | Guardrail | CAC payback months | 11 | no worse | | Guardrail | Refund rate | 3% | no worse | MDE: 1.8pp absolute. α = 0.05, power = 0.80. ## 4. Sample / Power Required ≈ 8,400 per arm. At ~600 signups/day split 50/50 → ~28 days to power. Run a full 4 weeks to absorb weekly seasonality. ## 5. Analysis Two-proportion z-test on the primary; check guardrails with non-inferiority margins. No peeking before day 14; group-sequential boundary if we must look early. ## 6. Decision Rules - **Win:** primary significant + guardrails OK → full rollout. - **Loss:** payback degrades materially → stop. - **Inconclusive:** directionally up, not significant → extend or iterate copy. ## Pre-launch checklist - [ ] Trial-timer logic QA'd - [ ] Channel stratification verified - [ ] Stop rules agreed with finance

Modell: Claude Sonnet 4

89 Likes25 SavesScore: 70

5 Kommentare

Chloe Adams·28.5.2026

This research, analysis prompt is going straight into my system.

Felix Bauer·28.5.2026

The buffer time between blocks is the detail I always skip and shouldn't.

Noah Steiner·28.5.2026

The frog-first ordering is doing real numbers for me.

Sofia Almeida·28.5.2026

Sending this research, analysis one to my team.

Ethan Reed·29.5.2026

Put this on a sticky note above my desk. That good.