Set Your Growth Program for Success

The right mindset before you run a single experiment

1. Most experiments will fail, and that's the point. The goal is not to find experiments that work. The goal is to run enough quality experiments to find the few that have outsized impact. A 20–30% success rate is healthy. What matters is that you're learning fast, documenting everything, and iterating relentlessly. The wins compound; the losses teach you where not to spend your next dollar.
2. Ship quality experiments quickly, but don't conflate speed with sloppiness. A fast, well-formed experiment with a clear hypothesis and proper tracking beats a slow, perfectly designed one every time. But a fast experiment with no baseline and broken attribution is just noise. The discipline is getting to "good enough to learn from" as fast as possible, not "perfect before we launch."
Before you run any experiment

Anyone can pick an experiment from this guide and start running it today. But the teams that get the most out of this playbook are the ones who do the pre-work first. These five steps don't take long, but skipping them is the fastest way to run a lot of experiments that teach you nothing.

1. Align on your north star metric and your near-term proxy
Every company ultimately wants to influence revenue. But depending on your monetization model and sales motion, revenue may be a lagging indicator that takes 6–12 months to show up after an experiment runs. Before you test anything, align your team on the metric that will actually tell you whether an experiment worked in a timeframe you can act on.

If you're a complex enterprise SaaS with a 6-month sales cycle, your experiment metric is probably new qualified meetings or 1-year ACV pipeline, not closed revenue. If you're a PLG product where users can upgrade within hours of signing up, you might measure free-to-paid conversion or trial activation rate within 72 hours. If you're somewhere in between, you might track demo requests or sign-up starts as your primary proxy.

There's no universal right answer. The right answer is whichever metric your whole team agrees to measure before the experiment launches, and then doesn't change mid-flight.
Examples: New qualified meetings, 1-year ACV pipeline, demo requests, trial activations, free-to-paid conversion rate, sign-up starts, MQL volume.
2. Set up your tracking and quantify its expected gaps
You cannot accurately measure what you haven't tracked. Before launching any experiment, audit your full attribution stack: UTM parameter coverage, conversion event configuration in GA4, CRM attribution logic, and ad platform pixel health.

The critical thing most teams skip is quantifying their expected data gap. No tracking setup is 100% accurate, and pretending it is leads to false conclusions. In most industries, UTM parameter coverage runs at 60–80% of actual traffic. But for some ICPs (cybersecurity professionals, developers, enterprise IT buyers), ad blocker rates can be 30–50%+, which means your measured traffic and conversions may represent only half, or less, of what's actually happening.

Spend time before your first experiment understanding what your coverage looks like. If your tracking captures only 65% of conversions, your absolute numbers understate reality, and if that gap is uneven across channels or variants, a real lift can look flat. Set thresholds accordingly.
High ad-blocker industries: cybersecurity, developer tools, IT/infrastructure, and privacy-conscious enterprise buyers. Expect materially lower UTM coverage and plan for it.
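To make the adjustment concrete, here is a minimal Python sketch (not from the playbook; the coverage figures are assumptions you'd estimate yourself, for example by reconciling CRM conversion totals against GA4 totals):

```python
def adjusted_cvr(measured_conversions: int, measured_visitors: int,
                 conv_coverage: float, traffic_coverage: float) -> float:
    """Estimate the true conversion rate given partial tracking coverage.

    conv_coverage and traffic_coverage are the fractions of conversions and
    visits your stack actually records -- both are estimates you must supply.
    """
    est_conversions = measured_conversions / conv_coverage
    est_visitors = measured_visitors / traffic_coverage
    return est_conversions / est_visitors

# Measured: 65 conversions on 4,000 tracked visits = 1.63% CVR.
# If only 65% of conversions but 80% of traffic are tracked, the
# estimated true CVR is 2.00% -- a gap you'd otherwise never see.
print(f"{adjusted_cvr(65, 4000, conv_coverage=0.65, traffic_coverage=0.80):.2%}")
```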
3. Define and document your ICP
Every experiment in this guide is more effective when it's aimed at a specific, well-defined audience. Before you start testing, make sure you have a clear, written definition of your Ideal Customer Profile, both at the individual level and the organizational level.

At the individual level: job title, seniority, department, and the specific pain they're trying to solve. At the organizational level: company size, industry verticals, tech stack signals, growth stage, and any other firmographic characteristics that predict a good fit.

The more tailored your experiment experience is to that specific person, the better it will perform. "Anyone at a 50–5,000 person company" is a universe, not an audience. "VP of Engineering at a Series B fintech company using AWS with a team of 20–100 engineers" is an audience you can write for, target precisely, and measure accurately.
Your ICP definition should be a living document: update it every quarter as you close deals and learn more about who actually converts and retains.
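If it helps to make "written definition" literal, one option (a sketch, not something this guide prescribes; every value below is an illustrative placeholder) is to keep the ICP as structured data your team can review and diff each quarter:

```python
from dataclasses import dataclass

@dataclass
class ICP:
    """A written, reviewable Ideal Customer Profile definition.
    All values used below are illustrative placeholders."""
    # Individual level
    job_titles: list[str]
    seniority: str
    department: str
    pains: list[str]
    # Organizational level
    company_size: tuple[int, int]   # (min, max) headcount
    industries: list[str]
    tech_stack_signals: list[str]
    growth_stage: str

primary_icp = ICP(
    job_titles=["VP of Engineering"],
    seniority="VP+",
    department="Engineering",
    pains=["scaling a 20-100 person engineering team"],
    company_size=(50, 5000),
    industries=["fintech"],
    tech_stack_signals=["AWS"],
    growth_stage="Series B",
)
```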
4. Establish your baselines
Before testing a new variant, you need something to compare against. Capture current conversion rates, traffic volumes, open rates, CPL, or whatever metric your experiment will influence, for a minimum of 2 full weeks before you launch.

Without a baseline, a result is just a number. With a baseline, it becomes a signal. If your homepage is converting at 2.3% today and your new hero variant drives 3.1%, that's a 35% lift worth understanding and building on. If you didn't know the 2.3%, you have no context for the 3.1%.

This applies even to experiments where you're not running an A/B test: if you're launching a new nurture sequence, know your current lead-to-opportunity rate before you turn it on.
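As a quick illustration of that arithmetic (the 2.3% and 3.1% figures are the example above; the helper itself is just a sketch):

```python
def relative_lift(baseline: float, variant: float) -> float:
    """Relative lift of a variant over its baseline."""
    if baseline <= 0:
        raise ValueError("No baseline captured -- the result is just a number.")
    return (variant - baseline) / baseline

print(f"{relative_lift(0.023, 0.031):+.0%}")  # +35%
```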
5. Set minimum run time, sample size, and confidence threshold before you launch
Ending an experiment early because it looks good (or bad) is one of the most common and costly mistakes in growth. Define your success criteria before the experiment starts, and commit to not acting on results until those criteria are met.

At minimum: set a minimum run time of 2 full business weeks to account for weekly seasonality. Calculate your required sample size upfront — if you only have 50 visitors per day, a 2-week test may not give you enough data to be conclusive, and you need to know that before you start.
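For the upfront calculation, the standard two-proportion power formula gives a back-of-the-envelope answer. A minimal sketch, reusing the 2.3% baseline from step 4 and assuming you want to detect a lift to 3.1% (the alpha and power values are conventional defaults, not figures from this guide):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(p_base: float, p_target: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per variant to detect p_base -> p_target
    with a two-sided, two-proportion z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil(variance * (z_alpha + z_beta) ** 2 / (p_base - p_target) ** 2)

n = sample_size_per_arm(0.023, 0.031)   # ~6,400 visitors per variant
days = n * 2 / 50                       # ~260 days at 50 visitors/day total
print(n, days)
```

At 50 visitors a day, that test runs for the better part of a year, which is exactly the kind of thing you want to know before you start.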

On confidence thresholds: the industry standard is 95%, and that's the right bar when you have the traffic to reach it. But don't let perfect be the enemy of useful. If you're at an earlier stage with lower data volume, there's nothing wrong with acting on 80% or 90% confidence, especially when the cost of waiting for 95% is weeks of additional run time and the cost of being wrong is low. Be honest with yourself about which decisions warrant the higher bar and which don't. A homepage hero test on a high-traffic site should clear 95%. A headline test on a page with 30 visitors a day probably shouldn't wait that long.
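To see where a given result lands between those bars, the matching significance check is a pooled two-proportion z-test. A sketch using the same illustrative numbers (the per-arm visitor counts are assumptions):

```python
from math import sqrt
from statistics import NormalDist

def confidence_level(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided confidence (1 - p-value) that two conversion rates differ,
    via a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    return 2 * NormalDist().cdf(z) - 1

# 2.3% vs. 3.1% with 3,000 visitors per arm: ~94% confidence --
# clears a 90% bar but not 95%.
print(f"{confidence_level(69, 3000, 93, 3000):.0%}")
```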

Also define what you'll do with inconclusive results. "No significant difference" is a valid and useful result — it tells you that the variable you tested doesn't materially affect conversion, which is worth knowing.
Avoid running experiments over major holidays, product launches, or other known seasonal events — external factors will contaminate your results unless that contamination is intentional and accounted for.
The experiment tracker

Keep a central record of every experiment you run, including the ones that fail. A failed experiment documented well is worth more than a successful one nobody wrote down. Your experiment tracker is your team's institutional memory — it prevents you from re-running tests that already have answers, surfaces patterns over time, and creates accountability for closing experiments and drawing conclusions before opening new ones.

📋  Experiment tracker template  ·  10 columns, one row per experiment

| Experiment Name | Experiment Category | Start Date | End Date | Hypothesis | Metric & Baseline | Owner | Estimated Impact | Estimated Cost ($ & time) | Results & Next Steps |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homepage hero A/B test | Website | Jun 2 | Jun 16 | Adding video above fold will increase demo requests by 15%+ | Demo request CVR, baseline 2.3% | Sarah M. | +15% CVR lift (~12 additional demos/mo) | $500 design; 4h setup | 3.1% CVR (+35%) — ship winner; test CTA copy next |
| LinkedIn Lead Gen Forms — demo requests | Digital Ads | Jun 5 | Jun 26 | LGF format will reduce CPL vs. landing page by 20%+ | CPL, baseline $180 | James T. | –$36/lead; target $120 CPL | $3,000 budget; 3h setup | $142 CPL (–21%) — scale budget; test new audience segment |
| Pricing page transparency test | Website | Jun 9 | Jun 23 | Adding 'starts at' price will improve pricing CVR | Pricing CVR, baseline 1.1% | Sarah M. | +20–30% CVR lift (2–3 extra demos/mo) | $0; 2h copy + 1h dev | Inconclusive (n too small) — extend run to Jul 7 |
| Your experiment here | Category | YYYY-MM-DD | YYYY-MM-DD | If we do X, Y metric will change by Z | Metric, baseline value | Name | +/– expected delta | $X; Yh time | Result — next action |
Download the full tracker template
10 columns · 8 sample rows across all categories · ready for Excel or Google Sheets
Pacing rule: Close every experiment before opening a new one in the same surface area. When you have multiple tests running at once, you rarely know which one moved the needle. Give each experiment its minimum run time, let it close, draw a conclusion, share it with the team in writing — even a 3-sentence Slack summary — then move to the next. The cadence of learning, not the volume of tests, is what compounds over time.