A/B Tests: Why That “Sure Thing” Didn’t Work

“But it was a sure thing! Neil Patel told me so!” So your A/B tests don’t work. Some insight on why, and how to do better next time.

Tale as old as time: you think up a site improvement that you’re 100% certain will lift conversion rates. You test it and…no lift. 

There are a few reasons why this happens. I’m going to outline them here, explain how to fix some of them, and help you start running killer tests.

Why Your A/B Tests Don’t Work

Channel Tests vs Audience Tests

Think about a physical retail store for a moment. There are certain improvements you could make to the shopping experience that would resonate with almost anyone who walked through the doors.

Think: improving lighting quality, keeping the store clean, clearly organizing the merchandise, hanging a large sign above the front door so shoppers know where you’re located.

(Caveat: some retailers strategically ignore this advice and do very well. #NoBestPractices)

There are other improvements that would only resonate with certain segments of your foot traffic. Think: offering personal shopping services, increasing the square footage of a certain product category, increasing the size and clarity of loyalty program signage.

The same distinction applies to your on-site testing efforts. Some tests are designed to optimize the channel (mobile site, Facebook ads performance, etc.), and some tests are designed to increase incremental sales from a specific audience.

If you’re running an A/B test on all website traffic, you need to ask yourself if it will truly improve the site experience for most visitors, regardless of the strength of their purchase intent. Otherwise, the noise from irrelevant traffic segments will drown out any results.

A statistically significant result across all web traffic often comes from a variant that resonated so strongly with high-intent shoppers that it overcame the noise of everyone else who happened to be browsing the site at the time.
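To see how much a minority segment gets watered down, here’s a quick back-of-the-envelope sketch in Python. All of the numbers (segment share, baseline conversion rates, the size of the lift) are made up purely for illustration:

```python
# Back-of-the-envelope dilution math. Every number here is made up for illustration.
# Scenario: a variant only "works" for high-intent visitors, who are a minority
# of total traffic. What does the blended, site-wide result look like?

high_intent_share = 0.20   # 20% of sessions arrive ready to buy (assumed)
high_intent_cr    = 0.080  # their baseline conversion rate (assumed)
low_intent_cr     = 0.010  # everyone else's baseline conversion rate (assumed)
segment_lift      = 0.15   # the variant lifts high-intent conversion by 15% (relative)

def blended_cr(high_cr: float) -> float:
    """Traffic-weighted conversion rate across both segments."""
    return high_intent_share * high_cr + (1 - high_intent_share) * low_intent_cr

control = blended_cr(high_intent_cr)
variant = blended_cr(high_intent_cr * (1 + segment_lift))

print(f"Blended control CR:    {control:.4f}")              # 0.0240
print(f"Blended variant CR:    {variant:.4f}")               # 0.0264
print(f"Blended relative lift: {(variant / control - 1):.1%}")  # ~10%
# A 15% lift inside the segment shows up as ~10% site-wide, and the absolute
# gap shrinks from 1.2 points to 0.24 points -- which is what the test has to
# detect through the noise of everyone else on the site.
```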

Lifecycle Stages In The Audience

Some customers naturally have a higher probability of converting, no matter what kind of experience you offer up. I cover this in more detail in my loyalty program framework post. But essentially, past performance is predictive of future behavior—the more a customer has purchased in the past, the more likely they are to purchase in the future.

This is where broad, channel-wide testing often hits a wall. You have a core audience of loyal, engaged customers that is dwarfed by a huge audience of prospects and casual buyers. It’s hard to design a test variant that drives a meaningful lift across the board. 

If you’ve struggled with subject line and copy testing in email, you’ve seen this—tests lift open rates but not last click revenue. And that may be fine—increased engagement over time could lead to more sales downstream—but it’s not what most testers are looking for.

Again—tests that work here are often so powerful within the engaged segment that they overcome the noise of everyone else. Know that this is what you’re optimizing toward when you proceed.

Test Design & Resource Investment

Any real scientist will tell you that test design is more important than test results, especially when millions of dollars and human lives are potentially at stake.

Marketing guru hucksters will tell you to Get Off Your Ass And TEST EVERYTHING Using These 10 No-Fail UX Hacks. What if Moderna ran their vaccine trials the same way you run your site optimization program? Really makes you think. 🤔

All too often, test ideas come from a list of things that worked at someone’s last job, things someone saw a competitor do, or things that someone read on a blog. If you’re lucky, they’re prioritized based on potential revenue impact and then the tests are run.

And then the fun part starts—checking that Optimizely dashboard like a pot of water you really need to boil, waiting for that sweet, sweet lift.

Getting tests launched feels good, and productive. Waiting for results, and sometimes getting them, provides a fun lil’ hit of dopamine.

On the other hand, taking a more considered approach usually requires cross-functional collaboration and, potentially, new technology. Less fun. Less dopamine. But putting in the work upfront will yield more meaningful, impactful results.

How To Get More From Your A/B Tests

Set Yourself Up For Success

You’re probably prioritizing test ideas by their upside potential (…I hope). When you have your top 3-5 priorities, think through the following:

  • Is this a channel test or an audience test? Does this change truly have the potential to improve the buying journey and site experience of a majority of visitors, or will it only “work” for those with the highest purchase intent?
  • Based on the answer to that question, how do I ensure that we’re measuring the right thing? Is it appropriate to measure the lift on all traffic, or should I zoom in on a specific customer segment or metric?
  • Am I giving this test enough time or audience volume to reach statistical significance? On the flip side, am I setting a cutoff for traffic or time by which statistical significance must be met? (A quick sample-size sketch follows this list.)
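For the volume question, a rough gut check is enough. Here’s a minimal sketch using the standard two-proportion sample-size formula (Python standard library only; the conversion rates are the same illustrative numbers as the dilution example above):

```python
import math
from statistics import NormalDist

def visitors_per_arm(p_control: float, p_variant: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per arm to detect the gap between two
    conversion rates, using the standard two-proportion z-test formula."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05, two-sided
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_variant) ** 2)

# Illustrative numbers only: a ~10% relative lift on a 2.4% blended conversion rate
print(visitors_per_arm(0.024, 0.0264))  # roughly 67,000 visitors per arm
# The same change measured only inside the high-intent segment (8.0% -> 9.2%)
print(visitors_per_arm(0.080, 0.092))   # roughly 8,600 visitors per arm
```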

Minimize Noise

It’s hard to find wins running tests on your entire email file or the totality of web traffic because these audiences contain a broad mix of purchase intent. The same variant that tips a warm prospect over the edge will not move a cold prospect from zero to one.

The email file is a great example, because it’s relatively easy to suss out levels of purchase intent here. Email file size is a popular vanity metric; many email marketers would rather brag about a big file filled with dead wood than set strict criteria for cleaning the list of non-engagers. 

The result is that around 30-50% of the audience in many email files has close to zero chance of even opening an email, much less purchasing anything. Hint—if they haven’t opened an email in 12 months, putting an emoji in the subject line probably won’t do it.

For everyone else, it’s fairly simple to layer in some purchase intent data because you can match back email addresses to eCom orders. Time since last purchase and number of products/categories purchased are good signals of likelihood to convert.
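Here’s a rough sketch of what that match-back might look like. The file names, column names, and tier thresholds are all assumptions for illustration, not a recommended schema:

```python
import pandas as pd

# Hypothetical exports -- file names and columns are assumptions, not a real schema.
orders = pd.read_csv("orders.csv", parse_dates=["order_date"])          # email, order_date, category
engagement = pd.read_csv("email_engagement.csv", parse_dates=["last_open_date"])  # email, last_open_date

today = pd.Timestamp.today()

# Purchase-intent signals matched back from eCom orders
signals = (orders.groupby("email")
                 .agg(last_purchase=("order_date", "max"),
                      categories_bought=("category", "nunique"))
                 .reset_index())
signals["days_since_purchase"] = (today - signals["last_purchase"]).dt.days

profile = engagement.merge(signals, on="email", how="left")
profile["days_since_open"] = (today - profile["last_open_date"]).dt.days

def intent_tier(row) -> str:
    if row["days_since_open"] > 365:
        return "dead wood"   # hasn't opened in a year; leave them out of the test
    if row["days_since_purchase"] <= 90 and row["categories_bought"] >= 2:
        return "high intent"
    if pd.notna(row["last_purchase"]):
        return "warm"
    return "prospect"

profile["tier"] = profile.apply(intent_tier, axis=1)
print(profile["tier"].value_counts())
```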

Web traffic is more complicated because it’s harder to tie any given user back to their transaction data. But, to a certain extent, you can use site behavior or traffic campaign source as signals of purchase intent.
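A crude version of that scoring might look like the sketch below. The signals and thresholds are guesses; swap in whatever your analytics setup actually exposes:

```python
# A rough intent score from signals observable on-site (fields and weights are guesses).
def session_intent_score(session: dict) -> int:
    score = 0
    if session.get("utm_medium") in ("email", "cpc"):   # arrived via an owned or paid channel
        score += 1
    if session.get("viewed_product_pages", 0) >= 2:
        score += 1
    if session.get("added_to_cart"):
        score += 2
    if session.get("is_returning_visitor"):
        score += 1
    return score

# Target the test (or at least report results) on sessions scoring, say, 2+
session = {"utm_medium": "email", "viewed_product_pages": 3, "added_to_cart": False}
print(session_intent_score(session))  # 2
```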

Give Lifecycle Objectives Some Love

Like I said, it’s easy to score wins when the test subject already has a high intent to purchase. You’re addressing objections or resolving ambiguity, not winning hearts and minds.

Winning feels good, and big, easy wins feel great, especially in an environment of low psychological safety! But your high intent, loyal buyer audience is always going to be small relative to casual customers and prospects. You need to spread the love around if you want sales to go up, and that means doing some hard things.

Convincing new customers to return for a second purchase is going to take more than a single A/B test in a single channel. You’re going to have to get in front of the audience across multiple channels over longer periods of time to make an impact. To get a new customer back into the high intent state, you need to do more than change a button color.

There are…reasons…why this type of testing doesn’t happen more often. I’ll get into those in a future post.

Consider Some “Just Do’s”

Sometimes you should make like Nike and Just Do It. This is especially true for smaller brands that would need a long time to reach statistical significance.

A few rich sources of actionable “just do it” feedback: 

  • Phone conversations with both new and loyal customers
  • Feedback from your customer care team. Especially valuable if you’re able to tag support tickets into categories and analyze trends over time.
  • User interviews and testing with unlikely subjects, like grandparents who are less tech savvy or loyal customers of a competitor.

If you turn up something that’s confusing people again and again, just change it. If “how to change it” is ambiguous, it may make sense to hire a contractor who is an expert in the field.

That’s it. Go forth and crush it, as your favorite guru would say.
