5 A/B Testing Mistakes You Should Avoid Now!

14th Jun 2021 – 57 Common A/B Testing Mistakes & How to Avoid Them
It’s not great, however, when the test is over and you need to wait for that same developer to turn it off and install the winning variation. Not only is this frustrating, but it can seriously limit the number of tests you can run and even hurt the ROI of the page while you wait for the winner to go live.
An A/B test works by sending a single traffic source to a control page and a variation of that page. The goal is to find out whether the change you implemented makes the audience convert better and take action.
At the same time, metrics that don’t connect to or drive a measurable outcome should usually be avoided. More Facebook likes don’t necessarily mean more sales. Remove those social share buttons and just watch how many more leads you get. Be wary of vanity metrics and remember that just because one leak is fixed doesn’t mean there isn’t another one elsewhere to address as well!
(This also stops you from seeing a drop in an important element but considering the test a win because it ‘got more shares’.)

  • Mistakes before you start testing,
  • Issues that can happen during the test,
  • And errors you can make once the test is finished.

So the test is finished. You ran it for long enough, saw results, and reached statistical significance, but can you trust the accuracy of the data?
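As a rough illustration of what ‘stat sig’ means here, the sketch below runs a two-sided, two-proportion z-test on raw visitor and conversion counts. The function name and the numbers are made up for the example, and your testing tool may use a different statistical method, so treat it as a sanity check rather than a replacement for the tool’s report.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both pages convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # No variation at all, so nothing to distinguish.
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: 200 of 5,000 on the control, 250 of 5,000 on the variation.
p = two_proportion_p_value(200, 5000, 250, 5000)
print(f"p-value: {p:.4f}")
```

A p-value below your chosen threshold (0.05 is a common choice) only tells you the difference is unlikely to be chance. It says nothing about broken tracking or polluted samples, which is why the validity checks still matter.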
Now it would be tempting to turn off the ‘losing’ variation and redistribute the traffic among the other variations, right? Heck… you might even want to take that extra 25% of the traffic and just send it to the top performer, but don’t do it.
If you’re testing a call to action that directly affects their time to consider and buy, that’s going to skew your results. For one thing, your control might get sales but be outside of the testing period, so you miss them.
This can actually cause your test to fail, even if you have a potential winning variant.

Common A/B Testing Mistakes That Can Be Made Before You Even Run Your Test

#1. Pushing something live before testing!

Now, you might not have a physical product. You might have a program or digital offer, but learning more about your audience’s needs and testing that, then taking it back to your sales page can be HUGE in terms of lift.
It’s easy to get excited and want to run multiple tests at once.
Losing tests can give you insight into where you need to improve further or do more research. The most annoying thing as a CRO is seeing clients who refuse to learn from what they’ve just seen. They have the data but don’t use it…
Sometimes your users are held back by trust issues and self-doubts. Other times, it’s clarity and broken forms or bad designs. The key is these are things that quantitative data can’t always tell you, so always ask your audience and use it to help you plan.

#2. Not running an actual A/B test

9 out of 10 tests are usually a failure.
We call this hitting the ‘local maximum’.
Well, good news! In this article, we’re going to walk you through 57 common (and sometimes uncommon) A/B testing mistakes that we see, so you can either sidestep them or realize when they are happening and fix them fast.
What’s the most difficult screen type to optimize? The same research reveals it’s checkout screens (with a median effect of +0.4% from 25 tests).
Almost all of us focus on the path to the sale and test for that. But the reality is the product can also be A/B tested and improved and can even offer a higher lift.

#3. Not testing to see if the tool works

Simply move onto another page in the sales process and improve on that. (Ironically this can actually prove to give a higher ROI anyway.)
Some people just test anything without really thinking it through. The thing is, it causes your data to become polluted and less accurate. Ideally, you want to use a tool that randomizes which page they see but then always shows them that same version until the test is over.
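One common way to get ‘random, but always the same version for the same visitor’ is to hash a stable visitor ID instead of rolling a die on every page load. The sketch below assumes you already have such an ID (for example from a first-party cookie). `assign_variant` and its parameters are hypothetical names for illustration, not the API of any particular testing tool.

```python
import hashlib

def assign_variant(visitor_id, test_name, variants=("control", "variation")):
    """Pick a variant pseudo-randomly, but always the same one for a given visitor."""
    # Including the test name means one visitor can land in different
    # buckets for different tests.
    key = f"{test_name}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# The same visitor sees the same version on every visit.
first_visit = assign_variant("visitor-123", "homepage-headline")
second_visit = assign_variant("visitor-123", "homepage-headline")
print(first_visit, second_visit, first_visit == second_visit)
```

Because the assignment is computed from the ID, nothing has to be stored to keep a visitor in the same bucket. It does depend on the ID surviving, though, which is one reason cleared cookies pollute a sample.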
It keeps on running and feeding 50% of your audience to a weaker page and 50% to the winner. Oops!
Well, then your first thought should be that something is broken.
As testers, we need to be impartial. Sometimes, however, you might have a particular design or idea that you just love and are convinced that it should have won so you keep extending the test out longer and longer to see if it pulls ahead.

#33. Not stopping a test when you have accurate results

Are you running A/B tests but not sure if they’re working properly?
Some tools are just not as good as others. They do the job but struggle under traffic load or ‘blink’ and flicker.
What worked and what didn’t? Why did it happen?

#34. Being emotionally invested in losing variations

If it’s broken then fix it and restart.
So what can we do? Sometimes you can’t help it. You’ll have a test running and Google implements a new core update that messes with your traffic sources mid-campaign *cough*.

#35. Running tests for too long and tracking drops off

(This is such an important user experience factor that Google is currently adjusting their rankings for sites that don’t have flickering or moving elements).
So be patient and test only one stage at a time or pages that are not connected in the process.

#36. Not using a tool that allows you to stop/implement the test!

Run a quick test to see how it works first. You don’t want to push a radical change live without getting some data, or you could lose sales and conversions.
This way you’ll see a much higher return for your time investment.
A failure can simply mean your hypothesis is correct but needs to be executed better.

Common A/B Testing Mistakes You Can Make After Your Test Is Finished

#37. Giving up after one test!

Ready for a secret?
Pull the bandaid off.

#38. Giving up on a good hypothesis before you test all versions of it

You can only find that out by segmenting it down into your results. Look at the devices used and the results there. You might find some valuable insights!
If you run a test for longer than 4 weeks, there is a chance that users’ cookies will expire or be cleared. When that happens you lose event tracking for those users, and if they return they may be counted again and pollute the sample data.
However… What if the test isn’t running?

#39. Expecting huge wins all the time

We call this sample pollution.
Not only can you then learn from older tests, but it can also stop you from re-running a test by accident.
Sometimes you just forgot to stop a test!
So let’s dive in…
You might have a fantastic new page or website design and you’re really eager to push it live without testing it.

#40. Not checking validity after the test

Unless you’re testing for a seasonal event, you never want to run a test campaign during the holidays or any other major event, such as a special sale or a big world event.
A lot of the time your competitors are just winging it. Unless they have someone who has run long-term lift campaigns, they might be just trying things to see what works, sometimes without using any data for their ideas.

#41. Not reading the results correctly

There’s nothing wrong with doing radical tests where you change multiple things at once and do an entire page redesign.

  • Dive deep into your analytics.
  • Look at any qualitative data you have.

For example, slider images usually have terrible performance but, on some sites, they can actually drive more conversions. Test everything. You have nothing to lose and everything to gain.
The more you understand your results, the better.

#42. Not looking at the results by segment

Here’s Ben Labay’s list of common guardrail metrics for experimentation programs:
Either way, let it run for the full cycle and balance out.
So there you have it. The 57 common and uncommon A/B testing mistakes that we see and how you can avoid them.

#43. Not learning from results

What are your results really telling you? Failing to read them correctly can easily make a potential winner look like a complete failure.

#44. Taking the losers

If the test is working, let it run and let the data decide what works.
Because of this, you will usually get an initial lift in response that dies back down over time.

#45. Not taking action on the results

There are 3 important factors to take into account when you want to test and get accurate results:
Sometimes people just run a campaign and see what changes, but you will definitely get more leads/conversions or sales if you have clarity on which specific element you want to see a lift on.

#46. Not iterating and improving on wins

The page you’re running tests on has plateaued and you just can’t seem to get any more lift from it.
Simply reduce the downtime between tests!
Getting a win but not implementing it! They have the data and just do nothing with it. No change, no insight, and no new tests.

#47. Not sharing winning findings in other areas or departments

Even worse again?
This is all worth checking out BEFORE you start running traffic to any campaign.

  • Find some winning sales page copy? Preframe it in your adverts that get them to the page!
  • Find a style of lead magnet that works great? Test it across the entire site.

#48. Not testing those changes in other departments

Fortunately, tools like Convert Experiences can be set up to stop a campaign and automatically show the winner once it hits certain criteria (like sample size, stat sig, conversions, and time frame).
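To make those criteria concrete, here is a minimal sketch of a stopping rule that only ends a test when every condition is met at once. It is not how Convert Experiences or any other tool implements the feature. `StopCriteria`, `ready_to_stop`, and all of the threshold values are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StopCriteria:
    # Hypothetical thresholds; pick your own before the test starts.
    min_visitors_per_variant: int = 5000
    min_conversions_per_variant: int = 100
    min_days: int = 14
    max_p_value: float = 0.05

def ready_to_stop(visitors, conversions, days_running, p_value,
                  criteria=StopCriteria()):
    """Return True only when sample size, conversions, time frame and
    significance are all satisfied. `visitors` and `conversions` hold one
    count per variant."""
    checks = [
        min(visitors) >= criteria.min_visitors_per_variant,
        min(conversions) >= criteria.min_conversions_per_variant,
        days_running >= criteria.min_days,
        p_value <= criteria.max_p_value,
    ]
    return all(checks)

# Significant after 5 days, but the time frame has not been met yet.
print(ready_to_stop([6000, 6000], [150, 180], days_running=5, p_value=0.03))
# Every criterion met.
print(ready_to_stop([6000, 6000], [150, 180], days_running=21, p_value=0.03))
```

Requiring all of the conditions together is the point: significance on its own, reached early, is not a reason to stop.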
They can be confused, bounce off, or even convert higher, simply because of those extra interactions.

#49. Too much iteration on a single page

But you’ll need a hypothesis that is testable, meaning it can be proven or disproven through testing. Testable hypotheses put innovation into motion and promote active experimentation. They could either result in success (in which case your hunch was correct) or failure – showing you were wrong all along. But they will give you insights. It may mean your test needs to be executed better, your data was incorrect/read wrong, or you found something that didn’t work which often gives insight into a new test that might work far better.
Because both sets of your audience are seeing the exact same page, the conversion results should be identical on both sides of the test, right?
The truth is, different experiments have varying effects. According to Jakub Linowski’s research from over 300 tests, layout experiments tend to lead to better results.
Just be aware of the significance of your segment size. You might not have had enough traffic to each segment to trust it fully, but you can always run a mobile-only test (or whichever channel it was) and see how it performs.
Another simple mistake. Either the page URL has been entered incorrectly, or the test is running to a ‘test site’ where you made your changes and not to the live version.
Simply re-run the test, set a high confidence level, and make sure you run them for long enough.

#50. Not testing enough!

So what can you do?
Do the math! Make sure you have enough traffic before running a test – otherwise it’s just wasted time and money. Many tests fail because of insufficient traffic or poor sensitivity (or both).
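If you want to see what ‘doing the math’ looks like, the sketch below uses the standard sample-size approximation for comparing two conversion rates. The 5% significance level and 80% power defaults are common conventions rather than requirements, and `visitors_per_variant` is a hypothetical helper name. Most testing tools ship their own calculator, which may give slightly different numbers.

```python
import math
from statistics import NormalDist

def visitors_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH variant to detect the given
    relative lift with a two-sided test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # sensitivity (power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# Hypothetical page converting at 5%, hoping to detect a 20% relative lift.
print(visitors_per_variant(0.05, 0.20))
# A smaller lift needs far more traffic to detect.
print(visitors_per_variant(0.05, 0.05))
```

Divide the result by your daily traffic per variant to estimate how long the test has to run. If the answer is months, the test is probably not worth starting in its current form.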
A super simple mistake, but have you checked that everything works?
Your test should always be tied to Guardrail metrics or some element that directly affects your sales. If it’s more leads then you should know down to the dollar what a lead is worth and the value of raising that conversion rate.
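Knowing the value ‘down to the dollar’ is simple arithmetic once you have your own numbers. The sketch below assumes a lead’s value is your close rate multiplied by your average order value. The figures and function names are hypothetical placeholders.

```python
def lead_value(close_rate, average_order_value):
    """Expected revenue from a single lead."""
    return close_rate * average_order_value

def monthly_value_of_lift(monthly_visitors, old_rate, new_rate, value_per_lead):
    """Extra monthly revenue from raising the lead conversion rate."""
    extra_leads = monthly_visitors * (new_rate - old_rate)
    return extra_leads * value_per_lead

# Hypothetical business: 5% of leads buy, and the average order is $400.
value = lead_value(close_rate=0.05, average_order_value=400)
# 20,000 visitors a month, lead conversion rate moves from 3.0% to 3.5%.
gain = monthly_value_of_lift(20_000, 0.03, 0.035, value)
print(f"Value per lead: ${value:.2f}")
print(f"Extra revenue per month: ${gain:.2f}")
```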
Again, you need to test one thing with your treatment and nothing else.

#51. Not documenting tests

Sometimes that new change can be a substantial dip in performance. So give it a quick test first.
Peeking is a term used to describe when a tester has a look at their test to see how it’s performing.
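To see why peeking is risky, the simulation below runs A/A tests, where both sides share the same true conversion rate, so every ‘significant’ result is a false positive. It compares checking once at the end with stopping at the first significant peek. All parameters are arbitrary choices for the demonstration, and the significance check is a simple two-proportion z-test.

```python
import math
import random

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_aa_tests(runs=300, visitors=2000, checks=10, rate=0.10,
                      alpha=0.05, seed=1):
    """Return (false positive rate when checking once at the end,
    false positive rate when any peek along the way counts as a win)."""
    rng = random.Random(seed)
    step = max(1, visitors // checks)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        peeked_significant = False
        for n in range(1, visitors + 1):
            # Both sides convert at the same true rate: there is no real winner.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n % step == 0 and p_value(conv_a, n, conv_b, n) < alpha:
                peeked_significant = True
        final_significant = p_value(conv_a, visitors, conv_b, visitors) < alpha
        fixed_hits += final_significant
        peeking_hits += peeked_significant or final_significant
    return fixed_hits / runs, peeking_hits / runs

fixed_rate, peeking_rate = simulate_aa_tests()
print(f"False positives, one look at the end: {fixed_rate:.1%}")
print(f"False positives, stopping at any peek: {peeking_rate:.1%}")
```

The second number can never be lower than the first, and in practice it tends to be noticeably higher. Looking is harmless. Acting on what you see before the planned end is what inflates false wins.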

#52. Forgetting about false positives and not double-checking huge lift campaigns

It might look ok for you, but it won’t actually load for your audience.
And remember:
The best testers also listen to their audience. They find out what they need, what moves them forward, what holds them back, and then use that to formulate new ideas, tests, and written copy.

#53. Not tracking downline results

A style design that gives lift in one area might give a drop in others, so always test!
What do I mean?
Another thing to consider when testing is any variant that might change the audience’s consideration period.

#54. Failing to account for primacy and novelty effects, which may bias the treatment results

A new variant might technically get fewer click-throughs but drive more sales from the people who do click.
You could quite easily run tests on every lead generation page you have, all at the same time.
Sometimes, you might even get more clicks because the layout has changed and they’re exploring the design.
Always be ready to go back into an old campaign and retest. (Another reason why having a testing repository works great.)
This can be distracting and cause trust issues, lowering your conversion rate.
Complete a test, analyze the result, and either iterate or run a different test. (Ideally, have them queued up and ready to go).
Sometimes we’re not only looking for more lift but to fix something that’s a bottleneck instead.
So make sure you’re running an actual A/B test where you’re splitting the traffic between your 2 versions and testing them at the exact same time.
If it’s much lower, then it could be a novelty effect with the old users clicking around. If it’s on a similar level, you might have a new winner on your hands.

#55. Running consideration period changes

Make sure to block your own and your staff’s IP addresses from your analytics and testing tool. The last thing you want is for you or a team member to ‘check in’ on a page and be tagged in your test.
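Most analytics and testing tools have a setting for this, so use that where you can. If you ever need to clean exported data yourself, a filter might look like the sketch below. The networks listed are reserved documentation ranges standing in for your real office addresses, and `is_internal` is a hypothetical helper.

```python
import ipaddress

# Placeholder ranges (reserved for documentation); replace with your own.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # e.g. the office network
    ipaddress.ip_network("198.51.100.17/32"),  # e.g. one remote team member
]

def is_internal(ip):
    """True if the visitor's IP address belongs to you or your staff."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in INTERNAL_NETWORKS)

visits = ["203.0.113.45", "192.0.2.10", "198.51.100.17"]
external_visits = [ip for ip in visits if not is_internal(ip)]
print(external_visits)
```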
However, you wouldn’t want to be testing lead pages, sales pages, and checkout pages all at once as this can introduce so many different elements into your testing process, requiring massive volumes of traffic and conversions to get any useful insight.
A 1% lift on a sales page is great, but a 20% lift on the page that gets them there could be far more important. (Especially if that particular page is where you are losing most of your audience.)
Tests take time and there are only so many that we can run at once. In this instance, this page would actually be more profitable to run, assuming the traffic that clicks continues to convert as well…
Some people make the mistake of running a test in sequence. They run their current page for X amount of time, then the new version for X time after that, and then measure the difference.

#56. Not retesting after X time

Using quantitative data to get ideas is great, but it’s also slightly flawed. Especially if the only data that we use is from our analytics.
You need to take this into account when running your test. Some changes can be made globally, such as a simple layout shift or adding trust signals, etc.
Some testing programs insist on creating hard-coded tests, i.e., a developer and engineer build the campaign from scratch.
The key when running your test is to segment the audience after and see if the new visitors are responding as well as the old ones.

#57. Only testing the path and not the product

So let’s break it down.
When tracking your test results, it’s also important to remember your end goal and track downline metrics before deciding on a winner. Set them up to be equal from the start. Most tools will allow you to do this.
For example, a new variant could seem to be converting poorly, but on mobile, it has a 40% increase in conversions!
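Here is a small sketch of that kind of breakdown. The visit data is invented so that the variation loses slightly overall while winning by 40% on mobile. In practice the rows would come from your analytics export, and the function and field names are assumptions.

```python
from collections import defaultdict

def conversion_by_segment(visits):
    """visits: iterable of (variant, segment, converted) tuples.
    Returns {(segment, variant): conversion rate}."""
    totals = defaultdict(lambda: [0, 0])  # [visitors, conversions]
    for variant, segment, converted in visits:
        totals[(segment, variant)][0] += 1
        totals[(segment, variant)][1] += int(converted)
    return {key: conv / n for key, (n, conv) in totals.items()}

def rows(variant, segment, visitors, conversions):
    """Build fake visit rows for the example."""
    return [(variant, segment, i < conversions) for i in range(visitors)]

# Invented data: worse on desktop, better on mobile.
visits = (
    rows("control", "desktop", 1000, 60) + rows("variation", "desktop", 1000, 35)
    + rows("control", "mobile", 1000, 50) + rows("variation", "mobile", 1000, 70)
)

for (segment, variant), rate in sorted(conversion_by_segment(visits).items()):
    print(f"{segment:8} {variant:10} {rate:.1%}")
```

Each segment has fewer visitors than the test as a whole, so a lift that shows up in only one segment is a lead to follow up on, not a finished result.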


Likewise, have you gone through and tested that your new variation works before you run a test?
Taking a sales page from a 10% to 11% conversion can be less important than taking the page that drives traffic to it from 2% to 5%, as you will essentially more than double the traffic on that previous page.
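The arithmetic behind that comparison is easy to check. The sketch below assumes a simple two-step funnel (a page that sends visitors to the sales page, then the sales page itself) and a hypothetical 10,000 visitors entering at the top.

```python
def sales(visitors, feeder_page_rate, sales_page_rate):
    """Sales from a two-step funnel: feeder page, then sales page."""
    return visitors * feeder_page_rate * sales_page_rate

visitors = 10_000  # hypothetical traffic to the first page

baseline = sales(visitors, 0.02, 0.10)
better_sales_page = sales(visitors, 0.02, 0.11)   # sales page: 10% -> 11%
better_feeder_page = sales(visitors, 0.05, 0.10)  # feeder page: 2% -> 5%

print(f"Baseline sales:             {baseline:.0f}")
print(f"Improve the sales page:     {better_sales_page:.0f}")
print(f"Improve the page before it: {better_feeder_page:.0f}")
```

With these numbers, the upstream improvement is worth far more than the sales page improvement, as long as the extra traffic converts at the same rate once it arrives.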

Posted by Contributor