The Complete Tutorial to A/B Testing

Unlock the secrets of effective A/B testing for your ecommerce business with this in-depth guide. Learn the fundamentals, best practices, statistical analysis, and structured methodologies from industry experts to optimize conversion rates and drive growth.

Despite the wealth of information available on A/B testing, many marketers still struggle to execute it correctly. This leads to significant business decisions being based on flawed results from poorly conducted tests. One of the main issues is the oversimplification of A/B testing, particularly in content tailored for store owners. The solution? Here's a comprehensive guide to kickstart your journey into ecommerce A/B testing, explained in the simplest terms possible.

Whether you're a seasoned entrepreneur or just starting, you've likely come across numerous articles and resources discussing A/B testing. You might even already be implementing A/B tests on elements like email subject lines or social media posts.

What is A/B testing?

A/B testing, also known as split testing, involves comparing two versions of the same web page, email, or digital asset to determine which performs better based on user behavior. It serves as a valuable tool for enhancing marketing campaign performance and gaining insights into audience conversion preferences.

This process enables you to address key business inquiries, maximize revenue from existing traffic, and lay the groundwork for a data-driven marketing strategy.

How A/B testing works

In marketing A/B testing, you present version A of your asset (the "control") to 50% of visitors and version B (the "variant") to the other 50%.

The version with the higher conversion rate is deemed the winner. For instance, if the variant (version B) achieves the higher conversion rate, it is declared the winner and all visitors are directed to it. The variant then becomes the new control, and a new variant is designed for the next test.

It's important to note that an A/B test conversion rate may not always accurately measure success. For example, if one page offers an item for $50 while the other offers it for free, this wouldn't yield valuable insights. Like any business tool or strategy, A/B testing must be implemented strategically. Therefore, it's essential to track the value of conversions all the way through to the final sale.

What is A/B/n testing?

A/B/n testing allows you to assess more than one variant against the control. Instead of allocating 50% of visitors to the control and 50% to the variant, you might distribute visitors as follows: 25% to the control, 25% to the first variant, 25% to the second variant, and 25% to the third variant.

Note: This differs from multivariate testing, which tests multiple elements at once, each with its own variations, to determine the best-performing combination of changes.

How long should A/B tests run?

It's recommended to run your A/B test for at least one, preferably two, complete business cycles. Avoid halting the test solely upon reaching statistical significance. Ensure you also meet the predetermined sample size requirement and conduct all tests in full-week increments.

Why two full business cycles? Here are a few reasons:

  • To accommodate "I need to think about it" buyers.
  • To consider various traffic sources (Facebook, email newsletters, organic search, etc.).
  • To account for anomalies, like your Friday email newsletter.

Two business cycles generally provide sufficient time to gain valuable insights into the behavior of your target audience.

If you've used any A/B testing tool for landing pages, you're likely familiar with the small green "Statistically Significant" icon.

Unfortunately, for many, this symbol has become synonymous with "the test is complete, end it." However, reaching statistical significance alone does not mean a test is done; stopping at that point is often premature.

Regarding your predetermined sample size, it's less daunting than it appears. Utilize a sample size calculator, such as the one from Evan Miller, to determine the required number of visitors across your web pages to enhance your conversion rates.

For example, the calculator shows that if your current conversion rate is 5% and you aim to detect a 15% relative effect (a lift from 5% to 5.75%), you'll need a sample of 13,533 per variation. Thus, more than 27,000 visitors are necessary for a standard two-variation A/B test.

Consider what occurs when detecting a smaller effect:

Simply adjusting the minimum detectable effect (MDE) from 15% to 8% necessitates a sample of 47,127 per variation. Consequently, nearly 100,000 visitors are required for a standard A/B test.
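
As a cross-check on calculators like Evan Miller's, here is a minimal Python sketch (using statsmodels and assuming a two-sided test at 5% significance and 80% power) that estimates the required sample size per variation. The exact figure will differ slightly from any given calculator, since the underlying formulas vary.

```python
# Sketch: estimate the required sample size per variation for an A/B test.
# Assumes a two-sided two-proportion test, 5% significance, 80% power.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.05   # current conversion rate (5%)
relative_mde = 0.15    # minimum detectable effect, relative (15%)
variant_rate = baseline_rate * (1 + relative_mde)  # 5.75%

effect_size = proportion_effectsize(variant_rate, baseline_rate)  # Cohen's h
n_per_variation = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, ratio=1.0,
    alternative="two-sided",
)
print(f"Visitors needed per variation: {n_per_variation:,.0f}")
# A standard two-variation test needs roughly twice this many visitors in total.
```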

Whether conducting A/B testing for UX or SEO split testing, calculate your sample size beforehand. Your test must not conclude, even upon reaching statistical significance, until the predetermined sample size is attained. Failure to do so renders the test invalid.

Hence, it's crucial not to blindly adhere to best practices like "stop after 100 conversions." Additionally, run split tests in full-week increments to account for fluctuations in traffic based on the day and time. Include every day of the week to ensure comprehensive analysis.
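
To turn a required sample size into a planned duration, a simple calculation like the sketch below (the daily traffic figure is purely illustrative) divides the total sample by your average daily visitors and rounds up to full weeks.

```python
import math

# Illustrative assumptions; replace with your own store's numbers.
n_per_variation = 13_533    # from your sample size calculation
num_variations = 2          # control plus one variant
avg_daily_visitors = 1_500  # visitors entering the test per day

total_sample = n_per_variation * num_variations
days_needed = math.ceil(total_sample / avg_daily_visitors)
weeks_needed = math.ceil(days_needed / 7)  # always run full weeks
print(f"Plan for at least {weeks_needed} full week(s) ({weeks_needed * 7} days).")
```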

Why should you conduct A/B tests?

Let's illustrate this with a scenario: you invest $100 in Facebook ads to drive 10 visitors to your site, where the average order value is $25. If eight visitors leave without making a purchase, the two who buy generate $50 in revenue, so you lose $50 on the $100 spend. If only five visitors leave empty-handed, the five who each spend $25 generate $125, leaving you with a $25 profit.
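
The arithmetic behind that scenario is simple to verify; the sketch below just recomputes profit for the two conversion rates described above.

```python
def campaign_profit(ad_spend, visitors, conversion_rate, avg_order_value):
    """Profit = revenue from converting visitors minus ad spend."""
    buyers = visitors * conversion_rate
    return buyers * avg_order_value - ad_spend

# Scenario from the text: $100 spend, 10 visitors, $25 average order value.
print(campaign_profit(100, 10, 0.2, 25))  # 2 buyers -> -50.0 (a $50 loss)
print(campaign_profit(100, 10, 0.5, 25))  # 5 buyers ->  25.0 (a $25 profit)
```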

This example demonstrates the impact of increasing your online store's conversion rate through A/B testing. By optimizing your website's elements, you can make the same amount of traffic more valuable.

A/B testing not only improves conversion rates but also provides valuable insights, regardless of the test outcome. For instance, insights gained from testing images and copy can inform various aspects of your business, such as refining your value proposition or enhancing product descriptions.

Moreover, continuously striving to enhance your online store's effectiveness is crucial. A/B testing serves as a tool for ongoing improvement, ensuring your business remains competitive and adaptive in the ever-evolving digital landscape.

What should you A/B test?

Determining what to A/B test requires a personalized approach based on your data. While it may seem convenient to follow pre-existing lists of test ideas, the most valuable tests stem from your unique insights and observations.

Qualitative and quantitative analysis can guide your testing strategy. Consider various methodologies, such as technical analysis, on-site surveys, customer interviews, surveys, analytics analysis, user testing, and session replays.

By selecting the most suitable A/B testing methodologies and leveraging your data, you can generate a comprehensive list of data-driven ideas for testing. This approach yields more valuable insights than generic lists, ensuring your testing efforts align closely with your business objectives and customer needs.

Prioritizing A/B Test Ideas

Having a lengthy list of A/B test ideas can be exhilarating, but it's not particularly helpful when it comes to deciding which tests to prioritize. So, where should you begin? This is where prioritization strategies come into play.

Several common frameworks can aid in prioritization:

  1. ICE: Impact, Confidence, and Ease. Each factor is scored from 1 to 10; for instance, a test you can execute without outside help might receive a high Ease score. Because scores can become subjective when several people are involved, written guidelines help keep them consistent (a simple scoring sketch follows this list).
  2. PIE: Potential, Importance, and Ease. Similar to ICE, each factor is rated on a scale of 1 to 10. For example, if a test has the potential to reach a large portion of your traffic, it would receive a high-importance ranking. Like ICE, guidelines are beneficial for maintaining consistency in subjective assessments.
  3. PXL: This framework, from CXL, is more customizable and objective compared to ICE and PIE. It involves answering Yes/No questions and assessing the ease of implementation. Each question is scored accordingly, facilitating more objective decision-making.
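
As a concrete illustration of ICE scoring, the sketch below ranks a few hypothetical test ideas; the ideas and scores are invented for illustration, and the three factors are averaged here (some teams multiply them instead).

```python
# Sketch: rank hypothetical A/B test ideas with ICE (Impact, Confidence, Ease).
# Each factor is scored 1-10; the ideas and scores are illustrative only.
ideas = [
    {"idea": "Rewrite product page headline", "impact": 7, "confidence": 6, "ease": 9},
    {"idea": "Add trust badges to checkout", "impact": 6, "confidence": 5, "ease": 8},
    {"idea": "Redesign navigation menu", "impact": 8, "confidence": 4, "ease": 3},
]

for idea in ideas:
    idea["ice_score"] = (idea["impact"] + idea["confidence"] + idea["ease"]) / 3

for idea in sorted(ideas, key=lambda i: i["ice_score"], reverse=True):
    print(f'{idea["ice_score"]:.1f}  {idea["idea"]}')
```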

Once you have a prioritization framework in place, it's also helpful to categorize your ideas. For instance:

  • Implement: These are changes that need immediate action, such as fixing broken elements or addressing obvious issues.
  • Investigate: These ideas require additional analysis and problem definition to identify potential solutions.
  • Test: Ideas that are well-informed by data and sound in theory, ready to be tested for efficacy.

By combining prioritization frameworks with thoughtful categorization, you can effectively streamline your A/B testing efforts.

A Crash Course in A/B Testing Statistics

Before diving into A/B testing, it's essential to understand the statistical concepts involved. While statistics might not be everyone's favorite subject, grasping these fundamentals is crucial for interpreting test results accurately.

Mean:

The mean represents the average value in a dataset. It's essential to ensure that the sample used to calculate the mean is representative of the entire population.

Sampling:

A larger sample size reduces variability and increases the accuracy of the mean.

Variance:

Variance measures the average variability within a dataset. Higher variability decreases the accuracy of the mean in predicting individual data points.

Statistical Significance:

Statistical significance estimates how likely it is that an observed difference arose by chance alone. Testing at a low confidence level (a lenient significance threshold) increases the risk of a false positive, where the apparent "winner" is not actually better than the control.
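
To make this concrete, here is a minimal sketch (using statsmodels, with made-up visitor and conversion counts) of how a tool might check whether a difference in conversion rates is statistically significant.

```python
# Sketch: two-proportion z-test on illustrative A/B test counts.
from statsmodels.stats.proportion import proportions_ztest

conversions = [620, 690]     # conversions for control (A) and variant (B)
visitors = [12_000, 12_000]  # visitors exposed to each version

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("The difference is statistically significant at the 95% confidence level.")
else:
    print("No statistically significant difference was detected.")
```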

Regression to the Mean:

Extreme fluctuations observed at the beginning of an A/B test are likely to regress toward the average over time. Therefore, reaching statistical significance alone is not sufficient to declare a winner; sufficient sample size and test duration are also critical.

Understanding these statistical concepts is essential for conducting meaningful A/B tests and interpreting the results accurately.

What is Statistical Power?

Statistical power refers to the likelihood of detecting a difference between two groups (A and B) if a true difference exists. In simpler terms, it measures the ability of a statistical test to detect an effect when it truly exists. A higher power level indicates a lower chance of failing to recognize a significant difference (false negative), while a lower power level increases the risk of overlooking a genuine effect. Typically, A/B testing tools and split-testing services aim for a standard of 80% statistical power.
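
Power can also be computed directly for a planned sample size. The sketch below (same assumptions as the earlier sample size example: a 5% baseline with a 15% relative lift) asks what power a given per-variation sample size actually delivers.

```python
# Sketch: achieved statistical power for a given sample size per variation.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect_size = proportion_effectsize(0.0575, 0.05)  # lift from 5% to 5.75%
power = NormalIndPower().power(
    effect_size=effect_size, nobs1=13_533, alpha=0.05, ratio=1.0,
    alternative="two-sided",
)
print(f"Achieved power: {power:.0%}")  # aim for roughly 80% or higher
```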

What are Threats to External Validity?

External validity threats are external factors that can undermine the validity of your A/B test results. These factors include events or circumstances outside of your control that may influence user behavior and skew your test outcomes. Examples of external validity threats include:

  • Black Friday Cyber Monday (BFCM) sales
  • Press mentions (positive or negative)
  • Launch of major paid campaigns
  • Day of the week
  • Seasonal changes

For instance, during seasonal events like BFCM, your store may experience a surge in traffic, impacting the performance of your test variations. If you base your test decision on data from such anomalies, the results may not hold once conditions return to normal. While you can't eliminate external validity threats, you can mitigate them by running tests for full weeks, including diverse traffic sources, and remaining aware of potential threats.

How to Set Up an A/B Test

Setting up an A/B test involves formulating a clear hypothesis before proceeding with the experiment. The hypothesis should be specific, measurable, and focused on addressing a conversion problem. It should also be based on insights derived from research rather than mere ideas. A helpful formula for crafting hypotheses is:

"Because you observe [insert data/feedback from research], you expect that [change you’re testing] will cause [impact you anticipate], and you’ll measure this using [data metric]."

By following this formula, you can transform your test idea into a well-defined hypothesis ready for experimentation. For example (purely illustrative): "Because on-site surveys show visitors are unsure about shipping costs, you expect that displaying shipping costs on the product page will reduce cart abandonment, and you'll measure this using checkout completion rate."

Choosing the Right A/B Testing Tool

Now that you're ready to select an A/B testing tool or split testing service, several options come to mind, with Shoplift, Optimizely, and VWO being among the top contenders.

  1. Shoplift: Designed specifically for Shopify storefronts, Shoplift seamlessly integrates with your theme, allowing you to conduct split tests without any coding knowledge. Noteworthy features include Lift Assist, which provides tailored recommendations to enhance store performance rapidly. Whether you install the app directly or schedule a conversion consultation through their website, Shoplift offers a user-friendly solution to optimize your online presence.
  2. Optimizely: Renowned for its user-friendly interface, Optimizely simplifies the process of setting up tests, even for those lacking technical expertise. The inclusion of Stats Engine facilitates comprehensive analysis of test results. However, it's worth noting that Optimizely tends to be the priciest option among the three.
  3. VWO (Visual Website Optimizer): With SmartStats to streamline analysis and a robust WYSIWYG editor ideal for beginners, VWO offers a versatile solution for A/B testing. Additionally, every VWO plan includes valuable features such as heatmaps, on-site surveys, and form analytics, enhancing the depth of insights obtained.

For Shopify users, there are also several A/B testing tools available in the Shopify App Store that may cater to your specific needs.

Once you've chosen your preferred A/B testing tool or split-testing software, simply fill out the sign-up form and follow the provided instructions. While the process may vary slightly depending on the chosen tool, it typically involves installing a code snippet on your site and defining your testing goals.

How to Analyze A/B Test Results

When evaluating A/B test results, it's crucial to focus on deriving insights rather than solely determining whether the test was successful or not. Even if a test yields a negative outcome, it can still provide valuable insights for future tests and other areas of your business. Therefore, it's essential not to disregard the insights gained from tests that didn't yield the desired outcome.

Segmentation plays a key role in analyzing test results effectively. While a test may appear unsuccessful overall, it might have performed well within specific segments of your audience. Segments could include various categories such as new visitors, returning visitors, users of different devices or browsers, traffic sources, and more. By dissecting the data into segments, you can uncover hidden insights that may not be apparent when looking at the results as a whole.
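
If your testing tool lets you export raw, visitor-level results, a first segmentation pass can be as simple as the pandas sketch below; the file and column names ("variation", "device", "converted") are assumptions, not any specific tool's export format.

```python
# Sketch: conversion rate by variation and segment from exported test data.
import pandas as pd

df = pd.read_csv("ab_test_results.csv")  # one row per visitor (assumed export)

by_segment = (
    df.groupby(["variation", "device"])["converted"]
      .agg(visitors="count", conversions="sum", conversion_rate="mean")
      .reset_index()
)
print(by_segment)
```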

Remember, analysis goes beyond simply identifying winners and losers. It involves delving into the data to understand why certain segments responded differently and what lessons can be learned from each test variation. A/B testing tools may provide data, but it's up to you to conduct a thorough analysis to extract meaningful insights.

Archiving Past A/B Tests

Archiving past A/B tests is essential for preserving the insights gained and preventing redundant testing in the future. Without a well-organized archive, valuable information may be lost, and there's a risk of inadvertently repeating experiments.

While there's no one-size-fits-all approach to archiving A/B tests, it's essential to maintain a record of key details such as:

  1. The hypothesis tested
  2. Screenshots of the control and variation
  3. Whether the test resulted in a win or loss
  4. Insights gleaned from the analysis

You can choose to use dedicated tools like Effective Experiments or simply organize your data using spreadsheets like Excel, especially when starting. By maintaining a comprehensive archive, you'll not only benefit yourself but also facilitate knowledge sharing with new team members, advisors, and stakeholders as your business grows.

Refining A/B Testing Processes

Now that you've grasped the fundamentals of A/B testing, let's delve into the specific methodologies employed by industry experts at renowned companies such as Google and HubSpot.

Krista Seiden

Krista Seiden emphasizes a structured approach to web and app A/B testing, starting with meticulous analysis as the cornerstone of any effective testing strategy. The process unfolds as follows:

  1. Analysis: Delve into analytics data, survey responses, and user experience insights to pinpoint optimization opportunities.
  2. Hypothesis Formation: Based on analysis findings, formulate hypotheses to address identified areas for improvement.
  3. Test Implementation: Develop and execute tests, ensuring they run for an adequate duration to capture variations in user behavior.
  4. Results Analysis: Evaluate test outcomes comprehensively, examining both winning and losing variations for valuable insights.
  5. Personalization: Once a solid testing framework is established, explore opportunities for personalization tailored to user behavior and preferences.

Alex Birkett

At Omniscient Digital, Alex Birkett adopts a systematic approach to A/B testing, characterized by the following steps:

  1. Data Collection: Gather accurate data and conduct a thorough analysis to uncover actionable insights.
  2. Hypothesis Development: Translate insights into testable hypotheses, prioritizing them based on potential impact and implementation ease.
  3. Test Execution: Implement tests while adhering to statistical best practices, ensuring rigorous analysis of results.
  4. Iterative Improvement: Iterate based on findings, continuously refining strategies to drive optimization and innovation.

Ton Wesseling

Online Dialogue's Ton Wesseling advocates for a structured approach based on the ROAR model, focusing on risk assessment, optimization, and automation:

  1. Risk Assessment: Evaluate the stage of the customer journey to determine the level of experimentation feasible.
  2. Optimization: Conduct large-scale experiments to refine value propositions and business models.
  3. Automation: Utilize experimentation power to exploit opportunities for accelerated growth.
  4. Re-think: Periodically reassess the need for extensive research and experimentation to maintain momentum.

In other words, web or app A/B testing only becomes a major focus in the optimization phase of ROAR and beyond (until the re-think phase). Online Dialogue's approach to running experiments follows its FACT & ACT model, and the underlying research is based on its 5V Model.

Julia Starostenko

Pinterest's Julia Starostenko stresses the importance of rigorously assessing the necessity of experiments before execution, advocating for a thoughtful approach to experimentation:

  1. Experiment Justification: Determine whether experimenting is essential based on the magnitude of proposed changes and existing performance metrics.
  2. Data Collection: Establish clear business metrics and ensure proper data collection mechanisms are in place.
  3. Audience Segmentation: Randomly split the audience into control and test groups to evaluate the impact of proposed changes.
  4. Statistical Analysis: Monitor conversion rates and determine experiment outcomes once statistical significance is reached.

Peep Laja

Peep Laja from CXL underscores the significance of robust conversion research as the foundation for effective A/B testing, outlining a structured process:

  1. Conversion Research: Conduct in-depth research to identify site issues and prioritize them based on severity and impact.
  2. Ideation: Brainstorm solutions informed by research insights, selecting the most promising ideas for testing.
  3. Treatment Development: Develop and implement treatments, ensuring compatibility across devices and browsers.
  4. Test Execution: Launch tests and conduct post-test analysis to determine outcomes and inform subsequent actions.

Optimizing A/B Testing for Success

Armed with these insights, you're equipped to optimize your A/B testing strategy effectively. By implementing structured processes tailored to your business needs, you can unlock valuable insights and drive continuous improvement in conversion rates and user experience. So, harness the power of A/B testing software and embark on your journey to enhanced profitability and customer satisfaction.

And if you're looking to improve your website's effectiveness and increase user engagement, our CRO service is exactly what you need. Contact us.
