From multivariate testing to AI testing
Written by Victor Kostyuk
Published 9 Sep 2024

A/B testing is too slow, so you’ve upgraded to multivariate testing: instead of testing two variants at a time, you’re testing many variants at once. Multivariate testing is indeed faster than A/B testing, but otherwise it shares all of A/B testing’s disadvantages: it’s still too slow, it doesn’t adapt to changing customer behavior, and it’s not personalized to each customer. There’s a radically better way to optimize your marketing campaigns: AI testing using contextual bandits. In this post, we trace the evolution of marketing experimentation methods, from A/B testing to multivariate testing to multi-armed bandits to contextual bandits, covering the advantages of each method over its predecessor as well as its drawbacks. Finally, we present how OfferFit uses and improves on contextual bandits to achieve true 1:1 personalization.

Multivariate testing

When A/B testing, you compare two variants by randomly assigning half of your customer audience to each variant, sending the corresponding marketing message, and comparing the performance of the variants on your outcome metric, e.g., conversion rate. For example, say you want to know which shoe to include in an email offer.

[Figure: A/B testing]

You have more than 2 shoes you can offer, so with an A/B testing approach, you need to continue testing more shoes against each other – a very slow and laborious process.
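
To make the comparison step concrete, here is a minimal sketch of evaluating a single A/B test with a two-proportion z-test; the send counts and conversion numbers below are made up for illustration:

```python
# Compare the conversion rates of two variants with a two-proportion
# z-test (normal approximation). All numbers are illustrative.
from math import sqrt, erfc

def two_proportion_z_test(conversions_a, sends_a, conversions_b, sends_b):
    p_a = conversions_a / sends_a
    p_b = conversions_b / sends_b
    # Pooled conversion rate under the null hypothesis of "no difference"
    p_pool = (conversions_a + conversions_b) / (sends_a + sends_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / sends_a + 1 / sends_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value
    return p_a, p_b, z, p_value

# Variant A: offer shoe 1, Variant B: offer shoe 2
p_a, p_b, z, p_value = two_proportion_z_test(610, 30_000, 540, 30_000)
print(f"A: {p_a:.2%}  B: {p_b:.2%}  z = {z:.2f}  p = {p_value:.3f}")
```

Repeating a pairwise comparison like this for every new shoe is exactly what makes the sequential A/B approach so slow.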

With multivariate testing, you test all your variants at once:

[Figure: Multivariate testing]

Multivariate testing is essentially A/B testing done in parallel rather than sequentially: you test multiple variables simultaneously instead of running separate tests one after another, which saves a significant amount of time.

However, this efficiency comes with a trade-off. Because the full audience is split evenly across all the combinations of variations (2 genders x 3 styles = 6 combinations in the example above), it can be hard to achieve statistical significance when there are many combinations.

For marketers, this presents a dilemma:

  1. Wait longer to collect enough data to ensure the results are reliable (signal) and not just random fluctuations (noise).

  2. Move quickly to implement what seems to be the best option based on early, potentially unreliable data.

The more combinations you test, the smaller each group becomes, exacerbating this challenge. For instance, if you have 60,000 customers and test 6 combinations, each combination is only tested on 10,000 customers. If you increase to 12 combinations, you're down to 5,000 customers per combination, making it even harder to achieve statistical significance in a reasonable amount of time.
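
To see how quickly shrinking cells become a problem, here is a rough back-of-the-envelope power calculation; the baseline conversion rate and the lift to detect are assumptions chosen purely for illustration:

```python
# Back-of-the-envelope: how many customers per cell are needed to detect
# a given lift in conversion rate (two-proportion z-test approximation)?
# The baseline rate and lift below are illustrative assumptions.
from statistics import NormalDist

def sample_size_per_cell(p_base, p_variant, alpha=0.05, power=0.8):
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    return (z_alpha + z_beta) ** 2 * variance / (p_base - p_variant) ** 2

# Detecting a lift from 2.0% to 2.4% conversion takes roughly this many
# customers in *each* cell:
print(f"~{sample_size_per_cell(0.020, 0.024):,.0f} customers per cell")  # ≈ 21,000

# With 60,000 customers split over 6 cells you have 10,000 per cell;
# over 12 cells, only 5,000, well short of what's needed.
```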

Due to these constraints, multivariate testing is often used for a small subset of the variations the marketer wants to test, rather than all possible combinations of marketing copy or products. This helps maintain larger group sizes and reach statistical significance more quickly, but it reduces the number of variants tested and hence the usefulness of the technique.

Another crucial limitation of multivariate testing is its static nature. Once you've run your test and determined a "winner," that result is fixed. However, customer preferences and behaviors change over time. A variant that performs well today might not be the best choice in a month or two. Traditional multivariate testing doesn't account for these shifts unless you continually run new tests, which can be time-consuming and resource-intensive.

This is where more advanced methods like multi-armed bandits and contextual bandits shine.

Multi-armed bandits

Because multivariate testing splits the audience equally among the different variants, the experiment can be quite inefficient in terms of maximizing conversions. Each variant is sent to the same number of customers (10,000 in the example above), even if it’s clear from the first 1,000 sends that one variant is much worse than another. A multi-armed bandit (MAB for short) is an algorithm that allocates sends to each variant based on how likely that variant is to be the best, which makes it significantly more efficient at zeroing in on the best combination than a multivariate test. This doesn’t mean that a MAB will only send the variant it currently thinks is best – MABs balance exploitation (sending the variant currently estimated to be best) with exploration (sending other variants to improve the estimates of their performance).

This points to another advantage of a MAB over multivariate testing: a MAB continuously experiments. A MAB will notice if, over time, a variant that was initially underperforming becomes more successful, and it will start sending that variant more frequently, proportionally to that variant's increased performance. Thus, the variant a MAB thinks is best and the distribution of variants it sends can change over time.
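
For intuition, here is a minimal sketch of one popular MAB strategy, Thompson sampling, run against assumed “true” conversion rates; the variant names and numbers are illustrative only:

```python
# Minimal Thompson sampling sketch: each variant keeps a Beta distribution
# over its (unknown) conversion rate; on every send we sample a rate from
# each distribution and send the variant with the highest sample.
# Variants that look better get sent more often; the rest still get explored.
import random

class ThompsonSamplingMAB:
    def __init__(self, variants):
        # Beta(1, 1) prior: one pseudo-success and one pseudo-failure each
        self.successes = {v: 1 for v in variants}
        self.failures = {v: 1 for v in variants}

    def choose(self):
        sampled = {
            v: random.betavariate(self.successes[v], self.failures[v])
            for v in self.successes
        }
        return max(sampled, key=sampled.get)

    def update(self, variant, converted):
        if converted:
            self.successes[variant] += 1
        else:
            self.failures[variant] += 1

# Illustrative simulation with made-up "true" conversion rates
true_rates = {"running shoe": 0.03, "dress shoe": 0.01, "sandal": 0.02}
bandit = ThompsonSamplingMAB(list(true_rates))
for _ in range(10_000):
    variant = bandit.choose()
    bandit.update(variant, random.random() < true_rates[variant])
sends = {v: bandit.successes[v] + bandit.failures[v] - 2 for v in true_rates}
print(sends)  # the bulk of sends drifts toward the best-performing variant
```

Because the bandit keeps updating its estimates on every send, it can also be made to track shifting customer behavior, for example by down-weighting or discarding older observations.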


MABs are great at finding the global winner – the best-performing variant for a whole customer population or segment – and adjusting as the “winner” changes over time. But MABs have a fundamental limitation: they are unable to personalize. A MAB treats each variant it experiments with as a black box (e.g., it doesn’t know that one shoe is similar to another and therefore likely to perform similarly) and treats all customers the same. However, neither customers nor variants are all the same!

Contextual bandits

Unlike MABs, contextual bandits are algorithms that use context about customers, variants, and the environment (e.g., is today a holiday or a weekend?) to make decisions. For example, a contextual bandit would know the style of a shoe and whether it’s a men’s or women’s shoe. It would also know the customer’s purchase history (what styles of shoes they have purchased in the past). This allows the contextual bandit to quickly learn which offers are likely to work with which customers.

[Figure: Contextual bandit]

The contextual bandit doesn’t simply select a variant based on how likely that variant is to convert on average, but on how likely it is to convert for a particular customer in the given environment (e.g., on a Saturday morning).

Moreover, the contextual bandit is able to generalize across variants. For example, if a new running shoe is released and added as an option, the algorithm will use the fact that its style is “running shoe” and leverage what it has learned about running shoes to recommend the new shoe. This makes contextual bandits much more applicable to marketing use cases, where new variants appear all the time.
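
To make this concrete, here is a highly simplified contextual bandit sketch: a single online logistic regression scores (customer, variant, environment) feature vectors, and an epsilon-greedy rule keeps some exploration going. The features, the model, and the exploration strategy are illustrative assumptions, not a description of any particular product:

```python
# Highly simplified contextual bandit: a shared online logistic regression
# scores (customer, variant, environment) feature vectors, and an
# epsilon-greedy rule preserves a small amount of exploration.
import math
import random

class ContextualBandit:
    def __init__(self, n_features, epsilon=0.1, lr=0.05):
        self.w = [0.0] * n_features
        self.epsilon = epsilon
        self.lr = lr

    def _score(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 / (1 + math.exp(-z))  # predicted conversion probability

    def choose(self, candidates):
        # candidates: {variant_name: feature_vector}
        if random.random() < self.epsilon:  # explore
            return random.choice(list(candidates))
        return max(candidates, key=lambda v: self._score(candidates[v]))  # exploit

    def update(self, x, converted):
        # One step of online gradient descent on the log loss
        error = self._score(x) - (1.0 if converted else 0.0)
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]

def features(customer, variant, env):
    # Joint features let the model learn interactions such as "customers
    # who bought running shoes respond well to running-shoe offers".
    return [
        1.0,                                            # bias
        customer["bought_running_shoes"],               # customer context
        variant["is_running_shoe"],                     # variant attribute
        customer["bought_running_shoes"] * variant["is_running_shoe"],
        env["is_weekend"],                              # environment
    ]

# Illustrative usage
bandit = ContextualBandit(n_features=5)
customer = {"bought_running_shoes": 1.0}
env = {"is_weekend": 1.0}
variants = {"new running shoe": {"is_running_shoe": 1.0},
            "dress shoe": {"is_running_shoe": 0.0}}
candidates = {name: features(customer, attrs, env) for name, attrs in variants.items()}
chosen = bandit.choose(candidates)
bandit.update(candidates[chosen], converted=True)  # learn from the observed outcome
```

Because the model scores variant attributes rather than variant identities, a brand-new running shoe can get sensible estimates from its very first sends.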

Contextual bandits have disadvantages – they are more complex to implement and maintain than MABs, they require up-to-date data on customers, and while they are able to handle a much larger set of variants than the previous methods, they are still slowed down by large collections of variants.

AI testing: how OfferFit uses and improves on contextual bandits 

To increase sample efficiency, i.e., how quickly the model learns from limited data, we at OfferFit use a “community of bandits” to break down the recommendation into separate dimensions (e.g., day of week, time of day, channel, creative, offer), with a separate contextual bandit making decisions in each of these dimensions.
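
As a very rough illustration of the idea (and not OfferFit’s actual implementation), one can picture one bandit per dimension, each choosing its own piece of the final recommendation. The sketch below reuses the ContextualBandit class from the previous section; the dimensions and the toy featurizer are assumptions:

```python
# Very rough "community of bandits" illustration: one contextual bandit
# per decision dimension, each picking its own part of the recommendation.
# Reuses the ContextualBandit sketch above; not OfferFit's implementation.
dimensions = {
    "day_of_week": ["Mon", "Wed", "Sat"],
    "channel": ["email", "sms", "push"],
    "offer": ["running shoe", "dress shoe", "sandal"],
}
community = {dim: ContextualBandit(n_features=5) for dim in dimensions}

def encode(customer, option, env):
    # Toy featurizer; a real system would use rich customer, option,
    # and environment attributes for each dimension.
    return [1.0,
            customer.get("bought_running_shoes", 0.0),
            env.get("is_weekend", 0.0),
            float(option == "running shoe"),
            float(option in ("email", "Sat"))]

def recommend(customer, env):
    recommendation = {}
    for dim, options in dimensions.items():
        # Each bandit scores only its own options, so every send provides
        # a learning signal for every dimension rather than waiting for
        # the exact full combination to come up again.
        candidates = {opt: encode(customer, opt, env) for opt in options}
        recommendation[dim] = community[dim].choose(candidates)
    return recommendation

print(recommend({"bought_running_shoes": 1.0}, {"is_weekend": 1.0}))
# -> one choice per dimension: a day, a channel, and an offer
```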

To learn more about AI testing, check out our deep dive on MABs and contextual bandits or our white paper on our community of bandits.

Before founding OfferFit, Victor Kostyuk was a Lead Data Scientist at the Boston Consulting Group. He is an expert in AI testing, with extensive experience serving Fortune 500 companies on offer personalization. He holds a PhD in Mathematics from Cornell University.


Ready to make the leap from A/B to AI?