The attribution dilemma

Written by Victor Kostyuk

Published 15 May 2024

Ms. Rogers just signed up for a contact lens subscription on your website. Cha-ching! Your last email to Ms. Rogers was an hour ago – but it was for a discount on new frames, not contacts. Last week, you did email her about your contacts subscription plan, but maybe that was too long ago to have influenced her decision today. Suppose Ms. Rogers didn’t click through the emails, nor – as far as you can trust your email-open tracking data – did she even read them. Still, perhaps the billboard effect reminded her she really needs new contacts. Should you now be more confident in the effectiveness of each of those emails, or less?

This dilemma is the famous attribution problem in marketing. The crux of any data-driven marketing strategy is the ability to measure what’s working. That means marketers must know – or make the best possible guesses – which of their actions influenced customer actions, and how much. Without proper attribution, personalization is impossible. Or more exactly – you can certainly personalize, but you’ll have no way of knowing if any of your personalization is working. Without cracking the attribution code, you might find yourself updating the famous quote: “Half the money I spend on personalization in marketing is wasted. The trouble is, I don’t know which half.”

More recently, marketers have turned to AI decisioning agents to experiment and learn which options work best with each individual customer. This new method of AI decisioning relies on reinforcement learning agents – AI which can experiment and learn. A reinforcement learning agent chooses from a bank of available actions, and receives some reward in response. Based on the strength of the reward it receives, the agent learns and updates its policy for choosing new actions. (OfferFit's AI decisioning uses contextual bandits, a type of reinforcement learning, for this purpose.)
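To make the choose, reward, update loop concrete, here is a minimal sketch of an epsilon-greedy contextual bandit. It is a textbook toy, not OfferFit's implementation: the class name, the `epsilon` parameter, and the idea of keeping one running-average reward per (context, action) pair are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Toy contextual bandit: one running-average reward per (context, action)."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon              # probability of exploring at random
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)      # times each (context, action) was rewarded
        self.values = defaultdict(float)    # running mean reward per (context, action)

    def choose(self, context):
        # Explore with probability epsilon; otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def update(self, context, action, reward):
        # Incremental mean: move the estimate toward the observed reward.
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]
```

Everything that follows in this post is about the `reward` argument to `update`: deciding which past choice, if any, a conversion should be credited to.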

Let’s return to the example of Ms. Rogers and her contact lenses. Suppose the marketer in this story hadn’t picked emails based on results of manual A/B testing, but through AI decisioning. The agent sent an email last week about contact lenses, and then an email today about frames. Ms. Rogers took the action we were hoping for – she signed up for a contact lens subscription. Which agent recommendation should now be rewarded, and how much?

Getting uplift and ROI from a personalization model is hard. If any one of myriad things goes wrong – e.g., bad data, poor content, a faulty activation pipeline, or incorrect model tuning or architecture – you will not see results. One of the most important and difficult things to get right is attribution: how you give credit for positive and negative customer behavior to previous agent decisions. It’s a problem that has more layers than is at first apparent. My goal here is to peel back each layer of the onion and show you the fascinating complexity underneath.

All roads start from data

OfferFit, the company I co-founded, applies reinforcement learning to lifecycle marketing – i.e., marketing to existing, identified customers. Thus, the usual limitations of cookies, piecing together cross-device and cross-browser journeys, and reconciling customer identities don’t really apply in this context. You may think this means we’re playing the game in easy mode. This is far from the truth.

The options for how we attribute customer conversions to the ML model’s recommendations are determined in part by the availability of data. Here are some possibilities, in decreasing order of attribution confidence:

1. The activation and conversion system is able to pass through an ID generated with the recommendation via, e.g., a landing page URL parameter or transaction metadata. This allows for exact matching: we can link the conversion event directly to a recommendation via the original recommendation ID that’s passed through to the conversion event.
2. If we can’t pass through a recommendation ID, perhaps we know the exact product that was purchased and/or the exact channel that led to the purchase. This allows us to match by product ID and channel. E.g., if Sedrick purchased a toaster after clicking on a link in an email, and two days ago we sent an email showcasing exactly that toaster, it’s safe to assume that our email influenced the purchase.
3. If the above doesn’t hold, perhaps the conversion data contains some information about some aspect of the recommendation. For example, we may know that the conversion involved the redemption of a 15% discount offer. We can filter previous recommendations for those that included a 15% discount offer. This allows us to narrow down the recommendations which could have had an influence on the conversion.
4. Finally, if none of the above data is available, then we have only proximity to go by. I.e., when did the customer convert? Was it soon after a recommendation was sent to them? If so, we attribute the conversion to this recommendation.

In practice, proximity – timing of recommendations relative to the conversion – is often used in combination with options 2 and 3 to select candidate recommendations which may have influenced the conversion. That’s only the first step.
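The four options above can be sketched as a fallback chain that tries the most confident match first. This is an illustrative sketch only: the `Recommendation` and `Conversion` records, their field names, and the 3-day default window are hypothetical, and a real pipeline would be shaped by whatever data the activation and conversion systems actually expose.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Recommendation:          # hypothetical record of one agent decision
    rec_id: str
    sent_at: datetime
    channel: str
    product_id: str
    discount_pct: Optional[int] = None


@dataclass
class Conversion:              # hypothetical conversion event; fields may be missing
    at: datetime
    rec_id: Optional[str] = None
    channel: Optional[str] = None
    product_id: Optional[str] = None
    discount_pct: Optional[int] = None


def candidate_recommendations(conv, recs, window=timedelta(days=3)):
    """Return (option_number, candidates), from most to least confident."""
    # Option 1: exact match on a passed-through recommendation ID.
    if conv.rec_id is not None:
        exact = [r for r in recs if r.rec_id == conv.rec_id]
        if exact:
            return 1, exact

    # Proximity: only recommendations sent shortly before the conversion.
    recent = [r for r in recs if timedelta(0) <= conv.at - r.sent_at <= window]

    # Option 2: match on whichever of product and channel the event carries.
    known = {k: v for k, v in (("product_id", conv.product_id),
                               ("channel", conv.channel)) if v is not None}
    if known:
        matched = [r for r in recent
                   if all(getattr(r, k) == v for k, v in known.items())]
        if matched:
            return 2, matched

    # Option 3: match on a partial aspect of the recommendation, here the discount.
    if conv.discount_pct is not None:
        matched = [r for r in recent if r.discount_pct == conv.discount_pct]
        if matched:
            return 3, matched

    # Option 4: proximity alone.
    return 4, recent
```

The function returns candidates rather than a single winner, since choosing among them is where the tradeoffs discussed next come in.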

False positives and false negatives

To understand the tradeoffs in attribution decisions that come next, we need to understand the concepts of true/false negatives and true/false positives.

If we attribute customer conversion to a recommendation, and that recommendation in fact influenced the conversion, it’s a true positive: we decided “yes (positive), it was due to this recommendation” and that decision was true. If the customer didn’t convert due to that recommendation, but we incorrectly attributed the conversion to it, it’s a false positive: we decided “yes” (positive) but it was incorrect (false).

If the customer converted on the recommendation, but we failed to attribute the conversion to it, it’s a false negative. If the customer didn’t convert on that recommendation, and we (correctly) didn’t attribute the conversion to it, it’s a true negative.

Attribution decisions require balancing the tradeoff between false positives and false negatives. For example, let’s say we have an email campaign, but the conversion events carry no additional useful data. So, we’re attributing by proximity (option 4). We can choose to only attribute a conversion to the model’s recommendation when we see an email opened or email clicked event prior to the conversion within our time window for attribution. But this approach risks false negatives. The customer’s email software may block email open callbacks, so that we don’t see an email open event. Or if we only attribute if the customer clicks on a “buy” CTA, we’ll miss a customer who went to the website and purchased without clicking the CTA, but was nonetheless influenced by the email.

Requiring an open or a click makes the attribution more strict, reducing the likelihood of false positives, since it reduces positives overall. But it increases the likelihood of false negatives, as in the examples above, where we fail to attribute the conversion to a recommendation that did influence it.

The context of the campaign can be a guide here: can customers organically convert on the same target services or products of the campaign, or can they only convert by clicking on an email in this campaign? If they can organically convert, do they? How often? If organic conversions are common, then being more strict with attribution makes sense: the decrease in false positives more than compensates for the increase in false negatives. I.e., being more strict cuts through the noise. If organic conversions are rare, then the tradeoff will be a losing proposition overall: a small decrease in false positives and a large increase in false negatives.
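A back-of-the-envelope calculation shows how the organic conversion rate flips this tradeoff. The sketch below is a deliberately simplified model with made-up rates: it assumes every conversion is either caused by the email or organic (never both), and that a fixed share of each group leaves a tracked open or click. The function and parameter names are hypothetical.

```python
def attribution_errors(n_customers, p_influenced, p_organic,
                       tracked_if_influenced, tracked_if_organic):
    """Expected false positives and negatives under a loose and a strict rule.

    loose:  attribute every conversion inside the time window.
    strict: attribute only conversions preceded by a tracked open or click.
    """
    influenced = n_customers * p_influenced   # conversions the email caused
    organic = n_customers * p_organic         # conversions that would have happened anyway
    return {
        # Loose credits every organic conversion (all false positives), misses nothing.
        "loose": {"fp": organic, "fn": 0.0},
        # Strict credits only tracked organic conversions, misses untracked influenced ones.
        "strict": {
            "fp": organic * tracked_if_organic,
            "fn": influenced * (1 - tracked_if_influenced),
        },
    }


def total_errors(errors):
    return errors["fp"] + errors["fn"]


# Illustrative, invented rates: 2% influenced conversions in both scenarios.
high_organic = attribution_errors(10_000, 0.02, 0.10, 0.6, 0.2)
low_organic = attribution_errors(10_000, 0.02, 0.002, 0.6, 0.2)
```

With these invented numbers, the strict rule makes roughly 280 errors against 1,000 for the loose rule when organic conversions are common, but roughly 84 against 20 when they are rare. Total error count is only one way to score the tradeoff; a campaign may reasonably weigh false positives and false negatives differently.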

Good things come to those who wait, but don’t wait too long

We must also decide how long to wait to attribute a conversion or non-conversion following a decision by an AI agent. We can’t wait forever: we need to provide not only positive rewards to our model, but also negative ones, telling it which recommendations were bad so that it knows what not to do in the future. Consider the following simple rule: make a recommendation, wait 3 days after it’s activated (e.g., an email is sent), and if after 3 days no conversion has been attributed to it, assume this recommendation did not influence a conversion.
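The simple rule above might look like the following sketch. The function name, the 1.0 and 0.0 reward values, and the use of `None` for "still waiting" are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def resolve_reward(activated_at: datetime,
                   attributed_conversion_at: Optional[datetime],
                   now: datetime,
                   wait: timedelta = timedelta(days=3)) -> Optional[float]:
    """Return 1.0 (positive), 0.0 (negative), or None (too early to decide)."""
    deadline = activated_at + wait
    # A conversion attributed inside the window is a positive reward.
    if (attributed_conversion_at is not None
            and activated_at <= attributed_conversion_at <= deadline):
        return 1.0
    # The window has closed without one: record a negative reward.
    if now >= deadline:
        return 0.0
    # Otherwise keep waiting.
    return None
```

The `None` state matters: a recommendation whose window is still open should not be fed to the model as a failure.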

What if we make the window 6 days instead of 3? We reduce the likelihood of false negatives – a customer is more likely to convert within 6 days than within 3 – but increase the likelihood of a false positive. Would a purchase 5 days after a push notification, for example, really not have happened without that notification 5 days ago?

A better rule for how many days to wait would be to set an appropriate value for each channel. When an ML model makes recommendations for a simple phone upsell – for example, which add-on the phone agent should offer a customer who’s renewing their plan – we usually expect the conversion to happen, or not, that same day; the customer converts on the call or doesn’t. Thus it’s safe to only wait a single day before deciding whether the model's recommendation did or did not influence a conversion. For email, waiting 3 days would be more reasonable. For direct mail, 15 days is more likely to achieve a good balance between false positives and negatives.

An even better approach is to learn the best waiting time from actual customer conversion data, finding the value that is optimal for each channel.
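This post doesn’t spell out how such learning works, so the following is just one simple heuristic, not a description of OfferFit’s method: for each channel, take the lags (in days) between activation and conversion for conversions we trust, such as exact ID matches, and pick the shortest whole-day window that would have captured a chosen share of them. The function name and the 90% `coverage` default are assumptions.

```python
import math


def learn_wait_days(lags_by_channel, coverage=0.9):
    """Pick, per channel, the smallest whole-day window covering `coverage` of lags.

    lags_by_channel maps a channel name to observed activation-to-conversion
    lags in days, taken from conversions with confident attribution.
    """
    windows = {}
    for channel, lags in lags_by_channel.items():
        if not lags:
            continue  # no evidence for this channel; keep whatever default is in use
        ordered = sorted(lags)
        # Index of the smallest lag such that at least `coverage` of lags are <= it.
        idx = max(0, math.ceil(coverage * len(ordered)) - 1)
        # Round up to whole days, waiting at least one day.
        windows[channel] = max(1, math.ceil(ordered[idx]))
    return windows
```

Raising `coverage` lengthens the window, trading fewer false negatives for more false positives, which is the same tension as moving from 3 days to 6.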

Attribution remains a hard problem

There are many other aspects to attribution which we haven’t touched on. Marketers are familiar with the problem of last-touch vs multi-touch attribution – AI agents face the same challenge. We also must separate attribution of how we contact a customer – e.g. time, channel, day of week – from attribution of what we offer – e.g. product, plan, financial incentive. Suppose a customer clicked an email link and purchased right after we sent an email, but purchased something other than the product we recommended. In this case, the AI agent likely made a good recommendation about time of send – the customer engaged with the email – but perhaps not a good product recommendation. We should likely reward the model’s time-of-send and product decisions differently. Attribution in “high noise” environments, such as campaigns with high organic conversion, brings its own challenges. Nonetheless, I hope the concepts and examples above give you an appreciation for how complex it is to select the right attribution strategy for each campaign, and for how much an AI agent’s ability to make good decisions depends on that choice.
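The idea of rewarding the “how” and the “what” separately can be sketched in a few lines. The reward values and the exact conditions here are assumptions for illustration; the post only says the two kinds of decision should likely be rewarded differently.

```python
def split_reward(engaged, converted, purchased_product, recommended_product):
    """Return separate rewards for the contact ("how") and offer ("what") decisions."""
    # Contact decision (time, channel): rewarded if the customer engaged and then bought.
    contact = 1.0 if engaged and converted else 0.0
    # Offer decision (product, plan, incentive): rewarded only if they bought what we recommended.
    offer = 1.0 if converted and purchased_product == recommended_product else 0.0
    return {"contact": contact, "offer": offer}
```

In the example above, where the customer clicks the email and buys a different product, this gives the contact decision a reward of 1.0 and the offer decision 0.0.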

In my work prior to OfferFit as a lead data scientist for BCG Gamma (the data science arm of the Boston Consulting Group), I saw the potential for reinforcement learning to transform personalization in marketing. But I came to see that reinforcement learning was typically too nuanced and complicated for marketers to easily build and use on their own, even with the help of consultancies. The attribution problem I’ve explained in this post is not the only challenge, but it’s a key problem. I co-founded OfferFit to give marketers a SaaS solution for AI decisioning, which productizes all of the above attribution logic. OfferFit allows attribution settings to be tailored exactly to the context of each of your campaigns, provides battle-tested defaults, and learns some of the configuration automatically from the data. OfferFit AI expert services help set up attribution for complex campaigns. This post, I hope, has helped you see what’s inside the “black box” of an AI decisioning solution like OfferFit, and has given you an idea of why, for AIs and humans alike, attribution is such a hard problem.

Before founding OfferFit, Victor was a Lead Data Scientist at the Boston Consulting Group. He is an expert in AI decisioning, with extensive experience serving Fortune 500 companies on offer personalization. He holds a PhD in Mathematics from Cornell University.