Bayesian A/B Testing with PYMC3

A Bayesian approach to A/B testing is often touted as superior to frequentist hypothesis testing because of its supposed ability to handle smaller sample sizes as well as the ability to use varying degrees of prior knowledge to inform the analysis. My intention here isn't to provide the mathematical background behind running these sorts of test but a computational method for arriving at a step where you can make a decision.

We'll be using PYMC3 for this so first we import that.

import pymc3 as pm

Here we can imagine a scenario in which we are running a test that is displaying two different versions of an ad to customers arriving on our website.

In this example we ran our test for a week or so and our data shows us that 3,551 individuals were exposed to our control while 5,693 were exposed to our variation. 63 success events were recorded from the control group while 125 were recorded from the variation group.

n1 = 3551
x1 = 63
n2 = 5693
x2 = 125

Next we have to set the parameters for our prior probability distribution. An A/B test is not entirely unlike the ubiquituous coin-toss example. I went ahead and borrowed the PYMC3 code structure for a coin toss from here and modified it slightly to suit our needs.

An uninformed prior for this would be a Beta(1,1) which means we say that before we know anything, all values of the potential conversion rates are equally likely.

alpha = 1
beta = 1

Now we construct the model. PYMC3 uses this specific type of syntax with a with statement. Notice we are constructing two different probability distributions and then creating a third based on the difference between the two. That is to say, we are interested in the difference in the probable conversion rates between A and B.

with pm.Model() as model:
    p1 = pm.Beta('p1', alpha=alpha, beta=beta)
    y1 = pm.Binomial('y1', n=n1, p=p1, observed=x1)
    p2 = pm.Beta('p2', alpha=alpha, beta=beta)
    y2 = pm.Binomial('y2', n=n2, p=p2, observed=x2)
    pm.Deterministic('difference', p2 - p1)
    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(niter, step, start, random_seed=123, progressbar=True)

We are using the Metropolis-Hastings algorithm to solve the model computationally. While we could solve this analytically, it's a bit more fun.
Finally, we are interested in looking at what the probability is that the Variation is greater than our Control.

_ = pm.plot_posterior(trace['difference'], ref_val=0)

This plots a graph of the difference distribution with a reference bar at 0. This probability distribution represents the various probabilities that B is greater than A.


We can see from the above chart that the difference distribution is saying that the 95% highest density interval is between -.001 and 0.01. Also, the probability that B is better than A is 93.3%. We have a 6.7% probability that B is worse. We are safe to go with B.
It's important to frame the decision in terms of the risk too. Seeing a 80% probability of B being greater than A might sound reasonable, but if the stakes are high enough you might not be willing to accept the 20% chance that implementing B is worse than A!