Eagle78
#1
Dear Nina Zumel and John Mount,

At the top of page 355, in the appendix on A/B testing, I find it difficult to understand why the shape1 and shape2 parameters of the beta distribution are defined as

shape1=commonRate+tab['B','1']
shape2=(1-commonRate)+tab['B','0']

Without understanding the equation exactly, I intuitively feel that the effect of the parameter "commonRate" (between 0 and 1) in this equation is probably too small compared to the values tab['B','0'] and tab['B','1'] (9398 and 602, respectively).

If the equation is correct, could you please refer to the underlying theory?

With kind regards,
Arend


PS: By the way, I think it's GREAT that you added this section, since I'm a web analyst with the ambition to become a data scientist...
john.mount
#2
Re: Definition of shape1 & shape2 in Bayesian evaluation of AB-test - page 355
Thanks for the question. Sorry if we were too telegraphic about what is going on. It is actually a beautiful topic, and I'll try to explain it here.

You are right: with the amount of data we have, the commonRate gets swamped (so you don't really need it; you would get nearly the same answer without it). That is a good thing.
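To put numbers on "swamped": the mean of a Beta(shape1, shape2) distribution is shape1/(shape1 + shape2), so with the counts from the question the posterior mean of the B-rate is (commonRate + 602)/(1 + 10000). The small Python sketch below is only a translation of the two shape formulas (the book's code is R, and the helper name posterior_mean is made up for illustration). It evaluates that mean across the whole possible range of commonRate:

```python
# B-group counts quoted in the question: tab['B','1'] and tab['B','0'].
B_CONVERTED = 602
B_NOT_CONVERTED = 9398


def posterior_mean(common_rate, converted, not_converted):
    """Mean of Beta(common_rate + converted, (1 - common_rate) + not_converted)."""
    shape1 = common_rate + converted
    shape2 = (1 - common_rate) + not_converted
    return shape1 / (shape1 + shape2)


# commonRate is a fraction; 0 and 1 are only limits for the prior itself,
# but the posterior shapes stay positive there, so the mean is still defined.
for c in (0.0, 0.5, 1.0):
    m = posterior_mean(c, B_CONVERTED, B_NOT_CONVERTED)
    print(f"commonRate={c:.1f}  posterior mean={m:.6f}")

# Raw observed B-rate, with no pseudo-observation at all.
print(f"observed B-rate = {B_CONVERTED / (B_CONVERTED + B_NOT_CONVERTED):.6f}")
```

Because commonRate contributes at most one pseudo-conversion, the posterior mean can move by at most 1/10001 (about 0.0001) as commonRate sweeps from 0 to 1.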

Roughly, what we are doing is using a Bayesian formulation of A/B testing. The math is based on the assumption that the conversion rate (or intensity) is an unobserved quantity with a prior distribution of plausible values. As we observe events, we update this to a posterior distribution of plausible values of the conversion rate. The easiest way to do this is to additionally assume that the unknown conversion rate is distributed according to the beta distribution (mentioned in this section of the appendix and earlier in the appendix).

The beta distribution has two shape parameters, here called shape1 and shape2. Before we look at the B-results, we are implicitly saying that a plausible, somewhat non-informative prior for the B-rate is Beta(shape1=commonRate, shape2=1-commonRate). That is a distribution whose mean equals the commonRate (the conversion rate of the A and B observations grouped together: sort of a frequentist-style null hypothesis that there is no difference, or a deliberate bias toward assuming no difference before looking at the data). The commonRate is a fraction, so this is like adding a single observation that is fractionally split between converting and not converting. Then, as in the book, after we see tab['B','1'] and tab['B','0'] we say the posterior distribution of the rate is Beta(shape1=commonRate+tab['B','1'], shape2=1-commonRate+tab['B','0']), which is our actual observations plus our fractional pseudo-observation. So if the distribution of plausible B-rates lies far above the A-rate, that is good evidence that the B-rate is in fact better.
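Computationally, that posterior is just two numbers. The Python sketch below (standard library only) builds shape1 and shape2 exactly as on page 355 and then estimates how much of the B-rate posterior lies above a reference A-rate by sampling with random.betavariate. The B counts come from the question; COMMON_RATE and A_RATE are hypothetical placeholders, since the thread does not give the A counts. In R, pbeta(..., lower.tail=FALSE) computes the same tail probability exactly rather than by sampling.

```python
import random

# B-group counts quoted in the thread: tab['B','1'] and tab['B','0'].
B_CONVERTED = 602
B_NOT_CONVERTED = 9398

# HYPOTHETICAL values for illustration only; the A counts are not given.
COMMON_RATE = 0.055  # assumed pooled A+B conversion rate
A_RATE = 0.05        # assumed observed A conversion rate


def b_posterior_shapes(common_rate, converted, not_converted):
    """Prior Beta(c, 1 - c) (one fractional pseudo-observation) plus the B counts."""
    return common_rate + converted, (1 - common_rate) + not_converted


def prob_b_above(reference_rate, shape1, shape2, draws=20_000, seed=0):
    """Monte Carlo estimate of P(B-rate > reference_rate) under Beta(shape1, shape2)."""
    rng = random.Random(seed)
    hits = sum(rng.betavariate(shape1, shape2) > reference_rate for _ in range(draws))
    return hits / draws


shape1, shape2 = b_posterior_shapes(COMMON_RATE, B_CONVERTED, B_NOT_CONVERTED)
print(f"shape1={shape1:.3f}  shape2={shape2:.3f}")
print(f"P(B-rate > A-rate) ~ {prob_b_above(A_RATE, shape1, shape2):.4f}")
```

With no data, b_posterior_shapes returns the prior Beta(c, 1 - c), whose mean c/(c + 1 - c) is exactly the commonRate.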

This can seem a bit mysterious, but the ease of calculation comes from what Bayesians call "conjugate distributions." If we assume the unknown B-rate is Beta-distributed with some parameters, then the posterior distribution is Beta-distributed with new parameters (which are just the original parameters with the observations added in). This amounts to picking a prior distribution on the unknown parameter (the Beta distribution) that is conjugate to the assumed data-generating process (the Binomial or Bernoulli distribution).
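For the Beta/Bernoulli pair the conjugacy is a one-line calculation. Writing c for commonRate, n_1 for tab['B','1'], n_0 for tab['B','0'] and p for the unknown B-rate (shorthand introduced here for brevity), multiplying the prior density by the likelihood simply adds the counts to the exponents, which recovers the shape1 and shape2 formulas:

```latex
\begin{aligned}
\text{prior:} \quad & f(p) \propto p^{c-1}(1-p)^{(1-c)-1}
  && p \sim \mathrm{Beta}(c,\ 1-c) \\
\text{likelihood:} \quad & L(p) \propto p^{n_1}(1-p)^{n_0}
  && n_1 \text{ conversions},\ n_0 \text{ non-conversions} \\
\text{posterior:} \quad & f(p \mid \text{data}) \propto p^{c+n_1-1}(1-p)^{(1-c)+n_0-1}
  && p \mid \text{data} \sim \mathrm{Beta}(c+n_1,\ (1-c)+n_0) \\
\text{posterior mean:} \quad & \mathrm{E}[p \mid \text{data}] = \frac{c+n_1}{1+n_1+n_0}
\end{aligned}
```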

The theory is standard Bayesian statistics (though it is often used with the uninformative prior known as the Jeffreys prior, which for a conversion rate is Beta(1/2, 1/2)). It is also related to "Laplace smoothing," where you add one positive and one negative pseudo-observation before starting (though here we add a total of one pseudo-observation instead of two, and we are monkeying around to get the starting mean to be something plausible rather than always 1/2, which could be a huge conversion rate).
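The difference between these priors is easiest to see with very little data. The Python sketch below compares Laplace smoothing, Beta(1, 1), the Jeffreys prior, Beta(1/2, 1/2), and the Beta(commonRate, 1 - commonRate) prior discussed above. COMMON_RATE = 0.06 and the sample of 1 conversion in 20 visits are hypothetical numbers chosen only for illustration. The first prior adds two pseudo-observations, the other two add one, and only the last starts from a mean other than 1/2.

```python
# HYPOTHETICAL pooled conversion rate, for illustration only.
COMMON_RATE = 0.06


def priors(common_rate):
    """Beta prior pseudo-counts (alpha, beta) for a conversion rate."""
    return {
        "Laplace Beta(1, 1)": (1.0, 1.0),
        "Jeffreys Beta(1/2, 1/2)": (0.5, 0.5),
        "commonRate Beta(c, 1 - c)": (common_rate, 1.0 - common_rate),
    }


def beta_posterior_mean(alpha, beta, converted, not_converted):
    """Mean of the conjugate posterior Beta(alpha + converted, beta + not_converted)."""
    return (alpha + converted) / (alpha + beta + converted + not_converted)


for label, (a, b) in priors(COMMON_RATE).items():
    prior_mean = beta_posterior_mean(a, b, 0, 0)  # no data yet
    after = beta_posterior_mean(a, b, 1, 19)      # hypothetical: 1 conversion in 20 visits
    print(f"{label}: pseudo-observations={a + b:g}, "
          f"prior mean={prior_mean:.3f}, mean after 1/20={after:.4f}")
```

With 1 conversion in 20 visits, Laplace smoothing pulls the estimate up to 2/22 (about 0.091) and the Jeffreys prior to 1.5/21 (about 0.071), while the commonRate prior gives 1.06/21 (about 0.050), close to the raw 1/20 because its starting mean is already a plausible conversion rate.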

This is definitely something you will want to read more about (from us and from others); the book was limited by space. I suggest checking out http://www.win-vector.com/blog/2014/04/bandit-formulations-for-ab-tests-some-intuition/ and http://www.win-vector.com/blog/2014/05/a-clear-picture-of-power-and-significance-in-ab-tests/
nina.zumel
#3
In addition to the posts that John suggests, you might also want to look at

http://www.win-vector.com/blog/2013/05/bayesian-and-frequentist-approaches-ask-the-right-question/

which describes the relationship between the binomial and beta distributions in a little more detail.
john.mount
#4
That was the article I should have linked to. Thanks, Nina!
Eagle78
#5
Thanks!

I'll read up on it and then build a Bayesian A/B-test calculator.

I started by displaying the trade-off between alpha and beta (confidence vs. power):
http://glimmer.rstudio.com/odnl/ab-test-calculator/

Cheers,
Arend
john.mount
#6
That is neat, thanks for sharing. I just tweeted it: https://twitter.com/WinVectorLLC/status/482166500220887040