Aren't frequentist methods good enough?
Often, yes. If you’re running clinical trials and don’t do any p-value hacking, frequentist methods work well. But most frequentist methods provide p-values and confidence intervals that ignore any other effects, and they don’t directly answer the questions that testers are really asking. If you’re doing anything more complex, such as finding the probability that any one variant is best, or accounting for real fluctuations in your users’ behavior, the approach Optimize uses has many benefits.
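To make "the probability of any one variant being best" concrete, here is a minimal sketch of one common way to estimate it. It is not Optimize's actual model: it assumes a simple Beta-Binomial model with a uniform Beta(1, 1) prior, and the function name, variant names, and counts are all hypothetical.

```python
import random


def probability_to_be_best(results, draws=20000, seed=0):
    """Estimate, for each variant, the probability that it has the highest
    true conversion rate.

    results maps a variant name to (conversions, visitors).
    Assumption for this sketch: a uniform Beta(1, 1) prior on each rate.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in results}
    for _ in range(draws):
        # Draw one plausible conversion rate per variant from its posterior.
        sampled = {
            name: rng.betavariate(1 + conv, 1 + visitors - conv)
            for name, (conv, visitors) in results.items()
        }
        # Credit the variant with the highest sampled rate in this draw.
        wins[max(sampled, key=sampled.get)] += 1
    return {name: count / draws for name, count in wins.items()}


# Hypothetical data: (conversions, visitors) per variant.
probs = probability_to_be_best({"original": (50, 1000), "variant": (80, 1000)})
print(probs)
```

The result is a direct statement such as "this variant is best with probability X", which is the kind of answer a p-value does not provide.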
Wait...you don’t have a sample size requirement?
No! Unlike frequentist approaches, Bayesian inference doesn’t need a minimum sample size. If your conversion rates are really consistent (and consistently different), you can still find actionable results even with low traffic. You may find a great experiment opportunity on a part of your site that doesn’t get much traffic, and our approach handles those cases well. This might seem odd if you’re used to much larger sample size requirements, but it is one of the benefits of our approach.
What confidence level do you use?
Our intervals show the range in which your true conversion rates likely fall with 95% probability, though you can hover over the improvement ranges to see the median and the 50% range.
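As an illustration of what a median, a 50% range, and a 95% range mean for a conversion rate, the sketch below summarizes a posterior distribution by its quantiles. It assumes a Beta-Binomial model with a uniform prior, which is a simplification and not necessarily what Optimize computes; the function name and the counts are hypothetical.

```python
import random


def posterior_summary(conversions, visitors, draws=20000, seed=0):
    """Summarize the posterior of a conversion rate by its quantiles.

    Assumption for this sketch: a uniform Beta(1, 1) prior on the rate.
    """
    rng = random.Random(seed)
    samples = sorted(
        rng.betavariate(1 + conversions, 1 + visitors - conversions)
        for _ in range(draws)
    )

    def quantile(q):
        # Empirical quantile taken from the sorted posterior samples.
        return samples[min(draws - 1, int(q * draws))]

    return {
        "median": quantile(0.5),
        "range_50": (quantile(0.25), quantile(0.75)),
        "range_95": (quantile(0.025), quantile(0.975)),
    }


# Hypothetical data: 80 conversions out of 1,000 visitors.
print(posterior_summary(80, 1000))
```

The 50% range always sits inside the 95% range, and both narrow as more visitors are observed.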
Do you use a one- or two-tailed t-test?
Neither! Bayesian inference, the approach that we use, doesn’t use this concept.
But don’t significance and p-values tell me the probability that my variant would beat my control?
No. The definition of p-value is: the probability of getting results at least as extreme as the ones you observed, given that the null hypothesis is correct, where the null hypothesis in A/B testing is that the variant and the control are the same.
P-values are difficult to explain intuitively. The popular United States political blog FiveThirtyEight explored this question and concluded: “You can get it right, or you can make it intuitive, but it’s all but impossible to do both.” There are numerous misunderstandings about p-values, but in summary, p-values alone do not give the information that many A/B testers are looking for. Even in combination with additional data, p-values can easily be misinterpreted.
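For readers who want to see the p-value definition above applied, here is a minimal sketch of a two-sided p-value for two conversion rates. It uses a pooled two-proportion z-test with a normal approximation, which is one standard frequentist choice and an assumption of this example; the function name and counts are hypothetical.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value under the null hypothesis that both groups share
    the same conversion rate (pooled z-test, normal approximation)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / std_err
    # Probability of a difference at least this extreme, in either
    # direction, if the null hypothesis were true.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: control 50/1000, variant 80/1000.
print(two_proportion_p_value(50, 1000, 80, 1000))
```

Note what the number describes: how surprising the data would be if the control and the variant were identical. It is not the probability that the variant beats the control.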
This FAQ article is part of a series of FAQ articles on Optimize statistics and methodology. Here are the other FAQs: