I’ve heard that Bayesian priors are dangerous in A/B testing. Do you assume that a specific variant is the winner?
No, this is a common misconception. With respect to winners and conversion rates, we do our best to use uninformative priors -- priors with as little influence on the experiment results as possible.
We do sometimes use priors that are informative, but these have more to do with how quickly a variant’s results are allowed to converge, which helps us find solid results for low-traffic tests.
What is a Bayesian prior?
Bayesian priors are modeled beliefs about how we think a variant or experiment will behave. When data comes in, the prior is blended with the data to form a posterior, which is the result. As more data comes in, the prior is said to be “overwhelmed”: its influence fades, and it matters less and less. For Optimize, we use a variety of priors.
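To make the idea of a prior being “overwhelmed” concrete, here is a minimal sketch using a Beta prior with binomial conversion data. This is a textbook illustration, not a description of the models Optimize actually runs; the function name `posterior_mean` and both example priors are assumptions chosen for the example.

```python
def posterior_mean(prior_alpha, prior_beta, conversions, visitors):
    """Posterior mean conversion rate for a Beta prior and binomial data.

    Illustrative only: a Beta(prior_alpha, prior_beta) prior updated with
    `conversions` successes out of `visitors` trials.
    """
    post_alpha = prior_alpha + conversions
    post_beta = prior_beta + (visitors - conversions)
    return post_alpha / (post_alpha + post_beta)


# Hypothetical priors: a flat Beta(1, 1) and a stronger Beta(20, 80),
# whose mean is 0.20. The observed conversion rate is always 10%.
for visitors in (10, 100, 10_000):
    conversions = visitors // 10
    flat = posterior_mean(1, 1, conversions, visitors)
    strong = posterior_mean(20, 80, conversions, visitors)
    print(f"visitors={visitors:>6}  flat prior: {flat:.4f}  strong prior: {strong:.4f}")
```

With only a handful of visitors the two priors give noticeably different posterior means; with many visitors both sit close to the observed 10% rate, which is what “overwhelmed” means in practice.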
Despite the nomenclature, however, priors don’t necessarily come from previous data; they’re simply used as logical inputs into our modeling.
Many of the priors we use are uninformative - in other words, they don’t affect the results much. We use uninformative priors for conversion rates, for example, because we don’t assume that we know how a new variant is going to perform before we’ve seen any data for it.
We do use some priors that are more informative, such as in our hierarchical models. In those models, we use priors that help experiments with really consistent performance find results more quickly. Of course, if an experiment’s data is not consistent, that data quickly overwhelms the prior, which becomes less and less influential as more data comes in.
It’s worth noting that “prior” might suggest that we use past data from Google Analytics. While this would be possible, we don’t use this data today.
What kinds of models do you use?
We use different models for different objectives, depending on how they behave. We’re also constantly exploring new models to help you find the most accurate results as quickly as possible.
Can you walk me through your analysis process?
The process varies slightly depending on the objective and method of measurement, but proceeds along these lines:
- Gather raw event or hit data into the Google Analytics back end.
- Aggregate experimental data, often into a modified format, depending on the objective. For example, we perform logarithmic transforms on some metrics as needed before aggregating.
- Ingest the daily aggregated data into our statistics processing system.
- Using that aggregated data, estimate the shape of the conversion rate distribution with a Markov chain Monte Carlo (MCMC) approach, which also produces samples from that distribution.
- By comparing very large numbers of samples from those distributions (also called draws), we’re able to generate statistics. For example, Probability to beat original and Probability to be best are generated by looking at the number of draws where a variant beats the original, or beats all other variants, respectively.
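The last step, turning draws into statistics, can be sketched as follows. This is a simplified illustration under stated assumptions, not Optimize's implementation: it draws directly from Beta posteriors with a flat Beta(1, 1) prior instead of running MCMC, and the data, the names `posterior_draws` and `summarize`, and the draw count are all hypothetical.

```python
import random


def posterior_draws(conversions, visitors, n_draws, rng):
    """Draw conversion rates from a Beta posterior (flat Beta(1, 1) prior assumed)."""
    alpha = 1 + conversions
    beta = 1 + visitors - conversions
    return [rng.betavariate(alpha, beta) for _ in range(n_draws)]


def summarize(data, n_draws=20_000, seed=0):
    """Estimate probability to beat original and probability to be best.

    `data` maps variant name -> (conversions, visitors); the first key is
    treated as the original.
    """
    rng = random.Random(seed)
    names = list(data)
    original = names[0]
    draws = {name: posterior_draws(c, v, n_draws, rng) for name, (c, v) in data.items()}

    # Fraction of draws in which each variant beats the original.
    beat_original = {
        name: sum(d > o for d, o in zip(draws[name], draws[original])) / n_draws
        for name in names[1:]
    }

    # Fraction of draws in which each variant has the highest rate of all.
    best_counts = {name: 0 for name in names}
    for i in range(n_draws):
        winner = max(names, key=lambda name: draws[name][i])
        best_counts[winner] += 1
    best = {name: count / n_draws for name, count in best_counts.items()}
    return beat_original, best


# Hypothetical experiment data: (conversions, visitors) per variant.
data = {"original": (100, 1000), "variant_a": (130, 1000), "variant_b": (90, 1000)}
beat_original, best = summarize(data)
print("Probability to beat original:", beat_original)
print("Probability to be best:", best)
```

Because each statistic is just a count over draws, the probabilities to be best always sum to 1 across variants, including the original.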
Is your multivariate testing fractional or full factorial?
Multivariate testing is among the most efficient kinds of testing you can perform. First, a brief overview of these terms:
- A multivariate test can be thought of as a combination of two or more A/B tests, in which you vary multiple parts of a user’s experience - each part is called a factor, element, or section - to produce different combinations. For example, a multivariate test might include two factors, the headline and the hero image, each of which has multiple variants. So, you might test two headlines and three hero images to find which of the six combinations works best together, as well as whether there are any positive or negative interactions present.
- A full factorial multivariate test serves all combinations to users. These could be simply analyzed as large A/B tests, although as the number of sections increases, the number of combinations grows exponentially. It might therefore take a very long time to gather enough data for valid results. The benefit, however, is that you know how all combinations perform.
- A fractional factorial multivariate test serves and analyzes only a subset of the combinations. This makes it possible to find results more quickly. However, if the true best combination wasn’t served, you won’t know if that combination would have performed best without running a followup test.
Our models allow us to use a hybrid approach, so you don’t have to make this tradeoff. We serve all combinations of a test, so you can learn about interactions and best combinations. But, we also model the fact that some variants show up across combinations -- and so we can learn about variants within a factor, not just combinations.
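As a rough illustration of why a variant that appears in several combinations can be learned about on its own, the sketch below enumerates the six combinations from the headline and hero image example and pools data per variant within a factor. Simple pooling like this ignores interactions and is not the model Optimize uses; the variant names, the conversion counts, and the helper `pool_by_factor` are hypothetical.

```python
from collections import defaultdict
from itertools import product

headlines = ["headline_1", "headline_2"]
hero_images = ["hero_1", "hero_2", "hero_3"]

# Full factorial: every headline paired with every hero image.
combinations = list(product(headlines, hero_images))

# Hypothetical (conversions, visitors) for each served combination.
results = {combo: (10 + i, 200) for i, combo in enumerate(combinations)}


def pool_by_factor(results, factor_index):
    """Sum conversions and visitors over all combinations sharing a variant.

    factor_index 0 pools by headline, 1 pools by hero image.
    """
    totals = defaultdict(lambda: [0, 0])
    for combo, (conversions, visitors) in results.items():
        variant = combo[factor_index]
        totals[variant][0] += conversions
        totals[variant][1] += visitors
    return {variant: tuple(total) for variant, total in totals.items()}


print(len(combinations), "combinations served")
print("By headline:", pool_by_factor(results, 0))
print("By hero image:", pool_by_factor(results, 1))
```

Each headline here accumulates data from three combinations and each hero image from two, so per-variant estimates rest on more traffic than any single combination provides.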
This FAQ article is part of a series of FAQ articles on Optimize statistics and methodology. Here are the other FAQs: