Test ads with experiments

Use Display & Video 360's A/B testing framework to evaluate every dimension of your campaign

Ever wonder what works and what doesn't? Now you can find out. Experiments are an integrated testing framework that helps you understand what works best for your business. By splitting your cookies into mutually exclusive groups, experiments let you test different combinations of targeting, settings, and creatives to find out what performs best. The findings can be used to improve the performance of campaigns mid-flight or to inform planning for future campaigns.

Using A/B experiments, you can:

  • Test any variable affecting a campaign, including targeting, settings, creatives, and more.
  • Report on key metrics such as CPC, CTR, CPA, CVR, and CPM.

Key terms

  • Baseline (control): The line item or insertion order against which all others are compared. This is the standard from which single-variable changes are made to create the variants you test.

  • Variant: An experimental line item or insertion order with a single variable being tested, relative to the baseline line item or insertion order.

  • Arm: The variants and the baseline are considered separate arms of the experiment. An arm may contain an individual line item or insertion order or a group of line items or insertion orders.  

  • (Actual) metrics: The raw, observed results measured in the experiment are known as the "actual" amounts. In the experiment's results table, the actual amounts are displayed in parentheses beneath the normalized metrics.

    Normalized metrics, which are the top numbers in the cells for clicks, conversions, impressions, or revenue, are calculated by extrapolating the actuals into the amount that would be observed if the baseline or variant received 100% of the audience split. For example, if the actual conversions observed were 170,000 for a line item with a 34% audience split, the line item's normalized conversions would be 500,000 if the same line item received 100% of the audience split.

  • Confidence interval: The confidence interval indicates how certain the results are by giving a range of possible values. The range is defined so that there is a specified probability (0.90 or 0.95) that the true value lies within it. The endpoints of the confidence interval are displayed in square brackets within the "Difference(%)" columns.

  • p-value: The lower the p-value, the more likely it is that there is a real performance difference between the baseline and the variant. The p-value is only displayed for the goal you've selected.
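
The normalization described above is a simple extrapolation from an arm's audience split. A minimal sketch of the arithmetic (the function name is my own; this illustrates the calculation, not a product API):

```python
def normalize(actual, audience_split):
    """Extrapolate an observed ("actual") metric to a 100% audience split.

    actual: the raw observed value for an arm (e.g., conversions).
    audience_split: the arm's share of the audience, e.g. 0.34 for 34%.
    """
    if not 0 < audience_split <= 1:
        raise ValueError("audience_split must be in (0, 1]")
    return actual / audience_split

# Example from the text: 170,000 actual conversions at a 34% split
# normalize to roughly 500,000 conversions.
print(round(normalize(170_000, 0.34)))  # 500000
```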

Set up an experiment

  1. Start in your advertiser, then expand Resources in the left menu and click Experiments and Lift.
  2. Once you're on the Experiments and Lift page, click Create new > Experiment.
  3. On the page that opens, enter the following details for your experiment:
    1. Name of the experiment.

    2. What you're trying to measure in the experiment. This is expressed by the goal and the confidence interval you set.

      1. Experiment goals:
        • Conversions
        • Clicks
        • Completed video views
      2. Confidence interval:
        • 95% (most common)
        • 90%
    3. What you're including in your experiment.

      1. Experiment type:
        • Compare insertion orders
        • Compare line items
      2. Items to compare
        • Depending on your experiment type, pick at least two insertion orders or line items to use in the experiment. Make sure one of the items you select is your baseline (or "control").

        • (Optional) Once you've added multiple items, you can adjust the audience split to control how the cookies in your experiment are distributed among its insertion orders or line items.

    4. When your experiment will run.

      1. Start date
      2. End date

        When possible, the start and end dates of your experiment should coincide with the start and end dates of the insertion orders or line items involved in the experiment. If an insertion order or line item is active before or after the experiment, it will serve to 100% of users (i.e., not follow the audience split of your experiment). This also means that if you stop an experiment before the insertion orders or line items have reached their end dates, they will continue to serve to 100% of users.

        You can also run your experiment indefinitely if you don't want to set an end date.

  4. Once you've entered all the details of your experiment, click Save.

Evaluate the results of an experiment

  1. Start in your advertiser, then expand Resources in the left menu and click Experiments and Lift.
  2. Find the experiment you're interested in, then click the name of the experiment to see its results.
    The Is significant column will show if the experiment has found a statistically significant difference between the baseline and any variant for the goal of the experiment.
  3. On the page that opens, look at the Difference(%) column to understand the differences between your baseline and each variant according to various metrics (for example, conversion rate, clickthrough rate, or eCPM).

    Generally, green arrows mean the difference between the variant and baseline is favorable, while red arrows mean it's unfavorable. The values in square brackets are the lower and upper endpoints of the confidence interval.
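
Display & Video 360 doesn't publish the exact statistical test behind these figures (the FAQ below only notes that experiments use a frequentist model). Purely as an illustration of how a p-value and a confidence interval for a rate difference can be computed, here is a standard two-proportion z-test; all names and numbers are my own:

```python
import math

def two_proportion_test(conv_base, n_base, conv_var, n_var, z=1.96):
    """Frequentist two-proportion z-test (illustrative only; not
    DV360's published methodology).

    Returns (p_value, (ci_low, ci_high)) for the absolute difference
    in conversion rate, variant minus baseline. z=1.96 gives a 95%
    confidence interval; use z=1.645 for 90%.
    """
    p_base, p_var = conv_base / n_base, conv_var / n_var
    # Pooled rate under the null hypothesis of no real difference
    p_pool = (conv_base + conv_var) / (n_base + n_var)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_base + 1 / n_var))
    z_stat = (p_var - p_base) / se_pool
    # Two-sided p-value from the standard normal CDF
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    # Unpooled standard error for the confidence interval
    se = math.sqrt(p_base * (1 - p_base) / n_base
                   + p_var * (1 - p_var) / n_var)
    diff = p_var - p_base
    return p_value, (diff - z * se, diff + z * se)

# 1.0% baseline vs. 1.1% variant conversion rate, 100k impressions each
p, (lo, hi) = two_proportion_test(1_000, 100_000, 1_100, 100_000)
```

With these inputs the p-value falls below 0.05 and the interval excludes zero, which is the kind of result the Is significant column summarizes.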

Best practices

Keep in mind the following when planning an experiment.

Planning and setup
  • Only test 1 variable per experiment. Keep all arms of the experiment (baseline and any variants) exactly the same, except for a single variable that you're testing.

  • Create insertion orders or line items for your experiments by duplicating them, rather than creating them from scratch. This makes it easier to ensure the items in your experiments are identical, except for the single dimension you're testing as a variable.

  • Only use new insertion orders or line items in your experiments. If an insertion order or line item has previous activity outside of the experiment, this may impact conversion counting.

  • Eliminate outside influences. Make sure your line items outside of your experiment aren't competing with the budgets of the line items in your experiment. So, if possible, use a separate insertion order for any line items that will be used in a given experiment.

    Additionally, if possible, try not to reuse the creative you're using in an experiment anywhere outside of your experiment.

  • Set a sufficiently high frequency cap. If you're using insertion orders in your experiment, make sure your campaign's frequency cap is at least the highest frequency cap of any insertion order participating in the experiment plus the sum of the frequency caps of the insertion orders that aren't part of the experiment.

    For example, if you have a campaign that contains 3 insertion orders, but only 2 are part of an experiment, you'd determine your campaign's minimum frequency cap by adding the highest frequency cap of the two participating insertion orders to the frequency cap of the insertion order that isn't being used in the experiment. So, if the insertion orders in the experiment have frequency caps of 10 and 8, and the third insertion order, which isn't part of the experiment, has a frequency cap of 5, the campaign should have a frequency cap of at least 15: 10 (the highest frequency cap of any insertion order in the experiment) plus 5 (the sum of the frequency caps of the campaign's insertion orders outside the experiment).

    This same best practice applies to the insertion order-level frequency cap if your experiment is comparing line items.

  • Plan your budget and pacing deliberately. The budget set for each arm of your experiment should be proportional to your experiment's audience split; if you allocate budgets out of proportion, budget becomes one of the experiment's variables. Similarly, pacing should be the same across arms, or it too becomes a variable. This best practice extends beyond the line items within an experiment to other line items in the same insertion order: their ability to spend budget and pace will affect how the experiment's line items buy inventory, and thus will influence the results.

  • Be careful when you have limited reach. If you anticipate having a relatively limited reach (for example, you're buying deal inventory or audience inventory with a limited reach), experiments may produce wide confidence intervals, which may make it difficult to evaluate the efficacy of your variants.

  • Finalize things ahead of time. Experiments should be completely set up at least 24 hours before the experiment goes live in order to have sufficient time for all creatives to be approved.

    Additionally, all budgets for insertion orders and line items in experiments should be finalized at least 24 hours before the experiment starts.

  • Confirm things are active. Make sure all insertion orders and line items in experiments are "active" (i.e., not paused) at least 24 hours before the start of the experiment to avoid bias by status change. Prevent insertion orders or line items from serving before the experiment starts by using budget segments.
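
Two of the numeric rules above, the minimum campaign frequency cap and proportional budgeting, can be sketched as small helpers. The function names are my own; this is an illustration of the arithmetic, not a product API:

```python
def min_campaign_frequency_cap(experiment_io_caps, other_io_caps):
    """Lowest safe campaign-level frequency cap: the highest cap among
    insertion orders in the experiment, plus the sum of the caps of
    the insertion orders that aren't in the experiment."""
    return max(experiment_io_caps) + sum(other_io_caps)

def proportional_budgets(total_budget, audience_splits):
    """Split an experiment's budget across its arms in proportion to
    each arm's audience split, so budget doesn't become an accidental
    second variable."""
    total = sum(audience_splits.values())
    return {arm: total_budget * share / total
            for arm, share in audience_splits.items()}

# Example from the text: experiment IOs capped at 10 and 8, plus a
# non-experiment IO capped at 5 -> campaign cap of at least 15.
print(min_campaign_frequency_cap([10, 8], [5]))  # 15

# A $10,000 budget over a 50/25/25 audience split.
print(proportional_budgets(10_000,
                           {"baseline": 50, "variant_a": 25, "variant_b": 25}))
```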

While an experiment is running
  • Don't pause the experiment. If you need to temporarily pause a campaign but plan on continuing your experiment, pause the arms of the experiment (making sure to pause all of them), but not the experiment itself. Then, when you resume the campaign, make sure to activate all of the arms at the same time.

    If you end an experiment, it can’t be restarted. Additionally, all entities assigned to the experiment will go back to serving across 100% of your users.
  • Make uniform edits. If you need to make a change to your insertion orders or line items while an experiment is running (for example, if you need to remove a site for brand suitability reasons), make sure you apply the same change to all arms of the experiment.

Limitations
  • Experiments can't be run on the following types of inventory:
    • Programmatic guaranteed default line items or insertion orders with default line items.
    • TrueView line items.
  • A line item or insertion order can only be used in a single active experiment at a given time.
  • You can't adjust the audience split percentages after an experiment starts.
  • Currently, the experiments framework isn't cross-device aware, so a user could see one variant of the experiment on their mobile device and the baseline on their desktop.

Frequently asked questions

What's the difference between Campaign Manager's audience segmentation targeting and experiments in Display & Video 360?

Audience segmentation targeting in Campaign Manager is focused on splitting traffic between different creatives, while experiments in Display & Video 360 give you the ability to split traffic among different insertion orders and line items, which makes it possible to test more things than just creatives. Campaign Manager's audience segmentation targeting divides the traffic of a Campaign Manager campaign into different groups of users (i.e., "segments") and allows you to traffic a different creative for each segment. On the other hand, Display & Video 360's experiments split audiences at the insertion order or line item level, which lets you test any setting or targetable dimension.

Why can't I add a particular insertion order or line item to my experiment?

Any insertion orders or line items that can't be added to your experiment will be hidden from view or displayed as unselectable while you're setting up your experiment. If you see an insertion order or line item that can't be selected, hover over the question mark next to the name of the item to see the reason it can't be added to your experiment.

What's the difference between Google Optimize and experiments in Display & Video 360?

Experiments in Display & Video 360 are designed to compare advertising campaign tactics (such as targeting, settings, and so on), while Google Optimize is designed to compare different sites or landing pages. Additionally, Display & Video 360's experiments use a frequentist model (similar to most other ads effectiveness measurement solutions), while Google Optimize uses a Bayesian model (which is better suited to low sample size scenarios).
