Test ads with experiments

Use Display & Video 360's A/B testing framework to evaluate every dimension of your campaign

Ever wonder what works and what doesn't? Now you can find out. Experiments are an integrated testing framework that helps you understand what works best for your business. By splitting your cookies into mutually exclusive groups, experiments let you test different combinations of targeting, settings, and creatives and find out what performs best. The findings can be used to improve performance of campaigns mid-flight or influence planning for future campaigns.

Using experiments, you can:

  • Test any variable affecting a campaign, including targeting, settings, creatives, and more.
  • Report on key metrics, such as CPC, CTR, CPA, CVR, CPM, and so on.

Key terms
  • Baseline (control): The line item or insertion order to which all others will be compared. This should be the standard from which variations will be made to create different variants to test.

  • Variant: An experimental line item or insertion order with a single variable being tested, relative to the baseline line item or insertion order.

  • Arm: The variants and the baseline are considered separate arms of the experiment. An arm may contain an individual line item or insertion order or a group of line items or insertion orders.  

  • (Actual) metrics: The raw, observed results measured in the experiment are known as the "actual" amounts. In the experiments results table, the actual amounts are displayed in parentheses beneath the normalized metrics.

    Normalized metrics, which are the top numbers in the cells for clicks, conversions, impressions, or revenue, are calculated by extrapolating the actuals into the amount that would be observed if the baseline or variant received 100% of the audience split. For example, if the actual conversions observed were 170,000 for a line item with a 34% audience split, the line item's normalized conversions would be 500,000 if the same line item received 100% of the audience split.

  • Confidence interval: The confidence interval indicates how certain the results are by giving a range of possible values. The range is defined so that there is a specified probability (0.90 or 0.95) that the true value lies within it. The boundaries (or endpoints) of the confidence interval are shown in square brackets in the "Difference (%)" columns.

  • p-value: The lower the p-value, the higher the chances there is a real performance difference between the baseline and the variant. This is only displayed for the goal you've selected.
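
The normalization described above is simple extrapolation: divide the observed (actual) amount by the audience-split fraction. A minimal sketch using the article's own example (the function name is illustrative, not part of the product):

```python
def normalize(actual, audience_split):
    """Extrapolate an observed metric to a 100% audience split."""
    if not 0 < audience_split <= 1:
        raise ValueError("audience_split must be a fraction in (0, 1]")
    return actual / audience_split

# The article's example: 170,000 actual conversions at a 34% split
# normalize to roughly 500,000 conversions at a 100% split.
print(round(normalize(170_000, 0.34)))  # → 500000
```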

Set up an experiment

  1. Start in your advertiser, then expand Resources in the left menu and click Experiments and Lift.
  2. Once you're on the Experiments and Lift page, click Create new > Experiment.
    1. Choose Cross Exchange or YouTube & Partners
  3. On the page that opens, enter the following details for your experiment:
    1. Name of the experiment.

    2. What you're trying to measure in the experiment. This is expressed by the goal and the confidence interval you set.

      1. Experiment goals: (choose one)
        • Conversions
        • Clicks
        • Completed video views
        • Total custom impression value (when using custom bidding)
      2. Confidence interval:
        • 95% (most common)
        • 90%
    3. What you're including in your experiment.

      1. Experiment type:
        • Compare insertion orders
        • Compare line items
      2. Items to compare
        • Depending on your experiment type, pick at least two insertion orders or line items to use in the experiment. Make sure one of the items you select is your baseline (or "control").

        • (Optional) Once you've added multiple items, you can adjust the audience split to control the distribution of all cookies in your experiment among your experiment's insertion orders or line items.

      3. Unidentified users (cross exchange only)

        • Turn this on to include users for whom there are no cookies or other ID information. Unless you're specifically looking to capture unidentified traffic, leave this option off. Your randomized group samples can represent the population being studied without this traffic, which can also introduce additional noise or contamination into your A/B groups.

    4. When your experiment will run.

      1. Start date
      2. End date

        When possible, the start and end dates of your experiments should coincide with the start and end dates of the insertion orders or line items involved with the experiment. If an insertion order or line item is active before or after the experiment, it will serve to 100% of users (i.e., not follow the audience split of your experiment). This also means that if you stop an experiment before the insertion orders or line items have reached their end dates, they will continue to serve on 100% of the users.

        You can also run your experiment indefinitely if you don't want to set an end date.

        Metrics in an experiment will be based solely on impressions that were served after the experiment's start date. This means reports in Display & Video 360 may count a different number of conversions for a given line item that's used in an experiment if the line item was active before (or after) the dates of the experiment.
  4. Once you've entered all the details of your experiment, click Save.
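
The audience split you choose during setup also determines how budget should be allocated: per the best practices later in this article, each arm's budget should be proportional to its split. A quick sketch of proportional allocation (arm names and amounts are hypothetical):

```python
def split_budget(total_budget, audience_splits):
    """Divide a total budget in proportion to each arm's audience split.

    audience_splits: mapping of arm name -> split percentage; must sum to 100.
    """
    if sum(audience_splits.values()) != 100:
        raise ValueError("audience splits must sum to 100%")
    return {arm: total_budget * pct / 100 for arm, pct in audience_splits.items()}

# Hypothetical experiment: a 50/25/25 split over a 10,000 budget.
print(split_budget(10_000, {"baseline": 50, "variant_a": 25, "variant_b": 25}))
# → {'baseline': 5000.0, 'variant_a': 2500.0, 'variant_b': 2500.0}
```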

Evaluate the results of an experiment

  1. Start in your advertiser, then expand Resources in the left menu and click Experiments and Lift.
  2. Find the experiment you're interested in, then click the name of the experiment to see its results.
    The Is significant column will show if the experiment has found a statistically significant difference between the baseline and any variant for the goal of the experiment.
  3. On the page that opens, look at the Differences (%) column to understand the differences between your baseline and each variant according to various metrics (for example, conversion rate, clickthrough rate, or eCPM).

    Generally, green arrows indicate a favorable difference between the variant and the baseline, while red arrows indicate an unfavorable one. The values in the square brackets are the lower and upper endpoints of the confidence interval.

  4. (Optional) You can update your results in the following ways:

    • Select a metric: View results for a specific metric by selecting one of the tabs at the top of the results.
    • Select a baseline: By default, the chart compares the baseline to multiple variants. You can select a variant to use as the baseline from the Update Baseline dropdown.
    • Select an attribution model: When viewing results for conversion experiments, select an attribution model from the Attribution Models dropdown.
    • Select an algorithm: When viewing results for custom impression value experiments, select the custom bidding algorithm from the Custom Algorithm dropdown.
  5. (Optional) If you configured two independent brand lift studies as your experiment arms, you can see brand lift results. For this to work, the dates of the brand lift studies must overlap with the experiment's dates, and the studies' metric selections and survey questions must also overlap (results are shown only for those that match).
    These brand lift studies will automatically be set to ‘accelerated measurement’ when there are overlapping brand lift studies for insertion orders within the experiment. Each study will aim to collect survey responses as soon as possible and will stop automatically once it reaches the targeted number of responses.
    1. For example, create two campaigns and run a brand lift study for each. Then create an experiment with one arm for each campaign. If the brand lift study dates and experiment dates align, and the two studies use the same metrics and questions, brand lift results will be available in experiments.
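
The Difference (%) column pairs a point estimate with a bracketed confidence interval. A sketch of how such a row can be read (the numbers are hypothetical and the helper names illustrative): a difference is statistically significant when its confidence interval excludes zero.

```python
def percent_difference(baseline, variant):
    """Percent difference of a variant metric relative to the baseline."""
    return (variant - baseline) / baseline * 100

def is_significant(ci_lower, ci_upper):
    """True when the confidence interval excludes zero."""
    return ci_lower > 0 or ci_upper < 0

# Hypothetical row: baseline CVR 2.0%, variant CVR 2.3% (about +15%),
# with a reported interval of [3.1, 27.4] around the difference.
print(round(percent_difference(2.0, 2.3), 1))  # → 15.0
print(is_significant(3.1, 27.4))               # → True
```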

Review the differences of an experiment

Navigate to the Diff tab to review the differences between the arms of an experiment. This lets you confirm that the only difference is the variable you're testing. You can correct any unintended differences before the experiment goes live, removing potential bias or the risk of the experiment producing irrelevant results. When variant arms contain more than one line item, Display & Video 360 auto-matches comparisons based on the minimum number of differences observed.

Best practices

Keep in mind the following when planning an experiment.

Planning and setup
  • Only test 1 variable per experiment. Keep all arms of the experiment (baseline and any variants) exactly the same, except for a single variable that you're testing.

  • Create insertion orders or line items for your experiments by duplicating them, rather than creating them from scratch. This makes it easier to ensure the items in your experiments are identical, except for the single dimension you're testing as a variable.

  • Only use new insertion orders or line items in your experiments. If an insertion order or line item has previous activity outside of the experiment, this may impact conversion counting.

  • Eliminate outside influences. Make sure your line items outside of your experiment aren't competing with the budgets of the line items in your experiment. So, if possible, use a separate insertion order for any line items that will be used in a given experiment.

    Additionally, if possible, try not to reuse the creative you're using in an experiment anywhere outside of your experiment.

  • Set a sufficiently high frequency cap. If you're using insertion orders in your experiment, make sure your campaign’s frequency cap is at least the highest frequency cap of any insertion order participating in the experiment, plus the sum of the frequency caps of the insertion orders that aren't part of the experiment.

    For example, if you have a campaign that contains 3 insertion orders, but only 2 are part of an experiment, you'd determine the campaign's minimum frequency cap by adding the highest frequency cap of the two participating insertion orders to the frequency cap of the insertion order that isn't in the experiment. So, if the insertion orders in the experiment have frequency caps of 10 and 8 respectively, and the third insertion order, which isn't part of the experiment, has a frequency cap of 5, the campaign should have a frequency cap of at least 15 (10, the highest cap among the participating insertion orders, plus 5, the sum of the caps of the insertion orders outside the experiment).

    This same best practice applies to the insertion order-level frequency cap if your experiment is comparing line items.

  • Plan your budget and pacing deliberately. The budget set for each arm of your experiment should be proportional to your experiment's audience split. If you allocate budget out of proportion, budget itself becomes one of the experiment's variables. Similarly, pacing should be the same across arms, or it too becomes another variable. This best practice extends beyond the line items within the experiment to other line items in the same insertion order that aren't part of it: their ability to spend budget and pace will affect how the experiment's line items buy inventory and thus influence the results.

  • Be careful when you have limited reach. If you anticipate having a relatively limited reach (for example, you're buying deal inventory or audience inventory with a limited reach), experiments may produce wide confidence intervals, which may make it difficult to evaluate the efficacy of your variants.

  • Finalize things ahead of time. Experiments should have sufficient time for all creatives to be approved before they start. 
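
The frequency-cap arithmetic described above can be sketched as follows (the helper name is illustrative; the numbers are the article's own example):

```python
def min_campaign_frequency_cap(experiment_caps, other_caps):
    """Minimum campaign-level frequency cap per the best practice above:
    the highest cap among insertion orders in the experiment, plus the
    sum of the caps of insertion orders outside the experiment."""
    return max(experiment_caps) + sum(other_caps)

# The article's example: experiment IOs capped at 10 and 8,
# one non-participating IO capped at 5.
print(min_campaign_frequency_cap([10, 8], [5]))  # → 15
```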

While an experiment is running
  • Don't pause the experiment. If you need to temporarily pause a campaign, but you plan on continuing your experiment, pause the arms of the experiment (making sure to pause all of them), but not the experiment itself. Then, when you resume the campaign, make sure to activate all of the branches at the same time.

    If you end an experiment, it can’t be restarted. Additionally, all entities assigned to the experiment will go back to serving across 100% of your users.
  • Make uniform edits. If you need to make a change to your insertion orders or line items while an experiment is running (for example, if you need to remove a site for brand suitability reasons), make sure you apply the same change to all arms of the experiment.

Limitations
  • Experiments can't be run on the following types of inventory:
    • Programmatic guaranteed default line items or insertion orders with default line items 
  • The earliest the experiment start date can be set for is 24 hours after the initial setup.
  • A line item or insertion order can only be used in a single active experiment at a given time.
  • You can't adjust the audience split percentages after an experiment starts.
  • Currently, the experiments framework isn't cross-device aware, so a user could see one variant of the experiment on their mobile device and the baseline on their desktop.
  • The number of conversions counted may differ between experiments and other forms of reporting, including metrics displayed in tables. This is because metrics recorded during experiments only consider impressions served while the experiment was active. 

Frequently asked questions

What's the difference between Campaign Manager 360's audience segmentation targeting and experiments in Display & Video 360?

Audience segmentation targeting in Campaign Manager 360 is focused on splitting traffic between different creatives, while experiments in Display & Video 360 give you the ability to split traffic among different insertion orders and line items, which makes it possible to test more things than just creatives. Campaign Manager 360's audience segmentation targeting divides the traffic of a Campaign Manager 360 campaign into different groups of users (i.e., "segments") and allows you to traffic a different creative for each segment. On the other hand, Display & Video 360's experiments split audiences at the insertion order or line item level, which lets you test any setting or targetable dimension.

Why can't I add a particular insertion order or line item to my experiment?

Any insertion orders or line items that can't be added to your experiment will be hidden from view or displayed as unselectable while you're setting up your experiment. If you see an insertion order or line item that can't be selected, hover over the question mark next to the name of the item to see the reason it can't be added to your experiment.

What's the difference between Google Optimize and experiments in Display & Video 360?

Experiments in Display & Video 360 are designed to compare advertising campaign tactics (such as targeting, settings, and so on), while Google Optimize is designed to compare different sites or landing pages. Additionally, Display & Video 360's experiments use a frequentist model (similar to most other ads effectiveness measurement solutions), while Google Optimize uses a Bayesian model (which is better suited to low sample size scenarios).
