Test ads with experiments

Display & Video 360 experiments is an integrated testing framework that helps you A/B test an individual creative, audience, bid strategy, or targeting tactic, or a group of them, by comparing insertion orders or line items.

You can split cookies into mutually exclusive groups and run experiments that test different combinations of targeting, settings, and creatives to discover which perform best. Use these findings to optimize campaign performance mid-flight or to plan future campaigns.

With experiments, you can:

  • Test any variable affecting a campaign, including targeting, settings, creatives, and more.
  • Report on key metrics, such as CPC, CTR, CPA, CVR, CPM, and so on.

Concepts

Baseline

The line item or insertion order that sets the standard for comparison in the experiment. You create variants and compare them against the baseline.

Variant

An experimental line item or insertion order testing a single variable relative to the baseline line item or insertion order.

Arm

An arm may contain:

  • An individual line item or insertion order or
  • A group of line items or insertion orders

For example, a baseline and its variant are separate arms of the experiment.

Actual values

The raw results from the experiment. For example, the actual number of conversions a variant received.

Normalized values

The projected value for clicks, conversions, impressions, or revenue, calculated by scaling the baseline's or variant's actual value up to a 100% audience split.

For example, if a line item with a 34% audience split has 170,000 actual conversions, the normalized value is 500,000 conversions: the projected total if the same line item had received 100% of the audience.
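The scaling is simple division. As an illustrative sketch (the `normalized_value` helper is hypothetical, not a Display & Video 360 API):

```python
def normalized_value(actual: float, audience_split: float) -> float:
    """Scale an arm's actual metric up to a 100% audience split.

    audience_split is the arm's share of the audience as a fraction in (0, 1].
    """
    if not 0 < audience_split <= 1:
        raise ValueError("audience_split must be in (0, 1]")
    return actual / audience_split

# The example above: 170,000 actual conversions at a 34% audience split
print(round(normalized_value(170_000, 0.34)))  # 500000
```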

Confidence interval

Indicates the level of certainty that the actual difference between variants falls within the reported range. You can choose a confidence level of 90% or 95%: the probability that the true value lies within the reported range.

For example, with a 90% confidence level, if the test were repeated 100 times, about 90 of the resulting intervals would contain the true difference.

P-value

The calculated probability that a difference at least this large could have occurred by chance alone, assuming no real difference exists.

Used to determine whether the results are statistically significant, that is, whether there's likely a real performance difference between the baseline and variant:

  • A lower p-value indicates stronger evidence of a performance difference, signaling significant results.
  • A higher p-value indicates that the results may have occurred by chance, signaling that they are not significant.
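Display & Video 360 calculates these statistics for you. Purely as an illustration of the frequentist idea, here is a pooled two-proportion z-test on hypothetical conversion numbers (not the product's exact methodology):

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical arms: baseline 500/10,000 vs. variant 600/10,000 conversions
z, p = two_proportion_z_test(500, 10_000, 600, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # p < 0.05: significant at 95% confidence
```

A p-value below 0.05 at a 95% confidence level conventionally signals that the observed difference is unlikely to be chance.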

Set up an experiment

  1. Start from your advertiser.
  2. In the left menu, go to: Resources > Experiments and Lift.
  3. Under the Experiments tab, select Create new.
  4. Choose one of the following:
    • Cross Exchange
    • YouTube & Partners
  5. Enter the following details:
    1. Experiment name: Enter an identifier for your experiment.

    2. Type: Choose if you're comparing insertion orders or line items.

    3. Start and end dates: 

      • Start date: Set when the experiment starts. The date must be after the current date. 

      • End date: Optionally set when the experiment ends. If you don't specify an end date, your experiment runs indefinitely.

        When possible, align the experiment's start and end dates with those of its insertion orders or line items.

        For example:

        • If you stop an experiment before the insertion orders or line items have reached their end dates, those items stop following the audience split and serve to 100% of users.

        Metrics in an experiment are based solely on impressions served after the experiment's start date. Reports may count a different number of conversions if the line item was active before or after the experiment's dates.

    4. Measure:

      1. Research goals: Select the goal you want to measure in the experiment.
        • Conversions
        • Clicks
        • Completed video views
        • Total custom impression value (when using custom bidding)
      2. Confidence interval: Choose 95% (the most common) or 90%.
      3. For cross-exchange experiments only:
        • You can choose to turn on Include users that we don't have cookies or other ID information for.
          Note: Consider leaving this option off unless you intend to capture unidentified traffic. It can introduce additional noise into your A/B groups, and your randomized sample groups can represent the population without this added traffic.

          Unidentified users increase the experiment's overall unique users and environment types. Impressions not identified by cookies or other IDs are split evenly across the experiment groups, which may contaminate your A/B groups.
    5. Participants: Depending on your experiment type, pick at least two insertion orders or line items to use in the experiment.

      1. Compare individually: Select individual insertion orders to include in the experiment.
        • If using multiple insertion orders, you can adjust the audience split to control how the cookies in your experiment are distributed among its insertion orders or line items.
        • For multiple insertion orders, you can identify the control arm by setting the insertion order as the Baseline.
      2. Compare groups: Select groups of insertion orders to include in each arm of the experiment.
        • You can adjust the audience split to control how the cookies in your experiment are distributed among its insertion orders or line items.
        • You can identify the control arm by setting the insertion order as the Baseline.
  6. Choose Save.
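The audience-split values chosen in step 5 must be positive and cover the whole audience. A hypothetical pre-save sanity check (illustrative only; `validate_splits` is not part of Display & Video 360):

```python
def validate_splits(splits: dict) -> None:
    """Check that arm audience-split percentages are positive and sum to 100."""
    total = sum(splits.values())
    if any(share <= 0 for share in splits.values()):
        raise ValueError("every arm needs a positive audience split")
    if abs(total - 100) > 1e-9:
        raise ValueError(f"splits sum to {total}, not 100")

# A baseline and two variants splitting the audience 50/25/25
validate_splits({"baseline": 50, "variant_a": 25, "variant_b": 25})  # passes
```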

Evaluate the results of an experiment

  1. Start from your advertiser.
  2. In the left menu, go to: Resources > Experiments and Lift.
  3. Under the Experiments tab, select the Study name link to view the results of an experiment.
  4. Under the experiment's Results tab:

    1. Under Conversions (Primary goal): View the results summary, including a graph showing the difference between your baseline and variants, and the lift:

      1. Metric: You can evaluate the difference between your baseline and variants to check for statistical significance in the following metrics:
        • Conversions
        • Conversion rate
        • Cost per conversion (CPA)
        • Revenue
      2. Status: Indicates whether the results are statistically significant. A result is statistically significant when the difference between the baseline and any variant on the experiment's goal is unlikely to be due to chance.
      3. Experiment Dates: The dates you've set for the experiment.
      4. Type: Reflects if you've chosen to compare insertion orders or line items.
      5. Confidence level: The confidence level you've set for the experiment.
      6. Confidence interval: Applies the confidence level set for the experiment when turned on. 
  5. You can update your results in the following ways:

    • Select a baseline: By default, the chart compares the baseline to multiple variants. You can select a variant to use as the baseline from the Baseline list.

    • Select an attribution model: When viewing results for conversion experiments, you can choose an attribution model from the Attribution Models list.

  6. Optionally, you can set up two independent brand lift studies as your experiment arms and view the brand lift results. To view brand lift results in Experiments:

    • The brand lift studies and the experiment must use the same dates, and
    • The metric selection and survey questions must be the same.
    • For example, with two campaigns: You can run a brand lift study for each campaign and create an experiment with two arms, one per campaign. To view the brand lift results in Experiments, the brand lift studies and the experiment must use the same dates, and the two studies must use the same metrics and questions.

    When brand lift studies overlap with insertion orders in the experiment, the brand lift studies are automatically set to accelerated measurement: each study aims to collect survey responses as soon as possible and stops automatically when it reaches the target number of responses.

Review the differences of an experiment

Navigate to the Diff tab to review the differences between the arms of an experiment. This lets you verify that the only difference is the variable you're testing. You can correct any differences before the experiment goes live, removing potential bias and the risk of the experiment producing irrelevant results. When variant arms have more than one line item, Display & Video 360 auto-matches comparisons based on the minimum number of differences observed.

Be aware that the Diff tool is meant for use during the QA process and may not be useful retroactively. It compares line items and insertion orders as they are now, not as they were when the experiment ran, so it will reflect any changes made after the experiment (including archiving line items), even though those changes didn't affect the experiment.

Best practices

Keep in mind the following when planning an experiment.

Planning and setup
  • Only test one variable per experiment. Keep all arms of the experiment (baseline and any variants) identical, except for the single variable you're testing.

  • Create insertion orders or line items for your experiments by duplicating them, rather than creating them from scratch. This makes it easier to ensure that the items in your experiments are identical, except for the single dimension you're testing as a variable.

  • Only use new insertion orders or line items in your experiments. If an insertion order or line item has previous activity outside of the experiment, this may impact conversion counting.

  • Eliminate outside influences. Make sure line items outside your experiment aren't competing for budget with the line items in your experiment. If possible, use a separate insertion order for any line items that will be used in a given experiment.

    Additionally, if possible, try not to reuse the creatives from your experiment anywhere outside of it.

  • Set a sufficiently high frequency cap. If you're using insertion orders in your experiment, make sure your campaign's frequency cap is at least the highest frequency cap of any insertion order participating in the experiment, plus the sum of the frequency caps of the campaign's insertion orders that aren't in the experiment.

    For example, suppose a campaign contains 3 insertion orders, but only 2 are part of an experiment. If the insertion orders in the experiment have frequency caps of 10 and 8, and the third insertion order, which isn't part of the experiment, has a frequency cap of 5, the campaign should have a frequency cap of at least 15: 10 (the highest cap of any insertion order in the experiment) plus 5 (the sum of the caps of the campaign's insertion orders outside the experiment).

    This same best practice applies to the insertion order-level frequency cap if your experiment is comparing line items.

  • Plan your budget and pacing deliberately. The budget for each arm of your experiment should be proportional to the experiment's audience split; if you allocate budget out of proportion, budget becomes one of the experiment's variables. Similarly, pacing should be the same across arms, or it too becomes a variable. This also applies to line items that aren't in the experiment but are within the same insertion order: their budget spend and pacing affect how the experiment's line items buy inventory, and thus influence the results.

  • Be careful when you have limited reach. If you anticipate having a relatively limited reach (for example, you're buying deal inventory or audience inventory with a limited reach), experiments may produce wide confidence intervals, which may make it difficult to evaluate the efficacy of your variants.

  • Finalize things ahead of time. Experiments should have sufficient time for all creatives to be approved before they start.
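The frequency-cap rule above reduces to simple arithmetic: take the highest cap among the participating insertion orders, then add the caps of those outside the experiment. A hypothetical helper (illustrative only, not a product feature):

```python
def min_campaign_frequency_cap(experiment_caps, non_experiment_caps):
    """Minimum campaign frequency cap: the highest cap among insertion
    orders in the experiment, plus the caps of those outside it."""
    return max(experiment_caps) + sum(non_experiment_caps)

# The worked example: caps of 10 and 8 in the experiment, 5 outside
print(min_campaign_frequency_cap([10, 8], [5]))  # 15
```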

While an experiment is running
  • Don't pause the experiment. If you need to temporarily pause a campaign but plan to continue your experiment, pause all of the arms of the experiment, but not the experiment itself. Then, when you resume the campaign, activate all of the arms at the same time.

    If you end an experiment, it can’t be restarted. Additionally, all entities assigned to the experiment will go back to serving across 100% of your users.
  • Make uniform edits. To change your insertion orders or line items while an experiment is running, apply the same change to all arms of the experiment. For example, you may need to do this to remove a site that doesn't meet brand suitability guidelines.

Considerations

  • Experiments can't be run on the following types of inventory:
    • Programmatic guaranteed default line items or insertion orders with default line items
  • The earliest the experiment start date can be set is 24 hours after the initial setup.
  • A line item or insertion order can only be used in a single active experiment at a given time.
  • You can't adjust the audience split percentages after an experiment starts.
  • Currently, the experiments framework isn't cross-device aware, so a user could see one variant of the experiment on their mobile device and the baseline on their computer.
  • The number of conversions counted may differ between experiments and other forms of reporting, including metrics displayed in tables. This is because metrics recorded during experiments only consider impressions served while the experiment was active.

Frequently asked questions

What's the difference between Campaign Manager 360's audience segmentation targeting and experiments in Display & Video 360?

Audience segmentation targeting in Campaign Manager 360 focuses on splitting traffic between different creatives. For example, with audience segmentation targeting, you can divide the traffic of a Campaign Manager 360 campaign into different groups of users and traffic a different creative for each segment.

Experiments in Display & Video 360 lets you split traffic at the insertion order or line item level, which can test any setting or targetable dimension beyond just creatives. 

Why can't I add a particular insertion order or line item to my experiment?

Insertion orders or line items unavailable to your experiment are either hidden from view or shown as unselectable when you're setting up your experiment.

You may be able to determine why you're unable to add an insertion order or line item to an experiment by using the tooltip icon.

What's the difference between Google Optimize and experiments in Display & Video 360?

Experiments in Display & Video 360 lets you compare advertising campaign tactics such as targeting and settings, while Google Optimize lets you compare different sites or landing pages.

Experiments in Display & Video 360 uses a frequentist model similar to other performance measurement solutions for advertisements, while Google Optimize uses a Bayesian model that's better suited to manage low sample size comparisons.
