Test ads with experiments

Display & Video 360 experiments is an integrated testing framework that lets you run A/B tests on individual or grouped creatives, audiences, bid strategies, or targeting tactics by comparing insertion orders or line items.

You can split users into mutually exclusive groups and test different combinations of targeting, settings, and creatives to discover which perform best. Use these findings to optimize campaign performance mid-flight or to plan future campaigns.

With experiments, you can:

Test every variable dimension affecting a campaign, including targeting, settings, creative, and more.
Report on key metrics, such as CPC, CTR, CPA, CVR, and CPM.

User-based identification

Display & Video 360 uses user-based identifiers to help your experiments adapt to third-party cookie deprecation: when a third-party ID isn't available, backup identifiers are used for diversion. This increases the chances of having at least one identifier available for any given ad impression.

For example:

If a third-party ID isn't available, a first-party ID is used. If neither is available, a query-level identifier is used.
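
The fallback order described above can be sketched as a simple priority chain. This is only an illustration: the function name, the parameter names, and the returned labels are hypothetical and don't correspond to any Display & Video 360 API, and the exact order is an assumption based on the example above.

```python
from typing import Optional, Tuple


def pick_diversion_id(
    third_party_id: Optional[str],
    first_party_id: Optional[str],
    query_id: str,
) -> Tuple[str, str]:
    """Return (identifier_type, identifier), preferring the most durable ID.

    Assumed order: third-party ID, then first-party ID, then a
    query-level identifier as the last resort.
    """
    if third_party_id:
        return ("third_party", third_party_id)
    if first_party_id:
        return ("first_party", first_party_id)
    # No user-level ID is available, so fall back to the query-level ID.
    return ("query", query_id)
```

With a chain like this, every impression ends up with some identifier, which is why participation is maximized by default.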

By default, experiments use user-based identification and random diversion to maximize participation. To reduce cross-arm contamination, you can exclude unidentified users by filtering out traffic without third-party IDs, but this decreases the number of users participating in your experiment.

Concepts

Actual values

The raw results from the experiment. Represents the actual number of conversions the variant received.

Arm

An arm may contain either:

An individual line item or insertion order

A group of line items or insertion orders

For example, a baseline and its variant are separate arms of the experiment.

Baseline

The line item or insertion order that serves as the standard for comparison in the experiment. You can create variants and compare them against the original baseline.

Confidence interval

Indicates the level of certainty that the actual difference between variants falls within the reported range. You can specify a probability of 90% or 95% that the true value lies within the reported range.

For example, a 90% confidence interval should show that 90 of 100 repeated tests have a difference that falls within the reported range.
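
To make the idea concrete, here is a minimal sketch of a confidence interval for the difference between two conversion rates. It uses a textbook normal-approximation (Wald) interval; this is an assumption for illustration and not necessarily the method Display & Video 360 uses. The function and parameter names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def diff_confidence_interval(
    conv_a: int, n_a: int, conv_b: int, n_b: int, level: float = 0.95
) -> Tuple[float, float]:
    """Wald interval for (rate_b - rate_a) at the given confidence level."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Two-sided critical value, for example about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Standard error of the difference between two independent proportions.
    se = sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    diff = rate_b - rate_a
    return (diff - z * se, diff + z * se)
```

For the same data, a 90% interval is narrower than a 95% interval: you accept less certainty in exchange for a tighter range.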

Normalized values

Represents the value for clicks, conversions, impressions, or revenue that the baseline or variant would have at a 100% audience split, calculated by scaling up its actual value.

For example, if a line item with a 34% audience split has 170,000 actual conversions, its normalized value is 500,000 conversions, which is what the same line item would have received with 100% of the audience split.
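
The scaling in this example is a simple proportion: divide the actual value by the arm's share of the audience. A minimal sketch, with a hypothetical function name:

```python
def normalized_value(actual: float, audience_split_pct: float) -> float:
    """Scale an arm's actual value up to a 100% audience split."""
    if not 0 < audience_split_pct <= 100:
        raise ValueError("audience_split_pct must be in the range (0, 100]")
    return actual * 100 / audience_split_pct


# The example from the text: 170,000 conversions at a 34% split.
conversions_at_full_split = normalized_value(170_000, 34)
```

Normalizing this way lets you compare arms that received different audience splits on the same scale.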

P-value

Represents the calculated probability that the difference could have occurred by chance.

Used to determine whether the results are statistically significant, that is, whether there's likely a real performance difference between the baseline and variant:

A lower p-value indicates stronger evidence of a performance difference, signaling that the results are significant.

A higher p-value indicates that the results may have occurred by chance, signaling that the results are not significant.
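
As an illustration of how a p-value like this can be computed, the sketch below runs a standard two-sided, two-proportion z-test on conversion counts. This is a common frequentist test chosen for illustration; the source doesn't specify the exact test Display & Video 360 applies, and the function and parameter names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-proportion z-test: p-value for 'both arms convert at the same rate'."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that there is no real difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (for example, zero conversions in both arms).
        return 1.0
    z = (rate_b - rate_a) / se
    # Probability of a difference at least this large in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

A result is then typically called significant when the p-value falls below a threshold tied to the chosen confidence level, for example 0.05 for 95%. That threshold convention is also an assumption here.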

Variant

An experimental line item or insertion order testing a single variable relative to the baseline line item or insertion order.

Set up an experiment

From your advertiser, click Experiments in the left menu.
Under the A/B tests tab, select Create new.
Choose one of the following:
- Cross Exchange
- YouTube & Partners
Enter the following details:
1. Name: Enter an identifier for your experiment and, optionally, a hypothesis statement.
2. Test duration: Set the start and end dates:
  - Start date: Set when the experiment starts. The date must be after the current date.
  - End date: Optionally set when the experiment ends. If you don't specify an end date, your experiment runs indefinitely.
    When possible, set the experiment's start and end dates to match those of the experiment's insertion orders or line items.
    
    For example:
    
    If you stop an experiment before its insertion orders or line items have reached their end dates, they no longer follow the audience split and serve to 100% of users.
    
    Metrics in an experiment are based solely on impressions served after the experiment's start date. Reports may count a different number of conversions if the line item was active before or after the dates of the experiment.
3. Test arms: Choose if you're comparing insertion orders or line items.
  - Depending on your experiment type, pick at least two insertion orders or line items to use in the experiment.
    1. Compare individually: Select individual insertion orders to include in the experiment.
      - If using multiple insertion orders, you can adjust the audience split to control the distribution of all cookies in your experiment among your experiment's insertion orders or line items.
      - For multiple insertion orders, you can identify the control arm by setting the insertion order as the Baseline.
    2. Compare groups: Select groups of insertion orders to include in each arm of the experiment.
      - You can adjust the audience split to control the distribution of all cookies in your experiment among your experiment's insertion orders or line items.
      - You can identify the control arm by setting the insertion order as the Baseline.
4. Measure:
  1. Research goals: Select the goal you want to measure in the experiment.
    - Conversions
    - Clicks
    - Completed video views
    - Total custom impression value (when using custom bidding)
  2. Confidence interval: Choose 95% (the most common) or 90%.
  3. Participation (cross-exchange experiments only): By default, this is set to maximize participation in the experiment by using user-level identifiers and random diversion.
    - You can turn on Exclude unidentified users to exclude traffic without third-party IDs to minimize cross-arm contamination.
      
      Note: Excluding unidentified users may cause your experiment to be non-representative due to the decrease in participation.
  4. Lift study setup (optional; YouTube & Partners experiments only):
    - Select the checkbox for "Brand lift".
    - Select 1 to 3 lift metrics to measure. The left sidebar will show the eligibility criteria for measuring lift and if your experiment arms are eligible. Learn more about setting up brand lift measurement.
    - Enter the following survey details:
      - Your brand or product name.
      - Up to 3 names of competing brands or products.
    - Enter the following survey settings:
      - Language: The language used for the survey.
      - Object type: The industry or field that you want to survey.
      - Intended action: What you expect the user to do after seeing your ad.
Click Save.
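
Before you save, it can help to check a planned setup against the constraints mentioned in the steps above. The sketch below is a hypothetical pre-flight check, not part of Display & Video 360: the `ExperimentPlan` class and its fields are invented for illustration, and the rule that audience splits add up to 100% is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class ExperimentPlan:
    """Hypothetical container for the settings entered in the setup form."""
    name: str
    start_date: date
    arm_split_pcts: List[float]  # one audience split percentage per arm
    confidence_level_pct: int


def plan_problems(plan: ExperimentPlan, today: date) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not plan.name.strip():
        problems.append("name is required")
    if plan.start_date <= today:
        problems.append("start date must be after the current date")
    if len(plan.arm_split_pcts) < 2:
        problems.append("at least two arms are required")
    # Assumption: the audience split across arms adds up to 100%.
    if abs(sum(plan.arm_split_pcts) - 100) > 1e-9:
        problems.append("audience splits should add up to 100%")
    if plan.confidence_level_pct not in (90, 95):
        problems.append("confidence level must be 90 or 95")
    return problems
```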

Evaluate the results of an experiment

Start from your advertiser.
In the left menu, go to Experiments.
Under the A/B tests tab: Select the Study name link to view the results of an experiment.
1. If you set up a Brand Lift survey in an experiment, click View brand lift report next to the test arm name to see the results of your study.
Under the experiment's Results tab:
1. Under Conversions (Primary goal): You can view the results summary, including a graph to understand the difference between your baseline, variants, and lift:
  1. Metric: You can evaluate the difference between your baseline and variants to check for statistical significance.
  2. Status: Indicates whether the results are statistically significant. A result is statistically significant when the difference between the baseline and a variant for the experiment's goal is large enough that it's unlikely to have occurred by chance.
  3. Test dates: The dates you've set for the experiment.
  4. Type: Reflects if you've chosen to compare insertion orders or line items.
  5. Confidence level: The confidence level you've set for the experiment.
  6. Confidence interval: When turned on, shows the confidence interval at the confidence level set for the experiment.
You can update your results in the following ways:
- Select a baseline: By default, the chart compares the baseline to multiple variants. You can select a variant to use as the baseline from the Baseline list.
- Select an attribution model: When viewing results for conversion experiments, you can choose an attribution model from the Attribution Models list.
Optionally, you can set up two independent brand lift studies as your experiment arms and view the brand lift results. To view brand lift results in Experiments:
- The brand lift studies and experiment start dates must be the same.
- The metric selection and survey questions must be the same.

For example, with two campaigns, you can run a brand lift study for each campaign and create an experiment with two arms, one for each campaign. If the brand lift study is complete, you can view your results even if the experiment is still running.

If brand lift studies are created for insertion orders within an experiment, they're automatically set to accelerated measurement. Each study tries to collect survey responses as soon as possible and stops when it reaches the target number of responses.

Review the differences of an experiment

Navigate to the Diff tab to review the differences between the arms of an experiment. This lets you confirm that the only difference is the variable you're testing. You can correct any other differences before the experiment goes live, removing potential bias and the risk of the experiment producing irrelevant results. When variant arms have more than one line item, Display & Video 360 auto-matches comparisons based on the minimum number of differences observed.

Be aware that the Diff tool is meant for use during the QA process, but it may not be useful retroactively. The Diff tool compares line items and insertion orders as they are currently, not as they were when the experiment ran. So the Diff tool will reflect any changes that occurred after the experiment (including archiving line items), even though these changes didn't affect the experiment.

Best practices

Keep in mind the following when planning an experiment.

Planning and setup

Only test 1 variable per experiment. Keep all arms of the experiment (baseline and any variants) the same, except for a single variable that you're testing.
Create insertion orders or line items for your experiments by duplicating them, rather than creating them from scratch. This makes it easier to ensure that the items in your experiments are identical, except for the single dimension you're testing as a variable.
Only use new insertion orders or line items in your experiments. If an insertion order or line item has previous activity outside of the experiment, this may impact conversion counting.
Eliminate outside influences. Make sure your line items outside of your experiment aren't competing with the budgets of the line items in your experiment. So, if possible, use a separate insertion order for any line items that will be used in a given experiment.

Additionally, if possible, try not to reuse the creative you're using in an experiment anywhere outside of your experiment.
Set a sufficiently high frequency cap. If you're using insertion orders in your experiment, make sure your campaign's frequency cap is at least the highest frequency cap of any insertion order participating in the experiment plus the sum of the frequency caps of the campaign's remaining insertion orders that aren't used in the experiment.

For example, if you have a campaign that contains 3 insertion orders, but only 2 are part of an experiment, you'd determine your campaign's minimum frequency cap by adding the higher frequency cap of the two participating insertion orders to the frequency cap of the insertion order that isn't being used in the experiment. So, if the insertion orders in the experiment have frequency caps of 10 and 8 respectively, and the third insertion order, which isn't a part of the experiment, has a frequency cap of 5, the campaign should have a frequency cap of at least 15. This is determined by adding 10 (the highest frequency cap of any insertion order associated with the experiment) to 5 (the sum of the frequency caps of the campaign's remaining insertion orders that are outside of the experiment).

This same best practice applies to the insertion order-level frequency cap if your experiment is comparing line items.
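
The frequency cap rule above comes down to one line of arithmetic. A minimal sketch, with a hypothetical function name:

```python
from typing import Sequence


def min_campaign_frequency_cap(
    experiment_caps: Sequence[int], other_caps: Sequence[int]
) -> int:
    """Highest cap among experiment arms plus the sum of all non-experiment caps."""
    if not experiment_caps:
        raise ValueError("at least one experiment frequency cap is required")
    return max(experiment_caps) + sum(other_caps)


# The example from the text: caps of 10 and 8 in the experiment, 5 outside it.
campaign_cap = min_campaign_frequency_cap([10, 8], [5])  # 10 + 5 = 15
```

Only the highest experiment cap counts because a given user is assigned to a single arm, while insertion orders outside the experiment can all serve to that same user.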
Plan your budget and pacing deliberately. The budget set for each arm of your experiment should be proportional to your experiment's audience split. If you allocate budget out of proportion to the split, budget becomes one of the experiment's variables. Similarly, pacing should be the same across arms, or it too becomes a variable in the experiment. Extend this best practice beyond the line items in an experiment to other line items that aren't in the experiment but are in the same insertion order. Their ability to spend budget and pace affects how the experiment's line items buy inventory and therefore influences the results.
Be careful when you have limited reach. If you anticipate having a relatively limited reach (for example, you're buying deal inventory or audience inventory with a limited reach), experiments may produce wide confidence intervals, which may make it difficult to evaluate the efficacy of your variants.
Finalize things ahead of time. Experiments should have sufficient time for all creatives to be approved before they start.
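
For the budget guidance above, one way to keep budget from becoming a variable is to derive each arm's budget directly from its audience split. A minimal sketch, with hypothetical function and arm names:

```python
from typing import Dict


def split_budget(
    total_budget: float, audience_split_pcts: Dict[str, float]
) -> Dict[str, float]:
    """Allocate a total budget to arms in proportion to their audience split."""
    total_pct = sum(audience_split_pcts.values())
    if total_pct <= 0:
        raise ValueError("audience splits must add up to a positive number")
    return {
        arm: total_budget * pct / total_pct
        for arm, pct in audience_split_pcts.items()
    }


# A 60/40 split of a 10,000 budget.
budgets = split_budget(10_000, {"baseline": 60, "variant": 40})
```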

While an experiment is running

Don't pause the experiment. If you need to temporarily pause a campaign but plan to continue your experiment, pause all of the arms of the experiment, but not the experiment itself. Then, when you resume the campaign, make sure to activate all of the arms at the same time.

If you end an experiment, it can’t be restarted. Additionally, all entities assigned to the experiment will go back to serving across 100% of your users.
Make uniform edits. If you need to make changes to your insertion orders or line items while an experiment is running, apply the same change to all arms of the experiment. For example, you may need to do this to remove a site that doesn't meet brand suitability guidelines.

Considerations

Experiments can't be run on the following types of inventory:
- Programmatic guaranteed default line items or insertion orders with default line items
- Instant Reserve inventory
The earliest start date you can set for an experiment is 24 hours after the initial setup.
A line item or insertion order can only be used in a single active experiment at a given time.
You can't adjust the audience split percentages after an experiment starts.
Currently, the experiments framework isn't cross-device aware, so a user could see one variant of the experiment on their mobile device and the baseline on their computer.
The number of conversions counted may differ between experiments and other forms of reporting, including metrics displayed in tables. This is because metrics recorded during experiments only consider impressions served while the experiment was active.
Lift studies created in A/B tests are not available for remeasurement. If you want to remeasure your brand lift study, you need to stop the experiment, remove the insertion order from the brand lift study, and create a new brand lift study in the “Lift studies” tab.

Frequently asked questions

What's the difference between Campaign Manager 360's audience segmentation targeting and experiments in Display & Video 360?

Audience segmentation targeting in Campaign Manager 360 focuses on splitting traffic between different creatives. For example, with audience segmentation targeting, you can divide the traffic of a Campaign Manager 360 campaign into different groups of users and traffic a different creative for each segment.

Experiments in Display & Video 360 lets you split traffic at the insertion order or line item level, which can test any setting or targetable dimension beyond just creatives.

Why can't I add a particular insertion order or line item to my experiment?

Insertion orders or line items unavailable to your experiment are either hidden from view or are shown as an unselectable item when you're setting up your experiment.

You may be able to determine the reason why you're unable to add an insertion order or line item to an experiment by using the tooltip icon.

What's the difference between Google Optimize and experiments in Display & Video 360?

Experiments in Display & Video 360 lets you compare advertising campaign tactics such as targeting and settings, while Google Optimize lets you compare different sites or landing pages.

Experiments in Display & Video 360 uses a frequentist model similar to other performance measurement solutions for advertisements, while Google Optimize uses a Bayesian model that's better suited to manage low sample size comparisons.
