Run A/B tests on your store listing

To give developers more control and statistical robustness, we've recently made some changes to store listing experiments in Play Console. We've added three new capabilities:

experiment parameter configuration;
a calculator to estimate samples needed and time to completion; and
confidence intervals that now allow for continual monitoring.

This page has been updated to reflect these changes.

The features described in this page are not yet available to all developers.

To help optimize your store listing on Google Play, you can run experiments to find the most effective graphics and localized text for your app. You can run experiments for your main and custom store listing pages.

For published apps, you can test variants against your current version to see which one performs best based on install data. Before setting up a test, review the best practices for running effective experiments.

Note: Global experiments have been renamed to default graphics experiments. They maintain the same functionality that they had prior to being renamed.

Experiment types

For each app, you can run one default graphics experiment or up to five localized experiments at the same time.

Default graphics

Using a default graphics experiment, you can experiment with graphics in your app's default store listing language. You can include variants of your app's icon, feature graphic, screenshots, and promo video.

If your app's store listing is only available in one language: Default graphics experiments will be shown to all users.
If you've added any localized graphic assets in a specific language: Users viewing your app in that language are excluded from your app's default graphics experiments. For example, if your app's default language is English and it has a localized feature graphic in French, users viewing your app in French will be excluded from the experiment (even if you're testing your icon).
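
The exclusion rule above can be sketched as a small check. This only illustrates the rule as described on this page; it is not how Play Console implements it, and the function name and language codes are hypothetical.

```python
def sees_default_graphics_experiment(viewing_language, languages_with_localized_graphics):
    """Return True if a user viewing the listing in `viewing_language` is eligible.

    Hypothetical helper: a language that has any localized graphic asset is
    excluded from default graphics experiments, whichever asset is being tested.
    """
    return viewing_language not in languages_with_localized_graphics


# Default language is English; a localized feature graphic exists only in French.
localized = {"fr-FR"}
print(sees_default_graphics_experiment("en-US", localized))  # True
print(sees_default_graphics_experiment("fr-FR", localized))  # False
```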

Localized (for text and graphics)

Using a localized experiment, you can experiment with your app's icon, feature graphic, screenshots, promo video, and/or your app's descriptions in up to five languages. Experiment variants will only be shown to users viewing your app's store listing in the languages you choose.

If your app's store listing is only available in one language, localized experiments will only be shown to users viewing your app in its default language.

Step 1: Create an experiment

To create an experiment in Play Console:

Open Play Console and go to the Store listing experiments page (Grow > Store presence > Store listing experiments).
Click Create experiment.
Enter the experiment details:
- Experiment name: Enter the name of your experiment (50 characters or fewer).
- Store listing: Select the store listing you want to run this experiment on.
- Experiment type: Choose Default graphics experiment or Localized experiment.
Click Next.
Proceed to the instructions under "Step 2: Set up your experiment."

Step 2: Set up your experiment

After you've created an experiment, you can select the store listing, variants, and attributes that you want to test.

To set up your experiment:

Follow the on-screen instructions to select a store listing and add your experiment goals, such as your target metric, experiment audience, variants, and other settings.
- For more information and tips, review the table below.
To begin your experiment, go to the top of the page and click Run experiment.
To finish setting up your experiment later, click Save.

See field descriptions, examples and tips

You must provide the store listing information below when setting up your experiment.

Field	Description	Examples and tips
Experiment name	Your experiment name is only visible in Play Console to identify the experiment and isn't visible to users.	"Bright icon experiment", "Logo feature graphic", "Short description with new slogan"
Store listing	The store listing (main or custom) that you want to test.	After you start an experiment, your custom store listing's name is displayed with the name it had at the start of your experiment. If you update the name of your custom store listing after your experiment starts, it won't be changed in your experiment. To avoid confusion when running experiments, it's a good idea to delete unused custom store listings, and then create new ones when needed.

You must provide the experiment goals information below when setting up your experiment. These settings will affect the accuracy of your experiment, and how many installers are needed to reach a result.

Field	Description	Examples and tips
Target metric	The metric that will be used to determine the experiment result.	You have two options. Retained first-time installers (recommended): The number of users who installed the app for the first time and kept it installed for at least 1 day. First-time installers: The number of users who installed the app for the first time, regardless of whether they kept it.
Variants	The number of experimental variants to test against the current store listing. Testing a single variant will mean it takes less time to complete your experiment.	To add a new version to your experiment, select another option from the Variants dropdown. You can add up to three variants per experiment, in addition to your current version.
Experiment audience	The percentage of store listing visitors that will see an experimental variant instead of your current listing. These visitors will be split equally across your experimental variants.	If you type 30% as your audience, the remaining 70% of visitors to your store listing page will see your page's current version. If you have a 30% audience and two variants in your experiment, each variant will be shown to 15% of users. During the course of an experiment, each user will only see a single variant or your page's current version.
Minimum detectable effect	The minimum difference between variants and control required to declare which performs better. If the difference is less than this, your experiment will be considered a draw.
Confidence level	How often the confidence interval provided by the experiment will contain the true performance of the store listing.	Increasing the confidence level will decrease the likelihood of a false positive. Note: Choosing a 90% confidence level for your experiment means that one in ten experiments may report a false positive.
Attributes	Select the item type that you want to test compared to your current listing.	To run experiments most effectively, test one attribute at a time. You can only test your short description and full description during a localized experiment. If you're testing graphic assets, make sure to follow size and file type requirements.
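
The audience arithmetic in the table can be sketched as follows. This is a minimal illustration of the equal split described above, not a Play Console API. The function name is hypothetical, and the range check on the audience percentage is a sanity check added for the sketch.

```python
def audience_split(audience_percent, variant_count):
    """Return (share per variant, share for the current listing), in percent.

    Assumes the rules in the table above: the audience is split equally
    across variants, and an experiment has one to three variants.
    """
    if not 1 <= variant_count <= 3:
        raise ValueError("an experiment can have one to three variants")
    if not 0 < audience_percent <= 100:
        # Sanity check for this sketch only, not a documented limit.
        raise ValueError("audience must be a percentage between 0 and 100")
    per_variant = audience_percent / variant_count
    current_listing = 100 - audience_percent
    return per_variant, current_listing


print(audience_split(30, 2))  # (15.0, 70)
```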

If you click Edit estimates when entering your experiment goals information, you can edit the estimated daily values listed below. These values will help calculate how long your experiment will take. They may differ from observed performance.

Field	Description	Examples and tips
Daily visits from new users	The estimated number of store listing visitors who haven't installed your app before
Conversion rate	The estimated percentage of visitors that will convert to your target metric.	If your target metric is first-time installers, for example, then the conversion rate will be the estimated percentage of visitors that will convert to first-time installers.
Daily retained first-time installers	The estimated number of users who will install your app for the first time and keep it installed after one day.	Go to See descriptions of statistics, metrics, and examples to learn more.
Daily first-time installers	The estimated number of users who will visit your store listing and install for the first time.	Go to See descriptions of statistics, metrics, and examples to learn more.
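
To see how these estimates interact, here is a rough sketch of how a duration estimate could be derived. It is not the calculator Play Console uses. It assumes a standard fixed-sample, two-sided two-proportion z-test at 80% power and treats the minimum detectable effect as a relative change in conversion rate. All function and parameter names are hypothetical, so treat the output as an order-of-magnitude guide only.

```python
import math
from statistics import NormalDist


def visitors_per_arm(conversion_rate, relative_mde, confidence_level=0.90, power=0.80):
    """Approximate visitors needed per arm (two-sided two-proportion z-test)."""
    p1 = conversion_rate
    p2 = conversion_rate * (1 + relative_mde)  # assumption: MDE is relative
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def estimated_days(daily_visits, audience_percent, variant_count, per_arm):
    """Days until the slowest-filling arm reaches `per_arm` visitors."""
    variant_share = audience_percent / 100 / variant_count
    control_share = 1 - audience_percent / 100
    slowest_daily = daily_visits * min(variant_share, control_share)
    return math.ceil(per_arm / slowest_daily)


n = visitors_per_arm(conversion_rate=0.30, relative_mde=0.05)
days = estimated_days(daily_visits=5000, audience_percent=50, variant_count=1, per_arm=n)
print(n, days)
```

Under these assumptions, a smaller minimum detectable effect, a higher confidence level, or more variants sharing the same audience all lengthen the estimated run time.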

Step 3: Review and apply results

To review and apply your results:

Open Play Console and go to the Store listing experiments page (Grow > Store presence > Store listing experiments).
Select the right arrow on the table row of the experiment you want to review. View Store listing experiment tables below to learn more about how details are displayed.
In the "Results" section, view a summary of your experiment's results. The results are based on your target metric.

The results tell you which option is performing best, or which variants are performing better than your current store listing. You'll also see an explanation of this result based on its confidence interval and minimum detectable effect, and a recommended action (if applicable). You may receive the result "More data needed" if not enough data has been collected to return a result.
Your next action depends on your experiment results:
- If a variant performed well, you may see a recommendation to apply that variant.
- If your result is "Leave your experiment running to collect more data," come back later.
- If a number of variants performed better than your store listing, or if your result was a "Draw," review the results to decide which variant you want to apply.
- If your current store listing performed the best, you should click Keep current listing.

Store listing experiment tables

After you’ve set up experiments, you'll see the following details on your Store listing experiments page. Experiment details are organized in four tables, as listed below. To get more details, select the right arrow on the table row of the experiment you want to view.

Drafts

Experiment name: Name of your experiment
Store listing: The store listing the experiment is running on
Experiment type: Default graphics or localized
Details: Number of variants to be served and the percentage of users

In progress

Experiment name: Name of your experiment
Store listing: The store listing the experiment is running on
Experiment type: Default graphics or localized
Start date: When your experiment started
Details: Number of variants being served and the percentage of users

Completed

Experiment name: Name of your experiment
Store listing: The store listing the experiment is running on
Experiment type: Default graphics or localized
Start date: When your experiment started
Details: Number of variants being served and the percentage of users
Outcome: The result of your experiment

Past experiments

Experiment name: Name of your experiment
Store listing: The store listing the experiment is running on
Experiment type: Default graphics or localized
Start date: When your experiment started
End date: When your experiment ended
Status: Applied or not applied

See descriptions of statistics, metrics, and examples

After you select an experiment, you can view user and install metrics that summarize how each variant performed.

User metrics

Metric	Definition
First-time installers	Unique users who installed your app for the first time during the experiment. Data is scaled to account for audience share.
Retained installers (1-day)	Unique users who installed your app for the first time and kept it for at least 1 day following installation during the experiment. Data is scaled to account for audience share.
Retained pre-registrations – only available for select Play partners	The number of users who pre-registered and remained pre-registered for at least 1 day.

Results

You can view metrics that show the results of your experiment in two different ways:

Current: Number of unique users
Scaled: Number of unique users divided by audience share

If you want to review absolute data on installers, use current data. If you want to review data that’s been scaled to account for different audience shares (for example, if 90% of your audience saw one version and 10% saw another), use scaled data.

Item	Definition and examples
Performance	Estimated change in install performance compared to the current version. Performance will only be displayed once your experiment has enough data. In general, as an experiment has more time to run and collect data, a variant's performance range will become narrower and more accurate. Example: If one of your variants had a performance range of +5% to +15%, then the change lies somewhere between these values, with the most likely change in performance being the middle number between the two, about +10%.
Installs (scaled)	Number of installs during the experiment divided by audience share. Included for experiments started before January 24, 2019. For example, if you ran an experiment with two variants that used 90%/10% audience shares and the installs for each variant were A = 900 and B = 200, the scaled installs would be shown as A = 1000 (900/0.9) and B = 2000 (200/0.1).
Installs (current)	Number of installs that are still installed today. Included for experiments started before January 24, 2019.
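
The scaling in the example above is a single division. A minimal sketch, using a hypothetical function name and taking the audience share as a whole-number percentage:

```python
def scaled_installs(installs, audience_percent):
    """Scale a raw install count by the variant's audience share (in percent)."""
    return installs * 100 / audience_percent


# The 90%/10% example from the table: A = 900 installs, B = 200 installs.
print(scaled_installs(900, 90))  # 1000.0
print(scaled_installs(200, 10))  # 2000.0
```

Scaling puts both variants on the same footing: B had fewer raw installs, but it was shown to a much smaller share of visitors.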

Install metrics (deprecated)

The following metrics were used for experiments started before January 24, 2019. Experiments that started before this date still include these statistics, but new experiments only use first-time installers and retained installers (1-day) user metrics.

Metric	Definition and examples
Installs on active devices	Number of active devices on which each variant of the application is currently installed, scaled up to compensate for the different audience levels.
Installs by user	Number of unique users who installed each variant of the app in a given day, scaled up to compensate for different audience levels.
Uninstalls by user	Number of uninstalls of each variant of the app in a given day, scaled up to compensate for different audience levels.

Experiments using deprecated metrics

As of September 2019, experiments using any of the deprecated metrics listed above will be automatically terminated. Please factor this into your planning and use the latest metrics for store listing experiments instead.

You can view results from terminated experiments under Terminated experiments on the Store listing experiments page.

Sign up for experiment notifications

To receive notifications and emails when your experiments are complete, make sure to set your preferences on the Notifications page in Play Console.

To learn more about email notifications, go to Manage your developer account information.
