About data sampling

In data analysis, sampling is the practice of analyzing a subset of the data in order to uncover meaningful information about the larger data set. For example, if you wanted to estimate the number of trees in a 100-acre area where the distribution of trees was fairly uniform, you could count the trees in 1 acre and multiply by 100, or count the trees in a half acre and multiply by 200, to get a reasonable estimate for the entire 100 acres.
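
The tree-count arithmetic can be sketched in a few lines of Python. The function name `estimate_total` and the sample counts (120 trees in 1 acre, 60 trees in a half acre) are illustrative assumptions, not values from Analytics.

```python
def estimate_total(sample_count: int, sample_area: float, total_area: float) -> float:
    """Scale a count taken on sample_area up to total_area.

    Assumes the items are spread fairly uniformly over the whole area.
    """
    if sample_area <= 0 or sample_area > total_area:
        raise ValueError("sample_area must be positive and no larger than total_area")
    return sample_count * (total_area / sample_area)


# Hypothetical counts: 120 trees in 1 acre, or 60 trees in half an acre.
one_acre_estimate = estimate_total(120, 1, 100)     # multiplies by 100
half_acre_estimate = estimate_total(60, 0.5, 100)   # multiplies by 200
```

Both calls scale to the same total only because the hypothetical counts are perfectly uniform. With real data, a smaller sample generally gives a noisier estimate.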

This article explains the circumstances under which Analytics applies session sampling to your data in order to give you accurate reports in a timely fashion.

 

Sampling thresholds

Default reports are not subject to sampling.

Ad-hoc queries of your data are subject to the following general thresholds for sampling:

  • Analytics Standard: 500k sessions at the view level for the date range you are using
  • Analytics 360: 1M - 100M sessions at the view level for the date range you are using

    360 thresholds vary according to how queries are configured. For detailed information, contact your 360 support team.
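
As a rough illustration of the thresholds listed here, the check can be sketched as follows. The function `may_be_sampled` and its parameters are hypothetical. Because the Analytics 360 threshold varies by query configuration, the sketch leaves that value to the caller rather than guessing it.

```python
STANDARD_THRESHOLD = 500_000  # sessions at the view level for the selected date range


def may_be_sampled(sessions_in_range: int,
                   is_default_report: bool,
                   threshold: int = STANDARD_THRESHOLD) -> bool:
    """Return True if a query could be subject to sampling.

    Default reports are never sampled. Ad-hoc queries are sampled only
    when the session count exceeds the threshold for the property type.
    """
    if is_default_report:
        return False
    return sessions_in_range > threshold


# An ad-hoc query over 600k sessions on a Standard property exceeds the threshold.
standard_adhoc = may_be_sampled(600_000, is_default_report=False)
# The same volume in a default report is not sampled.
standard_default = may_be_sampled(600_000, is_default_report=True)
```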

When sampling is applied

The following sections explain where you can expect session sampling in Analytics reports.

Default reports

Analytics has a set of preconfigured, default reports listed in the left pane under Real-Time, Audience, Acquisition, Behavior, and Conversions.

Analytics stores one complete, unfiltered set of data for each property in each account. For each reporting view in a property, Analytics also creates tables of aggregated dimensions and metrics from the complete, unfiltered data. When you run a default report, Analytics queries the tables of aggregated data to quickly deliver unsampled results.

Default reports are unsampled in both Analytics Standard and Analytics 360.

Ad-hoc reports

If you modify a default report in some way, for example, by applying a segment, filter, or secondary dimension, or if you create a custom report with a combination of dimensions and metrics that doesn’t exist in a default report, you are generating an ad-hoc query of Analytics data.

Analytics first goes to the aggregated data tables to see if all of the requested information from your ad-hoc query is available there. If the information is not available there, Analytics queries the complete, unfiltered set of data and computes new aggregates to satisfy the query request.
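
The lookup order described here (aggregated tables first, raw data as a fallback) can be sketched as below. This is a toy model, not Analytics' implementation: the session records, the `AGGREGATES` dictionary, and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical raw session records for one view.
RAW_SESSIONS = [
    {"country": "US", "device": "mobile"},
    {"country": "US", "device": "desktop"},
    {"country": "DE", "device": "mobile"},
]


def aggregate(sessions, dimensions):
    """Count sessions per combination of the given dimensions."""
    return Counter(tuple(s[d] for d in dimensions) for s in sessions)


# Pre-aggregated tables, standing in for those that back default reports.
AGGREGATES = {("country",): aggregate(RAW_SESSIONS, ("country",))}


def run_query(dimensions):
    """Return (result, source): use a stored aggregate if one exists,
    otherwise compute new aggregates from the raw data."""
    key = tuple(dimensions)
    if key in AGGREGATES:
        return AGGREGATES[key], "aggregate tables"
    return aggregate(RAW_SESSIONS, key), "raw data"


by_country, source_default = run_query(["country"])
by_country_device, source_adhoc = run_query(["country", "device"])
```

In this sketch only the second query touches the raw data. In Analytics, that fallback path is the one that can become subject to sampling.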

Ad-hoc queries are subject to sampling if the number of sessions for the date range you are using exceeds the threshold for your property type.

The sampling algorithm uses a sample of the complete data that is proportional to the daily distribution of sessions for the property for the date range you’re using. For example, if over a 5-day period sessions were sampled at 25%, then the sample would include 25% of each day’s sessions:

                Monday   Tuesday  Wednesday  Thursday  Friday
Total sessions  200,000  100,000  200,000    300,000   200,000
25% sample      50,000   25,000   50,000     75,000    50,000
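
The per-day figures in the table can be reproduced with a short sketch. The helper name `proportional_sample_sizes` is hypothetical; it shows only the proportional allocation, not how Analytics chooses which sessions to include.

```python
daily_sessions = {
    "Monday": 200_000,
    "Tuesday": 100_000,
    "Wednesday": 200_000,
    "Thursday": 300_000,
    "Friday": 200_000,
}


def proportional_sample_sizes(daily_sessions, rate):
    """Apply the same sampling rate to every day, so the sample keeps
    the daily distribution of the full data."""
    return {day: round(total * rate) for day, total in daily_sessions.items()}


sizes = proportional_sample_sizes(daily_sessions, 0.25)

# The overall share of sessions in the sample equals the per-day rate.
effective_rate = 100 * sum(sizes.values()) / sum(daily_sessions.values())
```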

 

The sampling rate varies from query to query, depending on the number of sessions during the date range for a given view.

When sampling is in effect, you see a message at the top of the report that says "This report is based on N% of sessions."

To the right of that message, you can select one of two options to change the sample size:

  • Greater precision: uses the maximum sample size possible to give you results that are the most precise representation of your full data set
  • Faster response: uses a smaller sample size to give you faster results

Other reports

Sampling works differently for these reports than for default reports or ad-hoc queries.

Multi-Channel Funnel and Attribution reports

As with default reports, no sampling is applied unless you modify the report, for example, by changing the lookback window, by changing which conversions are included, or by adding a segment or secondary dimension. If you modify the report in any way, a maximum sample of 1M sessions is returned.

Flow-visualization reports

Flow-visualization reports (Users Flow, Behavior Flow, Events Flow, Goal Flow) are generated from a maximum of 100K sessions for the selected date range.

The results in flow-visualization reports, including entrance, exit, and conversion rates, may differ from the results in the default Behavior and Conversion reports, which are based on a different sample set.

Filters and segments

Analytics Standard and Analytics 360 sample session data at the view level, after filters have been applied. For example, if view filters include or exclude sessions, then the sample is taken from only those sessions.

Analytics Standard and Analytics 360 both apply segments after applying filters and after sampling, which means that a segment may include fewer sessions than are included in the overall sample.
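
The order of operations (view filters, then sampling, then segments) can be sketched as follows. Uniform random sampling with `random.Random.sample` is an assumption used only for illustration, and all names and session fields are hypothetical.

```python
import random


def build_report_sessions(sessions, view_filter, segment, sample_size, seed=0):
    """Apply view filters, then sample, then apply the segment."""
    # 1. View filters decide which sessions exist in the view at all.
    filtered = [s for s in sessions if view_filter(s)]
    # 2. The sample is drawn only from the filtered sessions.
    rng = random.Random(seed)
    if len(filtered) > sample_size:
        sampled = rng.sample(filtered, sample_size)
    else:
        sampled = filtered
    # 3. The segment is applied last, so it can only shrink the sample.
    segmented = [s for s in sampled if segment(s)]
    return filtered, sampled, segmented


# Hypothetical data: 1,000 sessions, every tenth one from internal traffic.
sessions = [
    {"id": i, "internal": i % 10 == 0, "device": "mobile" if i % 2 else "desktop"}
    for i in range(1000)
]
filtered, sampled, segmented = build_report_sessions(
    sessions,
    view_filter=lambda s: not s["internal"],   # view filter excludes internal traffic
    segment=lambda s: s["device"] == "mobile", # segment keeps mobile sessions
    sample_size=300,
)
```

Here 900 sessions pass the view filter and 300 are sampled. The mobile segment then contains at most those 300 sessions, and usually fewer.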

Working with sample size

Use the sampling controls to switch between the maximum sample size, for a more precise report, and the smaller sample size, for a faster response to your query.

One option to avoid sampling is to shorten the date range of your report until the number of sessions is under the sampling threshold, if your volume of data allows for that.
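
One way to apply this is to split a long date range into consecutive shorter ranges that each stay at or below the threshold. The sketch below assumes you already know the session count for each day. The function `split_date_range` and the dates are hypothetical, and the counts reuse the five-day example from earlier in this article.

```python
from datetime import date, timedelta


def split_date_range(daily_sessions, threshold):
    """Greedily group consecutive days into ranges whose session totals
    do not exceed threshold. Returns (start, end, sessions) tuples.

    A single day that exceeds the threshold still forms its own range,
    because it cannot be shortened any further.
    """
    ranges = []
    start = end = None
    running = 0
    for day in sorted(daily_sessions):
        count = daily_sessions[day]
        if start is not None and running + count > threshold:
            ranges.append((start, end, running))
            start, running = None, 0
        if start is None:
            start = day
        end = day
        running += count
    if start is not None:
        ranges.append((start, end, running))
    return ranges


first_day = date(2024, 1, 1)  # hypothetical start date
counts = [200_000, 100_000, 200_000, 300_000, 200_000]
daily = {first_day + timedelta(days=i): c for i, c in enumerate(counts)}

ranges = split_date_range(daily, threshold=500_000)
```

With these counts the five days split into two ranges of 500,000 sessions each. Results from separate ranges can be added together only for metrics that are additive across days, such as sessions; a metric such as unique users is not.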

If you are a Google Analytics 360 user, you have 2 additional options to get unsampled reports:
