About data sampling
In data analysis, sampling is the practice of analyzing a subset of all data in order to uncover the meaningful information in the larger data set. For example, if you wanted to estimate the number of trees in a 100-acre area where the distribution of trees was fairly uniform, you could count the number of trees in 1 acre and multiply by 100, or count the trees in a half acre and multiply by 200, to get a reasonably accurate estimate for the entire 100 acres.
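The arithmetic behind the tree example can be sketched in a few lines of Python. The function name and the tree counts below are made up for illustration; only the acreages come from the example above.

```python
def estimate_total(sample_count: float, sample_area: float, total_area: float) -> float:
    """Scale a count observed in a sample area up to the full area.

    Assumes the items are spread fairly uniformly, as in the tree example.
    """
    if sample_area <= 0 or sample_area > total_area:
        raise ValueError("sample_area must be positive and no larger than total_area")
    return sample_count * (total_area / sample_area)


# Hypothetical counts: 120 trees in 1 acre, or 60 trees in half an acre.
print(estimate_total(120, 1, 100))    # 12000.0
print(estimate_total(60, 0.5, 100))   # 12000.0
```

Both samples lead to the same estimate only because the distribution is uniform. The less uniform the data, the more the estimate depends on which subset was counted.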
This article explains the circumstances under which Analytics applies session sampling to your data in order to give you accurate reports in a timely fashion.
Default reports are not subject to sampling.
Ad-hoc queries of your data are subject to the following general thresholds for sampling:
- Analytics Standard: 500K sessions at the property level for the date range you are using
- Analytics 360: 100M sessions at the view level for the date range you are using
360 thresholds vary according to how queries are configured. For detailed information, contact your 360 support team.
When sampling is applied
The following sections explain where you can expect session sampling in Analytics reports.
Analytics has a set of preconfigured, default reports listed in the left pane under Audience, Acquisition, Behavior, and Conversions.
Analytics stores one complete, unfiltered set of data for each property in each account. For each reporting view in a property, Analytics also creates tables of aggregated dimensions and metrics from the complete, unfiltered data. When you run a default report, Analytics queries the tables of aggregated data to quickly deliver unsampled results.
Analytics periodically adds new reports, and sometimes makes changes to the way metrics are calculated. If the date range of a report includes a time before the report was added or before a metric calculation changed, then Analytics can issue an ad-hoc query and the data might be sampled.
Reports that include the Users and Active Users metrics are sampled when their date range includes data from before September 2016.
Default reports are unsampled in both Analytics Standard and Analytics 360.
If you modify a default report in some way, for example, by applying a segment, filter or secondary dimension, or if you create a custom report with a combination of dimensions and metrics that don’t exist in a default report, you are generating an ad-hoc query of Analytics data.
Analytics first goes to the aggregated data tables to see if all of the requested information from your ad-hoc query is available there. If the information is not available there, Analytics queries the complete, unfiltered set of data to satisfy the query request.
Ad-hoc queries are subject to sampling if the number of sessions for the date range you are using exceeds the threshold for your property type.
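The decision described above can be sketched roughly as follows. This is a simplified model, not how Analytics is implemented: the function and parameter names are hypothetical, the thresholds are the general ones quoted in this article, and the sketch ignores both the exceptions for default reports over older date ranges and the fact that 360 thresholds vary with query configuration.

```python
# General thresholds quoted in this article (sessions in the selected date range).
SAMPLING_THRESHOLDS = {
    "standard": 500_000,      # Analytics Standard, counted at the property level
    "360": 100_000_000,       # Analytics 360, counted at the view level
}


def may_be_sampled(is_default_report: bool,
                   in_aggregated_tables: bool,
                   sessions_in_range: int,
                   product: str) -> bool:
    """Return True if a query could be sampled under this simplified model."""
    # Default reports, and ad-hoc queries that the aggregated tables can
    # answer in full, are served unsampled.
    if is_default_report or in_aggregated_tables:
        return False
    # Otherwise the complete data is queried, and sampling applies only
    # when the session count exceeds the threshold.
    return sessions_in_range > SAMPLING_THRESHOLDS[product]


print(may_be_sampled(False, False, 600_000, "standard"))  # True
print(may_be_sampled(False, False, 600_000, "360"))       # False
```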
The sampling algorithm uses a sample of the complete data that is proportional to the daily distribution of sessions for the property for the date range you’re using. For example, if over a 5-day period sessions were sampled at 25%, then the sample would include 25% of each day’s sessions.
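As a rough illustration of sampling in proportion to the daily distribution, the sketch below takes 25% of each day's sessions over a 5-day period. The daily counts are invented, and picking sessions at random within each day is an assumption; this article does not say how individual sessions are chosen.

```python
import random


def sample_by_day(sessions_by_day, rate, seed=0):
    """Return a sample containing `rate` of each day's sessions (rounded)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = {}
    for day, sessions in sessions_by_day.items():
        k = round(len(sessions) * rate)
        # Assumption: sessions within a day are picked uniformly at random.
        sample[day] = rng.sample(sessions, k)
    return sample


# Hypothetical 5-day period with uneven traffic.
daily_counts = {"Mon": 1000, "Tue": 2000, "Wed": 400, "Thu": 800, "Fri": 1200}
sessions_by_day = {
    day: [f"{day}-{i}" for i in range(n)] for day, n in daily_counts.items()
}

sample = sample_by_day(sessions_by_day, 0.25)
for day, picked in sample.items():
    print(day, len(picked))
# Mon 250
# Tue 500
# Wed 100
# Thu 200
# Fri 300
```

Because every day contributes the same fraction, busy days stay busy and quiet days stay quiet in the sample, so day-by-day trends keep their shape.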
The sampling rate varies from query to query depending on the number of sessions during a date range for a given view.
When sampling is in effect, you see a message at the top of the report that says "This report is based on N% of sessions."
To the right of that message, you can select one of two options to change the sampling size:
- Greater precision: uses the maximum sample size possible to give you results that are the most precise representation of your full data set
- Faster response: uses a smaller sampling size to give you faster results
Sampling works differently for the following reports than for default reports or ad-hoc queries.
Multi-Channel Funnel and Attribution reports
As with default reports, no sampling is applied unless you modify the report, for example, by changing the lookback window, by changing which conversions are included, or by adding a segment or secondary dimension. If you modify the report in any way, a maximum sample of 1M conversions is returned.
Flow-visualization reports (Users Flow, Behavior Flow, Events Flow, Goal Flow) are generated from a maximum of 100K sessions for the selected date range.
Results in the flow-visualization reports, including entrance, exit, and conversion rates, may differ from the results in the default Behavior and Conversion reports, which are based on a different sample set.
Filters and segments
Analytics Standard and Analytics 360 sample session data at the view level, after view filters have been applied. For example, if view filters include or exclude sessions, then the sample is taken from only those sessions.
Analytics Standard and Analytics 360 both apply segments after applying report filters and after sampling, which means that a segment may include fewer sessions than are included in the overall sample.
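The order of operations described in this section (view filters, then sampling, then segments) can be sketched as below. The session fields, the filter and segment functions, and the random selection are all hypothetical stand-ins, not the product's actual logic.

```python
import random


def run_query(sessions, view_filter, sample_size, segment, seed=0):
    """Apply view filters, then sample, then apply a segment."""
    # 1. View filters decide which sessions belong to the view at all.
    filtered = [s for s in sessions if view_filter(s)]
    # 2. The sample is drawn only from the filtered sessions.
    rng = random.Random(seed)
    if len(filtered) > sample_size:
        sampled = rng.sample(filtered, sample_size)
    else:
        sampled = filtered
    # 3. The segment is applied to the sample, not to the full data.
    segmented = [s for s in sampled if segment(s)]
    return sampled, segmented


# Hypothetical data: every 10th session is internal, every 3rd is mobile.
sessions = [
    {"id": i, "internal": i % 10 == 0, "device": "mobile" if i % 3 == 0 else "desktop"}
    for i in range(10_000)
]

sampled, segmented = run_query(
    sessions,
    view_filter=lambda s: not s["internal"],   # view excludes internal traffic
    sample_size=1_000,
    segment=lambda s: s["device"] == "mobile",
)
print(len(sampled), len(segmented))
```

The segment can only ever contain as many sessions as the sample, and usually fewer, which is why a segmented report may rest on a smaller number of sessions than the sampling message suggests.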
Working with sample size
Use the controls at the top of the report to switch between the maximum sample size for a more precise report and the smaller sample size for a faster response to your query.
One option to avoid sampling is to shorten the date range of your report until the number of sessions is under the sampling threshold, if your volume of data allows for that.
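If you have daily session counts to hand, for example from an export, a small helper can show how far back a date range can reach before it crosses the threshold. This is a minimal sketch: the function name and the daily counts are hypothetical, and the 500K figure is the general Analytics Standard threshold from this article.

```python
def longest_unsampled_range(daily_sessions, threshold):
    """Return (days, total_sessions) for the longest range of most recent
    days whose combined sessions do not exceed the threshold.

    `daily_sessions` is ordered from oldest to newest.
    """
    total = 0
    days = 0
    for count in reversed(daily_sessions):  # walk back from the newest day
        if total + count > threshold:
            break
        total += count
        days += 1
    return days, total


# Hypothetical week of traffic, oldest day first.
week = [90_000, 120_000, 110_000, 100_000, 130_000, 95_000, 105_000]
print(longest_unsampled_range(week, 500_000))  # (4, 430000)
```

Here the four most recent days total 430,000 sessions, and adding a fifth day would bring the range to 540,000, which is over the threshold.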
If you are a Google Analytics 360 user, you have 2 additional options to get unsampled reports:
- For single-use reports, you can download an unsampled report.
- For ongoing reporting, you can build a custom table.