Introduction to aggregation
Aggregation is the process of reducing and summarizing tabular data. For example, consider the list of numbers below:
100, 200, 300, 400, 500
Using this example, you can state the following facts that illustrate the concept of aggregation:
|There are 5 numbers.||Count|
|The smallest number is 100||Minimum|
|The largest number is 500||Maximum|
|The average of the numbers is 300||Average|
|The sum of the numbers is 1500||Sum|
There are other ways you can aggregate data, including calculating the median, count distinct, quartiles, percentiles, etc.
Dimensions and aggregation
The example above is based on a single set of numbers, but that’s almost never what you see in the real world. In the real world, your data is typically organized into dimensions and metrics. Dimensions provide a way to categorize and group your data, while metrics measure that data.
In Data Studio, aggregation always takes place in the context of a set of dimensions. That set of dimensions can be:
- Every dimension, which lets you see the raw data.
- A subset of your dimensions, which lets you see data broken down (grouped) by the dimensions you select.
- The empty set, which lets you see a summary of the entire data set.
For example, here are the same five numbers presented as daily stock prices. Date and Ticker are dimensions, Price is a metric.
You can now use the dimensions to group the data in different ways. For example:
In the examples above, the metric values depend on what you want to know about the data. For example, to calculate the average price for each company, apply the Average aggregation in conjunction with the Ticker dimension:
|Ticker||Average of Price|
|GOOG||(100 + 300 + 500) / 3 = 300|
|AAPL||(200 + 400) / 2 = 300|
To see how many companies had stock information on a given day, you would use the Date dimension and the Count aggregation:
|Date||Count of Price|
Now, consider what happens when you use both the Date and Ticker dimensions in the stock table:
When a you create a group that includes all the available dimensions, the result is identical to the original data. This is still an aggregation, just not an interesting one, since every aggregation gives the same result. In Data Studio, if you make a table and show every dimension, you’re still able to set the aggregation type but it won’t do anything.
Some data sources, such as Analytics and Ads, show Auto as the only available aggregation type for metrics. These data sets are already aggregated, reducing a potentially massive list of numbers to single values. The aggregations used can be simple, like sum or count, or more complex, like percentile. You can't change Auto aggregated metrics to a different aggregation type, since doing so would require Data Studio to have access to the raw, unaggregated data.
You'll also see Auto aggregation applied to calculated fields based on other data sources, such as Sheets. In this case, the aggregation is a direct result of using an aggregation function in the formula, such as
AVG(Price). Again, you can't change the Auto aggregation here, because that would conflict with the output of the AVG function.