Search
Clear search
Close search
Google apps
Main menu

Conditional calculated fields

Learn how to create conditional metrics with CASE and calculated field functions.

This article shows how to use the CASE statement to create "conditional calculated fields" in a Data Studio data source. A conditional calculated field gets its value dynamically, based on the results of combining CASE with an aggregation function.

This article is intended for advanced users of Data Studio. If you are new to creating data sources, using calculated fields in general, or CASE in particular, you can review the topics in the related resources section at the end of this article.

In this article:

Overview

A CASE statement returns a value by evaluating logical conditions. The basic syntax for a CASE statement is:

CASE 
    WHEN condition THEN result
    ELSE result 
END

While useful, CASE has a significant limitation: it can't mix dimensions and metrics in the same statement. If you try, you'll get an error in Data Studio  saying, "An expression can have either metrics, or dimensions, but not both."
 
You can work around this limitation by creating conditional calculated fields that "wrap" the CASE statement within an aggregation function. For example:

SUM(CASE WHEN condition THEN 1 ELSE 0 END)

This technique can be useful when you want to create simple conditional summaries based on a dimension. It's also possible to create more sophisticated conditional calculations. For example:

metric * MAX(CASE WHEN condition THEN 1 ELSE 0 END)

This technique is only possible with data sources that do not pre-aggregate metrics. The rest of this article explains these techniques and limitations.

Why would I want a conditional calculated field?

Using conditional calculated field lets you report on KPIs (key performance indicators) that require if...then logic to determine their value. Data Studio's CASE statement provides that conditional logic, but since a CASE statement can't mix dimensions and metrics, its usefulness is limited for this type of problem on its own. By using the techniques described in this article, it's possible to work around that limitation.

Let's try a simple example and create a metric that represents the total population over time for all countries whose name starts with 'A'. Suppose you have a Google Sheet that contains all the world's countries, and their population totals over the years. A data source created from that sheet will look like this:

 

Field Name Field Type Aggregation
Year Date (YYYY) NONE
Country Text NONE
Population Number SUM

 

In this data source, Year and Country both have their aggregation set to NONE, making them dimensions. Population's aggregation is set to SUM, making it a metric.

Your first attempt to create this metric might be to create the following calculated field:

SUM(CASE WHEN REGEXP_MATCH(Country, '^A.*') THEN Population ELSE 0 END)

Unfortunately, this won't work because the CASE statement does not allow you to mix metrics with dimensions in its WHEN, THEN, and ELSE clauses. (Remember, Population is currently a metric.)

To work around the "no mixing dimensions and metrics in CASE" limitation, change the aggregation type of the Population field to NONE. This makes Population a dimension.

To preserve the original Population field as a metric, make a copy of the field first. Then change the aggregation of the copy to NONE, and use that in your formula.

Interlude: How CASE really works

Before continuing, let's take a moment to dive deeper into how CASE works. There are 3 key points to understand:

1) CASE statements (as well as calculated field functions) are applied to your data post-aggregation. This means the data has already been retrieved by the data source before any calculated fields can take effect (otherwise, the calculation would have nothing to work with).

2) The CASE statement operates at the raw record-level of your data: WHEN conditions in your CASE statements evaluate each record coming from the data set. Evaluating every value in a dimension is easy, because dimensions are not aggregated. Every value is a distinct entity. For example, in your spreadsheet, the value of any given cell is absolute: e.g, A22 = Azerbaijan, B22 = 2013.

3) Evaluating aggregated metrics this way, however, is meaningless, because aggregations apply to a set of numbers, not individual values. If the metric is SUM(Population), it doesn't make sense to evaluate something like "SUM(Population) in cell C23."

Removing the aggregation from Population changes it from metric to dimension, allowing CASE to operate on each record's Population column.

This solution works well enough for some data sources, like Google Sheets. But it won't work for other data sources, such as Google Analytics or AdWords.

Let's examine why not.

Conditional calculated fields in AUTO aggregated data sources

Unlike Google Sheets, some Data Studio connectors don't support access to the raw data records. Google Analytics and AdWords are 2 such connectors. Metrics in these connectors are already computed by the underlying data set, not by Data Studio. These metrics have an aggregation type of AUTO, which cannot be changed in the data source.

For example, a data source using the Google Analytics connector contains fields like the following:

 

Field Name Field Type Aggregation
Date Date (YYYY) NONE
Country Text NONE
Sessions Number AUTO

 

As you can see, the Sessions metric has an aggregation type of AUTO. You can't change this to NONE and convert it to a dimension, as we did for the Population metric in our Google Sheet.

Learn more about aggregation.

The following data sources fall into the AUTO aggregated category:

  • Google Analytics
  • AdWords
  • Attribution 360
  • DoubleClick Campaign Manager (DCM)
  • Search Console
  • YouTube

In addition, calculated fields in any data source that include aggregation functions (e.g., SUM, COUNT, AVG, etc.,) in their formula field also have their aggregation type permanently set to AUTO.

Let's change our example and try to track the number of Sessions from Countries whose name starts with 'A'.

You might be tempted to do something like this:

Sessions * (CASE WHEN REGEXP_MATCH(Country, '^A.*') THEN 1 ELSE 0 END)

Unfortunately, this won't work either because you can't multiply a metric (Sessions) by a dimension (the output of the CASE statement).

But, if you modify the formula so it produces a metric, the formula will work:

Sessions * MAX(CASE WHEN REGEXP_MATCH(Country, '^A.*') THEN 1 ELSE 0 END)

Use conditional calculated field in reports

When creating a calculated field of any kind, you need to know how that field will be used in your report. That's because, with the exception of scorecards, charts don't just contain a single field. Instead, they plot your data in relationship to all the dimensions in the chart. The conditional calculated field we created above produces the expected results only if it is used in conjunction with the Country dimension. If the Country dimension is not part of the chart (for example, if you used it in a scorecard), this calculated field will just return the Sessions base metric. You should, therefore, plan on including any dimensions used in your WHEN conditions in charts that use the conditional calculated field.

Conclusion

Creating conditional calculated field is tricky. But, with planning and an understanding of the techniques outlined in this article, you can use conditional metrics to report on complex KPIs in your Data Studio reports.

Was this article helpful?
How can we improve it?
What's new in Data Studio

Learn about new features and recent changes.