Regular expressions in Data Studio

A regular expression ("regex") is a specific sequence of characters that broadly or narrowly match patterns in your data. You can use regular expressions to create more flexible filters in charts and controls. You can also use the following regular expression functions in calculated field formulas:

REGEXP_EXTRACT Returns the first matching substring in the input value which matches the regular expression pattern.
Learn more.
REGEXP_MATCH Returns true if the input value matches the regular expression pattern, otherwise returns false.
Learn more.
REGEXP_REPLACE Replaces all occurrences of text which matches the regular expression pattern in the input value with the replacement string.
Learn more.

Examples

These regular expressions illustrate how to use the functions above.

These examples have been kept simple on purpose, as complex regex can be difficult to understand.

Extracts the top-level directory in a URL:

REGEXP_EXTRACT(URL, '^https://[^/]+/([^/]+)/')

For example, if the URL field contained this page's address, the function above would return datastudio.

 

Categorize ad campaigns by language:

CASE
    WHEN REGEXP_MATCH(Campaign 2, R".*\|\s*en\s*\|.*") then "English"
    WHEN REGEXP_MATCH(Campaign 2, R".*\|\s*es\s*\|.*") then "Spanish"
    ELSE "Other language"
END

For example, applying this to the Campaign dimension in the Google Analytics Demo account gives these results:

Campaign Language
Campaign #1 Other language
1000549 | Google Analytics Demo | DR | apontes | NA | US | en | Hybrid | AW SEM | BKWS | ~ AW - Google Brand (US) English
1000549 | Google Analytics Demo | DR | apontes | NA | CA | es | Hybrid | AW SEM | BKWS | ~ AW - YouTube (CA) Spanish

 

Swap the order of sections in a string:

REGEXP_REPLACE(Campaign , R'(.*):(.*)', '\2 \1')

In the example above, the sections are separated by a colon (:).

Regex metacharacters

Metacharacters are characters that have special meaning in a regular expression. Following are some of the more common metacharacters you can use. Note that the examples open in the Google Analytics Help Center, but the information presented there applies equally to Data Studio.

See google/RE2 Github documentation for complete regular expression syntax.

Wildcards

. Matches any single character (letter, number or symbol) 1. matches
10, 1A

1.1 matches
111, 1A1

Examples
? Matches the preceding character 0 or 1 times 10? matches
1, 10

Examples
+ Matches the preceding character 1 or more times 10+ matches
10, 100

Examples
* Matches the preceding character 0 or more times 1* matches
1, 10

Examples
| Creates an OR match

Do not use at the end of an expression
1|10 matches
1, 10

Examples

Anchors

^ Matches the adjacent characters at the beginning of a string ^10 matches
10, 100, 10x

^10 does not match
110, 110x

Examples
$ Matches the adjacent characters at the end of a string 10$ matches
110, 1010

10$ does not match
100, 10x

Examples

Groups

( ) Matches the enclosed characters in exact order anywhere in a string


Also used to group other expressions
(10) matches
10, 101, 1011

([0-9]|[a-z]) matches
any number or lower-case letter

Examples
[ ] Matches the enclosed characters in any order anywhere in a string [10] matches
012, 120, 210

Examples
- Creates a range of characters within brackets to match anywhere in a string [0-9] matches any number 0 through 9

Examples

Escape

\

Indicates that the adjacent character should be interpreted literally rather than as a regex metacharacter.

See the note below on matching metacharacters.

\. indicates that the adjacent dot should be interpreted as a period or decimal rather than as a wildcard.

216\.239\.32\.34 matches
216.239.32.34

Examples

Character classes

\d

digits (≡ [0-9])

\D

not digits (≡ [^0-9])

\s

whitespace (≡ [\t\n\f\r ])

\S

not whitespace (≡ [^\t\n\f\r ])

\w

word characters (≡ [0-9A-Za-z_])

\W

not word characters (≡ [^0-9A-Za-z_])

Tips

Use simple expressions

Keep your regular expressions simple. Simple regex is easier for another user to interpret and modify.

Case sensitivity

Regular expressions are case-sensitive by default. You can make the match case-insensitive using the (?i) flag. For example:
  • REGEXP_EXTRACT(MyField, '(?i)(a.*)') extracts both "abc123" and "ABC123".

Escape the backslash

As noted above, you use the backslash (\) to escape regex metacharacters when you need those characters to be interpreted literally. To match a backslash in a quoted string, you'll need to escape that as well, resulting in 2 backslashes in your regex. For example:

REGEXP_REPLACE(String, "(\\[.*\\])\\+(\\[.*\\])","\\2 \\1")

As an alternative, consider using the Raw string literal prefix, R:

REGEXP_REPLACE(String, R"(\[.*\])\+(\[.*\])",R"\2 \1")

 

 
Was this helpful?
How can we improve it?