Guidelines for using regular expressions

You can set up Content compliance settings using regular expressions. Regular expressions are also useful for other advanced Gmail settings, such as routing settings.

A regular expression, also called a regex, is a method for matching text with patterns. For example, a regular expression can describe the pattern of email addresses, URLs, telephone numbers, employee identification numbers, social security numbers, or credit card numbers.

The use of regular expressions is a standard tool in many systems and scripting languages. Regular expressions can be simple or highly complex. This article provides information about how to use regular expressions when creating Content compliance policies. You can find detailed information, including tutorials and examples, on the following websites:

Uses for regular expressions

Using regular expressions, you can create content filters that can find the following:

Text patterns
Use this option to scan messages for patterns of letters, numbers, or a combination of both. For example, you can create regular expressions that match phone numbers, addresses, employee numbers, and account numbers. Or, you can create one regular expression that can find many different variations of a word, such as football, footb@ll, fo0tb@ll, and so on.

Note: By default, regular expressions are case-sensitive.
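
As a rough illustration of the variations described above, the following sketch uses Python's re module to test a pattern locally. The pattern, the names FOOTBALL_VARIANTS and matches_variant, and the allowed substitutions are assumptions made for this example; Python's engine is not RE2, although the constructs used here (character classes, a repeat count, and the (?i) flag) behave the same way in both.

```python
import re

# Hypothetical pattern: each character class lists the substitutions to allow.
# [o0]{2} accepts "oo", "o0", "0o", or "00"; [a@] accepts "a" or "@".
FOOTBALL_VARIANTS = re.compile(r"f[o0]{2}tb[a@]ll")

# Same pattern with the (?i) flag, which turns off case sensitivity.
FOOTBALL_VARIANTS_ANY_CASE = re.compile(r"(?i)f[o0]{2}tb[a@]ll")


def matches_variant(text: str, pattern: re.Pattern = FOOTBALL_VARIANTS) -> bool:
    """Return True if the pattern occurs anywhere in the text."""
    return pattern.search(text) is not None


samples = ["football", "footb@ll", "fo0tb@ll", "Football"]
results = {s: matches_variant(s) for s in samples}
```

In this sketch, "Football" is not matched by the first pattern because matching is case-sensitive by default; the (?i) variant matches it.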

Complete words
Use this option to create more specific filters. For example, you can create a regular expression that matches the word foot, but not football. In this case, a regular expression can help to reduce the number of legitimate messages that the filter captures.
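
A minimal sketch of whole-word matching, assuming the (\W|^) and (\W|$) boundary idiom that appears later in this article. The names WHOLE_WORD_FOOT and contains_whole_word are hypothetical, and the check runs in Python's re module rather than in RE2 itself.

```python
import re

# (\W|^) requires a non-word character or the start of the text before "foot";
# (\W|$) requires a non-word character or the end of the text after it.
WHOLE_WORD_FOOT = re.compile(r"(\W|^)foot(\W|$)")


def contains_whole_word(text: str) -> bool:
    """Return True only if "foot" appears as a complete word."""
    return WHOLE_WORD_FOOT.search(text) is not None


checks = {
    "my foot hurts": contains_whole_word("my foot hurts"),
    "football tonight": contains_whole_word("football tonight"),
    "barefoot": contains_whole_word("barefoot"),
}
```

Here only "my foot hurts" is flagged; "football tonight" and "barefoot" are not, because the word continues before or after "foot".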

Text with variable characters
Use this option to scan messages for patterns that contain specific text along with text that varies. For example, you can create a single regular expression that matches a URL in the pattern www.[variable].com, such as www.abc1.com, www.abc2.com, and www.abc3.com.
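
One way to express the www.[variable].com pattern is sketched below. The choice of [a-z0-9]+ for the variable part is an assumption for this example (a real policy may need to allow hyphens or uppercase letters), and VARIABLE_HOST and find_hosts are hypothetical names. The dots are escaped so that they match a literal period rather than any character.

```python
import re

# \. matches a literal dot; [a-z0-9]+ stands in for the variable part
# and requires at least one lowercase letter or digit.
VARIABLE_HOST = re.compile(r"www\.[a-z0-9]+\.com")


def find_hosts(text: str) -> list[str]:
    """Return every substring of the text that fits the www.[variable].com pattern."""
    return VARIABLE_HOST.findall(text)


hosts = find_hosts("Visit www.abc1.com, www.abc2.com, or www.abc3.com today.")
```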

Note: For content filters that are based on lists of individual words (for example, profanity, financial terms, or legal terms), we recommend that you use the objectionable content feature, which has built-in functionality for filtering lists of individual words.

Best practices for creating regular expressions

A content compliance filter may run slowly if you create an inefficient regular expression. For messages with just a single recipient, this may slow the message’s delivery by a few seconds. However, if the message has multiple recipients, the effect is magnified and can result in message deferrals (the message times out and isn't delivered to the intended recipients).

To avoid creating regular expressions that run slowly, we recommend the following:

  • Avoid using regular expressions for lists of individual words; instead, use objectionable content policies.
  • Make the regular expression as short and simple as possible by consolidating repeated elements. For example, to create a filter based on multiple phrases, change the following regular expression:

    (\W|^)phrase 1(\W|$)|(\W|^)phrase 2(\W|$)|(\W|^)phrase 3(\W|$)

    to this:

    (\W|^)(phrase 1|phrase 2|phrase 3)(\W|$)

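
Before replacing a long expression with a consolidated one, it can help to confirm that both flag the same messages. The sketch below does this in Python's re module for the two expressions above; the sample messages and the names LONG_FORM, SHORT_FORM, and same_verdict are made up for illustration, and agreement on a handful of samples is a sanity check rather than a proof of equivalence.

```python
import re

# The repeated form and the consolidated form from the example above.
LONG_FORM = re.compile(
    r"(\W|^)phrase 1(\W|$)|(\W|^)phrase 2(\W|$)|(\W|^)phrase 3(\W|$)"
)
SHORT_FORM = re.compile(r"(\W|^)(phrase 1|phrase 2|phrase 3)(\W|$)")


def same_verdict(text: str) -> bool:
    """Return True if both expressions agree on whether the text matches."""
    return bool(LONG_FORM.search(text)) == bool(SHORT_FORM.search(text))


# Hypothetical sample messages, including near misses.
samples = [
    "this message has phrase 2 inside",
    "phrase 3",
    "phrase 10 is not a listed phrase",
    "paraphrase 1",
    "nothing relevant here",
]
all_agree = all(same_verdict(s) for s in samples)
```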
Note: Only RE2 syntax is supported, which differs slightly from PCRE.
For detailed instructions and guidelines, see RE2 Syntax and Examples of Regular Expressions. See also Configure Content Compliance settings.