Guidelines for using regular expressions
A regular expression, also called a regex, is a method for matching text with patterns. For example, a regular expression can describe the pattern of email addresses, URLs, telephone numbers, employee identification numbers, social security numbers, or credit card numbers.
Regular expressions are a standard tool in many systems and scripting languages, and they can be simple or highly complex. This article explains how to use regular expressions when creating Content compliance policies. Detailed information, including tutorials and examples, is widely available online.
Uses for regular expressions
Using regular expressions, you can create content filters that can find the following:
Text patterns and word variations
Use this option to scan messages for patterns of letters, numbers, or a combination of both. For example, you can create regular expressions that match phone numbers, addresses, employee numbers, and account numbers. Or, you can create one regular expression that can find many different variations of a word, such as football, footb@ll, fo0tb@ll, and so on.
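As a minimal sketch of the word-variation case, the following Python snippet gives each commonly substituted letter a character class. The pattern name and the sample strings are illustrative assumptions, and Python's re module stands in for the filter's own regular expression engine, which may differ in details.

```python
import re

# Hypothetical pattern: each "o" may also be the digit 0, and "a" may be "@".
FOOTBALL_VARIANTS = re.compile(r"f[o0][o0]tb[a@]ll", re.IGNORECASE)

# Sample strings, including one that should not match.
samples = ["football", "footb@ll", "fo0tb@ll", "footbridge"]

# Keep only the samples in which the pattern is found.
matches = [s for s in samples if FOOTBALL_VARIANTS.search(s)]
print(matches)
```

One pattern covers all three spellings, so there is no need for a separate filter per variation.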
Exact word matches
Use this option to create more specific filters. For example, you can create a regular expression that matches the word foot, but not football. In this case, a regular expression can help to reduce the number of legitimate messages that the filter captures.
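One way to sketch the foot-but-not-football case is to require a non-word character, or the start or end of the text, on each side of the word. The helper name mentions_foot is hypothetical, and Python's re module is used only for illustration.

```python
import re

# Hypothetical pattern: "foot" must be bounded on each side by a non-word
# character or by the start/end of the text, so "football" is not matched.
EXACT_FOOT = re.compile(r"(\W|^)foot(\W|$)")

def mentions_foot(text: str) -> bool:
    """Return True if the standalone word "foot" appears in text."""
    return EXACT_FOOT.search(text) is not None

print(mentions_foot("my foot hurts"))
print(mentions_foot("football tonight"))
```

The same boundary groups appear in the phrase examples later in this article, so the sketch follows the article's own pattern style rather than a language-specific shortcut.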
Text with variable characters
Use this option to scan messages for patterns that contain specific text along with text that varies. For example, you can create a single regular expression that matches a URL in the pattern www.[variable].com, such as www.abc1.com, www.abc2.com, and www.abc3.com.
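The www.[variable].com case can be sketched as follows. The choice of letters, digits, and hyphens for the variable part is an assumption made for this example, and Python's re module is used only for illustration.

```python
import re

# Hypothetical pattern: literal "www." and ".com" around a variable label
# made of letters, digits, or hyphens. The dots are escaped so that they
# match only a literal "." rather than any character.
VARIABLE_HOST = re.compile(r"www\.[a-z0-9-]+\.com", re.IGNORECASE)

text = "See www.abc1.com, www.abc2.com, and www.abc3.com for details."

# findall returns every non-overlapping match of the whole pattern.
hosts = VARIABLE_HOST.findall(text)
print(hosts)
```

Escaping the dots matters: an unescaped "." would also match strings such as wwwXabc1Ycom.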
Best practices for creating regular expressions
A content compliance filter may run slowly if you create an inefficient regular expression. For messages with just a single recipient, this may slow the message’s delivery by a few seconds. However, if the message has multiple recipients, the effect is magnified and can result in message deferrals (the message times out and isn't delivered to the intended recipients).
To avoid creating regular expressions that run slowly, we recommend the following:
- Avoid using regular expressions for lists of individual words; instead, use objectionable content policies.
- Make the regular expression as short and simple as possible by consolidating repeated elements. For example, to create a filter based on multiple phrases, change the following regular expression:
(\W|^)phrase 1(\W|$)|(\W|^)phrase 2(\W|$)|(\W|^)phrase 3(\W|$)
to the following:
(\W|^)(phrase 1|phrase 2|phrase 3)(\W|$)
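The consolidation above can be checked with a short Python sketch that runs both forms against the same messages. The sample messages are hypothetical, and Python's re module is used only for illustration, so this shows that the two forms agree on these inputs rather than how the filter itself evaluates them.

```python
import re

# Both patterns are copied from the text above.
LONG_FORM = re.compile(
    r"(\W|^)phrase 1(\W|$)|(\W|^)phrase 2(\W|$)|(\W|^)phrase 3(\W|$)"
)
SHORT_FORM = re.compile(r"(\W|^)(phrase 1|phrase 2|phrase 3)(\W|$)")

# Hypothetical sample messages for illustration.
samples = [
    "phrase 1",
    "this has phrase 2 in it",
    "ends with phrase 3.",
    "paraphrase 1",      # preceded by a word character: no match
    "phrase 10",         # followed by a word character: no match
    "nothing relevant",
]

# Record (long form matched, short form matched) for each sample.
results = [
    (bool(LONG_FORM.search(s)), bool(SHORT_FORM.search(s))) for s in samples
]

for sample, (long_hit, short_hit) in zip(samples, results):
    print(f"{sample!r}: long={long_hit} short={short_hit}")
```

The short form states the boundary groups once instead of once per phrase, which keeps the expression easier to read and to extend with more phrases.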