URL patterns

URL patterns specify which pages you want included in your custom search engine. When you use the control panel or the Google Marker to add sites, you're generating URL patterns. Most URL patterns are simple and specify a whole site. However, by using more advanced patterns, you can pick out portions of sites more precisely.

For example, the pattern 'www.foo.com/bar' will only match the single page 'www.foo.com/bar'. To cover all the pages where the URL starts with 'www.foo.com/bar', you must explicitly add a '*' at the end. In the form-based interfaces for adding sites, 'foo.com' defaults to '*.foo.com/*'. If this is not what you want, you can change it back in the control panel. No such defaulting occurs for patterns that you upload. Also note that URLs are case sensitive: if your site URLs include capital letters, you'll need to make sure your patterns do as well.
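
As a rough illustration of the two cases above, the following Python sketch compares a URL against a pattern that is either exact or ends in '*'. The function name matches_simple is hypothetical, and the sketch deliberately ignores 'http://' prefixes and wildcards in other positions; it is an approximation for illustration, not the way the search engine itself evaluates patterns.

```python
def matches_simple(pattern: str, url: str) -> bool:
    """Illustrative only: exact match, or prefix match when the pattern ends in '*'."""
    if pattern.endswith("*"):
        # A trailing '*' covers every URL that starts with the text before it.
        return url.startswith(pattern[:-1])
    # Without a wildcard, only the identical URL matches (the comparison is case sensitive).
    return url == pattern


print(matches_simple("www.foo.com/bar", "www.foo.com/bar"))        # True
print(matches_simple("www.foo.com/bar", "www.foo.com/bar/page"))   # False
print(matches_simple("www.foo.com/bar*", "www.foo.com/bar/page"))  # True
print(matches_simple("www.foo.com/Bar*", "www.foo.com/bar/page"))  # False
```

The last call returns False only because of the capital 'B', which is the case-sensitivity pitfall described above.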

In addition, the use of wildcards in URL patterns allows you to include or exclude multiple pages or portions of a site all at once. The following patterns illustrate how you can use wildcards:

  • The wildcard pattern 'www.webmd.com/hw/cancer/*bar' specifies all the URLs that begin with 'www.webmd.com/hw/cancer/' and contain 'bar'.
  • The prefix pattern 'www.webmd.com/*' specifies all the URLs that begin with 'www.webmd.com', i.e. all the URLs on the www.webmd.com site.
  • The exact-match pattern 'www.webmd.com/' specifies only the URLs 'http://www.webmd.com/' and 'https://www.webmd.com/'.

More detailed examples are included in this table:

Pattern | Description | Matches | Does not match
--- | --- | --- | ---
www.example.com/ | Matches a single page | www.example.com/ | example.com/, host.example.com, www.example.com/stamps
www.example.com/* | Matches all URLs beginning with www.example.com or example.com | www.example.com, www.example.com/stamps, example.com/stamps | host.example.com/, host.example.com/stamps
www.example.com/*kites | Matches all URLs that begin with www.example.com/ or example.com/ and contain the word "kites" | www.example.com/kites.html, www.example.com/kites/page2.html, www.example.com/funwithkites.html | www.example.com, www.example.com/stamps
www.example.com/product.asp*cat=Elec | Matches all URLs that begin with www.example.com/product.asp and contain the term 'cat=Elec' | www.example.com/product.asp?sku=20283&cat=Elec | www.example.com, www.example.com/stamps
www.example.com/*kites*fly | Matches all URLs that begin with www.example.com/ and contain the words "kites" and "fly", in that order | www.example.com/kites/howto/fly.html | www.example.com/fly/howto/kites.html, www.example.com/kites/help.html, www.example.com/help/fly.html
*.example.com/* | Matches all sub-domains under example.com | www.example.com/stamps, host.parent.example.com/kites, example.com/kites/fly.html | example.host.com
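
The examples in the table can be reproduced with a small Python model that translates a pattern into a regular expression. The names pattern_to_regex and matches are hypothetical, and the rules encoded here are assumptions inferred from the examples rather than a description of Google's implementation: a pattern containing '*' is treated as having an implied trailing wildcard, a leading 'www.' is treated as optional in wildcard patterns only, and a leading '*.' is taken to cover the bare domain as well as any sub-domain.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a URL pattern into a compiled regex (illustrative model only)."""
    has_wildcard = "*" in pattern
    ends_with_star = pattern.endswith("*")
    host_prefix = ""
    if pattern.startswith("*."):
        # Assumption: '*.' covers any chain of sub-domains, or none at all.
        host_prefix = r"(?:[^/]+\.)?"
        pattern = pattern[2:]
    elif has_wildcard and pattern.startswith("www."):
        # Assumption: wildcard patterns treat a leading 'www.' as optional.
        host_prefix = r"(?:www\.)?"
        pattern = pattern[4:]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if has_wildcard and not ends_with_star:
        # Assumption: '*kites' means "contains kites", so add a trailing wildcard.
        body += ".*"
    return re.compile(host_prefix + body)


def matches(pattern: str, url: str) -> bool:
    """Return True if url matches pattern under the assumptions above."""
    url = re.sub(r"^https?://", "", url)  # patterns carry no scheme
    if "/" not in url:
        url += "/"  # treat 'www.example.com' as 'www.example.com/'
    return pattern_to_regex(pattern).fullmatch(url) is not None


print(matches("www.example.com/*", "example.com/stamps"))                       # True
print(matches("www.example.com/*kites", "www.example.com/funwithkites.html"))   # True
print(matches("www.example.com/*kites*fly", "www.example.com/help/fly.html"))   # False
print(matches("*.example.com/*", "example.host.com"))                           # False
```

Because the comparison uses fullmatch with no case-folding flag, the model is case sensitive, in line with the note earlier on this page.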

Patterns that cover an entire top-level domain, such as '*.com' or '*.travel/*', are not permitted in Google site search. Adding such a pattern returns a "forbidden" error on the Google site search results page.
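
If you generate patterns programmatically before uploading them, a check along the following lines can catch such entries early. This is a minimal Python sketch: the function name covers_whole_tld is hypothetical, and the rule it applies (a host made up of exactly '*' plus one label) is an assumption based on the two examples above, not the actual validation Google performs. It does not recognize multi-label suffixes such as 'co.uk'.

```python
def covers_whole_tld(pattern: str) -> bool:
    """Return True if the pattern's host is just '*' followed by a single label."""
    host = pattern.split("/", 1)[0]          # drop any path, e.g. '/*'
    labels = [label for label in host.split(".") if label]
    return len(labels) == 2 and labels[0] == "*"


print(covers_whole_tld("*.com"))            # True
print(covers_whole_tld("*.travel/*"))       # True
print(covers_whole_tld("*.example.com/*"))  # False
print(covers_whole_tld("www.example.com/")) # False
```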