Search Protocol Reference

Request Format

The information in this section helps you create custom searches for your web site. By using search parameters, special query terms and filters in your search requests, you can refine and enhance searches to serve your needs.

Request Overview


Using the Google search protocol is as simple as requesting a page from a web server. The Google search request is a standard HTTP GET or POST command, which returns results in either XML or HTML format, as specified in the search request.

The search request is a URL that combines the following:

  • Your Google Search Appliance host name or IP address, which was assigned when the search appliance was set up
  • Search interface port (default HTTP serving port: 80 for HTTP and 443 for HTTP over SSL/TLS)
  • A path describing the search query. The path starts with “/search?”, and is followed by one or more name-value pairs (input parameters) separated by the ampersand (&) character.

The GET command has a 2KB limit on query strings. To submit longer query strings, use the POST command, as described in Using the POST Command.
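As a rough illustration of the URL structure described above, the following Python sketch assembles a search request from a host name, a port, and name-value pairs. The function name build_search_url and the reading of "2KB" as 2,048 bytes are assumptions made for this example; the host and parameter values are taken from the examples in this document.

```python
from urllib.parse import urlencode

GET_QUERY_LIMIT = 2048  # assumption: "2KB" is read as 2,048 bytes


def build_search_url(host, params, port=80, secure=False):
    """Combine host, port, the /search? path, and name-value pairs."""
    scheme = "https" if secure else "http"
    # urlencode URL-encodes each value and joins the pairs with "&".
    query = urlencode(params)
    url = f"{scheme}://{host}:{port}/search?{query}"
    fits_in_get = len(query) <= GET_QUERY_LIMIT
    return url, fits_in_get


url, fits_in_get = build_search_url(
    "search.mycompany.com",
    {"q": "bill material", "site": "operations", "client": "test", "output": "xml"},
)
print(url)
print("GET is sufficient" if fits_in_get else "use POST instead")
```

The second return value is only a length check on the encoded query string; it does not contact the search appliance.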

Using the POST Command

In some instances, your query strings might exceed the 2KB URL length limit of GET requests and be truncated. This might happen when you submit dynamic navigation queries containing a large number of metadata filters. You can avoid this limitation by submitting POST requests instead, which have a much larger body limit (10KB).

POST Limitations

POST support is only available for:

POST support is not available for other Universal Login Auth Mechanisms. You must use the GET command for these.

If you are sending non UTF-8 data, you must include the ie parameter in the POST body. This parameter sets the character encoding that is used to interpret the query string. You should also specify the access parameter (as shown in Search Request Examples (POST command)) in the POST body when sending POST requests.

The following search parameters are not included by default in a POST request:

  • entqr--Sets the query expansion policy.
  • entqrm--Controls query expansions for meta tags.
  • entsp--Controls the use of the advanced relevance scoring parameters.
  • filter--Activates or deactivates automatic results filtering.
  • ip--Indicates the IP address of the user who submitted the search query.
  • tlen--Specifies the number of bytes that would be used to return the search results title.
  • ulang--Indicates the language of the user who submitted the search query.
  • wc--Specifies the number of wildcard expansions for a wildcard expression.
  • wc_mc--Specifies whether or not the search appliance considers all words with * as wildcard terms.
  • wc_p--Specifies whether the search appliance includes or excludes metadata from wildcard expansions.

If you want to include any of these parameters in a POST request, you must add them explicitly. For more information about these parameters, see Search Parameters.

Structure of the POST Body

The structure of the POST body is a URL-encoded query string. It is like the URL of a GET request, after the question mark.
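The following Python sketch shows one way to produce such a body: the same name-value pairs that would follow the question mark in a GET request are URL-encoded and placed in the request body. The Content-Type header shown is the conventional one for form-encoded bodies and is an assumption here, as is the host name; the request object is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {
    "q": "bill material",
    "output": "xml",
    "client": "test",
    "site": "operations",
    "access": "p",  # specified explicitly, as recommended for POST requests
}

# The body is what would follow "?" in the equivalent GET request.
body = urlencode(params).encode("ascii")

request = Request(
    "http://search.mycompany.com/search",
    data=body,  # supplying data makes urllib issue a POST
    headers={"Content-Type": "application/x-www-form-urlencoded"},  # assumption
)

print(request.get_method())
print(body.decode("ascii"))
```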

Submitting a Search Request

Typically, search users make search requests by entering search parameters in an HTML form rendered in a web browser (like the following):


<form method="GET" action="http://search.mycompany.com/search">
   <input type="text" name="q" size="32" maxlength="256" value="query string">
   <input type="submit" name="btnG" value="Google Search">
   <input type="hidden" name="site" value="default_collection">
   <input type="hidden" name="client" value="default_frontend">
   <input type="hidden" name="output" value="xml_no_dtd">
   <input type="hidden" name="proxystylesheet" value="default_frontend">
</form>

Such forms are the most recognizable methods for generating GET requests, but there are numerous other ways. For example, a web page may include a direct link that brings users to a page of search results:


http://search.mycompany.com/search?q=query+string
                           &site=default_collection
                           &client=default_frontend
                           &output=xml_no_dtd
                           &proxystylesheet=default_frontend

Alternatively, a web application may make an HTTP GET request directly:


GET /search?q=query+string&site=default_collection
                           &client=default_frontend
                           &output=xml_no_dtd
                           &proxystylesheet=default_frontend HTTP/1.0

Each of these examples results in the same GET request. The HTTP response to this request contains the first page of search results for the query “query string”, restricted to URLs in the collection named “default_collection.” The results are rendered into HTML format using the XSL stylesheet associated with the front end named “default_frontend”.

You can search multiple collections by separating collection names with the OR character ( | ) or the AND character (.), for example: &site=col1.col2 or &site=col1|col2.

The rest of the examples that follow use the raw HTTP GET format (as in the last example).

Search Request Examples (GET Command)

Example 1. This request returns the first 10 results that match the search query terms “bill” and “material”:

GET /search?q=bill+material&output=xml&client=test&site=operations HTTP/1.0

Explanation:

The search query is “bill material”.

GET /search?q=bill+material&output=xml&client=test&site=operations HTTP/1.0

Search is limited to the documents in the “operations” collection.

GET /search?q=bill+material&output=xml&client=test&site=operations HTTP/1.0

Results are returned in the Google XML output format.

GET /search?q=bill+material&output=xml&client=test&site=operations HTTP/1.0

Example 2. This request returns results numbered 11-15 that match the same query terms and collection as example 1. As specified by the proxystylesheet parameter, the results are rendered in the custom HTML output format defined by the front end named “test.”

GET /search?q=bill+material&start=10&num=5&output=xml_no_dtd&proxystylesheet=test&client=test&site=operations HTTP/1.0

Explanation:

This search request uses the same search query terms and collection as in Example 1.

GET /search?q=bill+material&start=10&num=5&output=xml_no_dtd&proxystylesheet=test&client=test&site=operations HTTP/1.0

Results numbered 11-15 are returned.

GET /search?q=bill+material&start=10&num=5&output=xml_no_dtd&proxystylesheet=test&client=test&site=operations HTTP/1.0

Results are returned in custom HTML output format, which is created by applying the XSL stylesheet associated with the “test” front end to the standard XML results. See proxystylesheet.

GET /search?q=bill+material&start=10&num=5&output=xml_no_dtd&proxystylesheet=test&client=test&site=operations HTTP/1.0

Example 3. This request returns the first 10 German results that match the search query “Star Wars Episode +I”:

GET /search?q=Star+Wars+Episode+%2BI&output=xml_no_dtd&lr=lang_de&ie=latin1&oe=latin1&client=test&site=movies&proxystylesheet=test HTTP/1.0

Explanation:

The search query term is “Star Wars Episode +I”. Search is limited to documents in the “movies” collection.

GET /search?q=Star+Wars+Episode+%2BI&output=xml_no_dtd&lr=lang_de&ie=latin1&oe=latin1&client=test&site=movies&proxystylesheet=test HTTP/1.0

Results show the first 10 German results.

GET /search?q=Star+Wars+Episode+%2BI&output=xml_no_dtd&lr=lang_de&ie=latin1&oe=latin1&client=test&site=movies&proxystylesheet=test HTTP/1.0

Results are returned in Google custom HTML output format, which is created by applying the XSL stylesheet associated with the “test” front end to the standard XML results.

GET /search?q=Star+Wars+Episode+%2BI&output=xml_no_dtd&lr=lang_de&ie=latin1&oe=latin1&client=test&site=movies&proxystylesheet=test HTTP/1.0
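To see how the encoded request in Example 3 maps back to parameter values, the following Python sketch splits the request line and decodes its query string with the standard library. It is only an illustration of URL decoding and makes no claim about how the search appliance parses requests internally.

```python
from urllib.parse import parse_qs, urlsplit

request_line = (
    "GET /search?q=Star+Wars+Episode+%2BI&output=xml_no_dtd&lr=lang_de"
    "&ie=latin1&oe=latin1&client=test&site=movies&proxystylesheet=test HTTP/1.0"
)

# A request line has three space-separated parts: method, target, version.
method, target, version = request_line.split(" ")

# parse_qs decodes "+" to a space and "%2B" to a literal plus sign.
decoded = parse_qs(urlsplit(target).query)
params = {name: values[0] for name, values in decoded.items()}

for name, value in params.items():
    print(f"{name} = {value}")
```

The decoded value of q is the query “Star Wars Episode +I”, which is why the literal plus sign has to be sent as %2B.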

Search Request Examples (POST command)

The following examples show search requests that use the POST command for public search only. The POST command should have a target to search:

POST /search HTTP/1.0

The query string payload is not part of the header; it appears in the body of the request. The following examples show query strings. Take note that line breaks are used for readability only and should not be present in actual code.

This request returns the first 10 results that match the search query terms “bill” and “material”:

q=bill+material&output=xml&client=test&site=operations&access=p

This request returns results numbered 11-15 that match the same query terms and collection as example 1. As specified by the proxystylesheet parameter, the results are rendered in the custom HTML output format defined by the front end named “test.”

q=bill+material&start=10&num=5&output=xml_no_dtd&proxystylesheet=test&client=test&site=operations&access=p

Search Parameters


This section lists the valid name-value pairs that can be used in a search request and describes how these parameters modify the search results.

All search requests must include the parameters site, client, q, and output. All parameter values must be URL-encoded (see URL Encoding), except where otherwise noted.

access

Specifies whether to search public content, secure content, or both.

Possible values for the access parameter are:

Value   Description
p       search only public content
s       search only secure content
a       search all content, both public and secure

Default value: p

as_dt

Modifies the as_sitesearch parameter as follows:

Value   Modification
i       Include only results in the web directory specified by as_sitesearch
e       Exclude all results in the web directory specified by as_sitesearch

For example, to exclude results, use as_dt=e.

Default value: i

as_epq

Adds the specified phrase to the search query in parameter q.

For example, to add the terms “hello there” use as_epq=hello there

This parameter has the same effect as using the phrase special query term (see Phrase Search).

Default value: Empty string

as_eq

Excludes the specified terms from the search results.

For example, to filter out results that contain the term “deprecated,” use as_eq=deprecated

This parameter has the same effect as using the exclusion (-) special query term (see Exclusion).

Default value: Empty string

as_filetype

Specifies a file format to include or exclude in the search results. Modified by the as_ft parameter. For a list of possible values, see File Type Filtering.

For example, to include only pdf files in results, use as_filetype=pdf

Default value: Empty string

as_ft

Modifies the as_filetype parameter to specify filetype inclusion and exclusion options. The values for as_ft are:

Value   Description
i       Adds the special query term filetype: to the query followed by the value of as_filetype.
e       Adds the special query term -filetype: to the query followed by the value of as_filetype.

For example, to add the special query term filetype:, use as_ft=i

Query is the string that is included in the response’s q element. Both as_filetype and as_ft are also returned in the response’s PARAM elements.

Default value: Empty string

as_lq

Specifies a URL, and causes search results to show pages that link to that URL. This parameter has the same effect as the link special query term (see Back Links). No other query terms can be used when using this parameter.

For example, to return results that have links to http://myUrl.com/Page, use
as_lq=http://myUrl.com/Page

Default value: Empty string

as_occt

Specifies where the search engine is to look for the query terms on the page: anywhere on the page, in the title, or in the URL.

Value   Meaning
any     anywhere on the page
title   in the title of the page
url     in the URL for the page

For example, to specify that the search engine should only look in titles, use as_occt=title

Default value: any

as_oq

Combines the specified terms with the search query in parameter q, using an OR operation.

For example, to search for documents that contain the terms “London” or “Paris,” use:
as_oq=London Paris, as_oq=London%20Paris, or as_oq=London+Paris
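The two encoded forms above correspond to the two common ways of URL-encoding a space. As a small Python sketch using only the standard library:

```python
from urllib.parse import quote, quote_plus

terms = "London Paris"

# quote encodes the space as %20; quote_plus encodes it as "+".
print("as_oq=" + quote(terms))
print("as_oq=" + quote_plus(terms))
```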

This parameter has the same effect as the OR special query term and is used only for single words (see Boolean OR Search).

Default value: Empty string

as_q

Adds the specified query terms to the query terms in parameter q.

For example, to add the terms “enterprise” and “large” use as_q=enterprise large

Default value: Empty string

as_sitesearch

Limits search results to documents in the specified domain, host or web directory, or excludes results from the specified location, depending on the value of as_dt. This parameter has the same effect as the site or -site special query terms. It has no effect if the q parameter is empty.

When the Google Search Appliance receives a search request that includes the as_sitesearch parameter, it converts the value of the parameter into an argument to the site special query term and appends it to the value of q in the search results. For example, suppose that a search contains these parameters:

q=mycompany&as_sitesearch=www.mycompany.com

The raw XML of the search results contains the following:

<q>mycompany site:www.mycompany.com</q>

The default XSLT stylesheet displays the value of the q tag in the search box on the results page. Consequently, using an as_sitesearch parameter changes the user’s search query by modifying the contents of the search box.

The specified value for as_sitesearch must contain fewer than 125 characters. See also the site parameter (see site).
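The rewriting described above can be sketched in Python as follows. The function name effective_query is hypothetical, and the exact text that the search appliance produces for the exclusion case (as_dt=e) is an assumption based on the statement that the parameter has the same effect as the -site special query term.

```python
def effective_query(q, as_sitesearch="", as_dt="i"):
    """Approximate the value of the <q> element after as_sitesearch is applied."""
    # as_sitesearch has no effect if q is empty.
    if not q or not as_sitesearch:
        return q
    if len(as_sitesearch) >= 125:
        raise ValueError("as_sitesearch must contain fewer than 125 characters")
    if as_dt not in ("i", "e"):
        raise ValueError("as_dt must be 'i' (include) or 'e' (exclude)")
    # Assumption: exclusion is expressed with the -site special query term.
    term = "site:" if as_dt == "i" else "-site:"
    return f"{q} {term}{as_sitesearch}"


print(effective_query("mycompany", "www.mycompany.com"))
print(effective_query("mycompany", "www.mycompany.com", as_dt="e"))
```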

Default value: Empty string

client

Required parameter. If this parameter does not have a valid value, other parameters in the query string do not work as expected.

A string that indicates a valid front end and the policies defined for it, including KeyMatches, related queries, filters, remove URLs, and OneBox Modules. Notice that the rendering of the front end is determined by the proxystylesheet parameter. Example: client=myfrontend

dnavs

Used when the dynamic navigation feature is enabled and applied to a front end.

This parameter stores the current dynamic navigation filters applied in the search results. It does not affect the search results in any way and is used only in the XSLT rendering logic. Dynamic navigation uses the q parameter for affecting search results by appending the selected filters as inmeta: query terms.

entqr

This parameter sets the query expansion policy according to the following valid values:

Value   Description
0       None
1       Standard (entqr=1)--Uses only the search appliance’s synonym file.
2       Local (entqr=2)--Uses all displayed and activated synonym files.
3       Full (entqr=3)--Uses both standard and local synonym files.

Standard terms use only the search appliance’s internal contextual (synonym) files for query expansion. Local terms use all displayed and activated synonym files, including any uploaded files. After you configure and enable the appropriate query expansion files, set the query expansion policy for a front end. Each front end has a policy that specifies whether it uses the search appliance’s built-in logic (the “standard” set of terms), your own list of synonyms (the “local” set), or both (the “full” set). Query expansion files are used only if the query expansion policy for a front end is set to Local or Full.

If this parameter is omitted, the query expansion value specified for the front end is used.

Default value: 0

entqrm

The entqrm parameter controls query expansions for meta tags according to the following valid values:

Value   Description
0       None
1       Names (entqrm=1)--Enables query expansion only for meta-tag names.
2       Values (entqrm=2)--Enables query expansion only for meta-tag values.
3       Both (entqrm=3)--Enables query expansion for both meta-tag names and values.

Default value: 0

entsp

The entsp parameter controls the use of the advanced relevance scoring parameters that you set under Result Biasing on the Admin Console. The parameter accepts the following valid values:

Value      Description
No value   If you do not specify a value for the entsp parameter in the search request, the scoring policy specified for the current front end is used. For example, if the search appliance uses a front end called my_frontend in which the scoring policy my_scorepolicy is configured, omitting the entsp parameter means that the scoring policy my_scorepolicy is used.
0          Do not use any scoring policy.
a          Specifies that the default scoring policy for the search appliance is used. It should be named as default_policy.
a__xxx     Specifies a particular advanced scoring policy. For example, for a source biasing policy called mypolicy, the parameter is set with the following syntax:

entsp=a__mypolicy

Note that the above syntax uses two underscores between the a and the name of the source biasing policy.

Default value: 0

filter

Activates or deactivates automatic results filtering. By default, filtering is applied to Google search results to improve results quality. See Automatic Filtering for more information.

Default value: 1

getfields

Indicates that the names and values of the specified meta tags should be returned with each search result, when available. See Meta Tags for more information.

Meta tag names or values must be double URL-encoded (see URL Encoding).
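As a minimal sketch, assuming that "double URL-encoded" means applying percent-encoding twice, the encoding could be done in Python as shown below. The meta tag name "product name" is hypothetical and used only for illustration.

```python
from urllib.parse import quote


def double_url_encode(text):
    """Percent-encode the text, then percent-encode the result again."""
    once = quote(text, safe="")  # "product name" -> "product%20name"
    return quote(once, safe="")  # the "%" itself becomes "%25"


print("getfields=" + double_url_encode("product name"))
```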

Default value: Empty string

gsaRequestID

A GSA-generated ID that is set at the start of a query session and that exists only for the length of a query. Serving logs use this value, which is sent back to the search appliance for each subsequent request during the query session.

Default value: None.

ie

Sets the character encoding that is used to interpret the query string. See Internationalization for more information.

Default value: latin1

ip

When queries are made using the HTTP or HTTPS protocol, the ip parameter contains the IP address of the user who submitted the search query. You do not supply this parameter with the search request. The ip parameter is returned in the XML search results. For example:

<PARAM name="ip" value="172.24.96.29" original_value="172.24.96.29"/>

Default value: Value is not set in the search request; the value is automatically returned in the search results.

lr

Restricts searches to pages in the specified language. If there are no results in the specified language, the search appliance displays results in all languages. The search appliance may use the language parameter to segment search queries in some Asian languages that do not normally have spaces between words. As a result, you might see different results to your search depending on the value of the lr parameter. See Language Filters for more information.

Default value: Empty string

num

Maximum number of results to include in the search results. The maximum value of this parameter is 1000. Taken together, the values of the start (see start) and num parameters determine the range of the results that are returned.

The initial index point of the search results is the value of the start parameter (see start). The ending index point of the search results is the value of the start parameter (see start) plus the value of the num parameter minus 1. All index points are zero based, meaning the first result has the value 0.

The actual number of results may be smaller than the requested value.

Default value: 10

numgm

Number of KeyMatch results to return with the results. A value from 0 to 50 can be specified for this option.

Default value: 3

oe

Sets the character encoding that is used to encode the results. See Internationalization for more information.

Default value: ISO-8859-1

output

Required parameter. If this parameter does not have a valid value, other parameters in the query string do not work as expected.

Selects the format of the search results. Example: output=xml

Value        Output Format
xml_no_dtd   XML results or custom HTML. (See proxystylesheet parameter for details.)
xml          XML results with Google DTD reference. When you use this value, omit proxystylesheet.

partialfields

Restricts the search results to documents with meta tags whose values contain the specified words or phrases.

(See Meta Tags for more information.)

Meta tag names or values must be double URL-encoded (see URL Encoding).

Default value: Empty string

proxycustom

Specifies custom XML tags to be included in the XML results. The default XSLT stylesheet uses these values for this parameter: <HOME/>, <ADVANCED/>. The proxycustom parameter can be used in custom XSLT applications. See Custom HTML for more information.

This parameter is disabled if the search request does not contain the proxystylesheet tag. If custom XML is specified, search results are not returned with the search request.

Meta tag names or values must be double URL-encoded (see URL Encoding).

Default value: Empty string

proxyreload

Instructs the Google Search Appliance when to refresh the XSL stylesheet cache. A value of 1 indicates that the Google Search Appliance should update the XSL stylesheet cache to refresh the stylesheet currently being requested. This parameter is optional. By default, the XSL stylesheet cache is updated approximately every 15 minutes. (See Custom HTML for more information.) Note that updating the XSL stylesheet cache increases latency for the search request, so this parameter should not be used in a production environment with high load or during performance testing.

Default value: 0

proxystylesheet

If the value of the output parameter is xml_no_dtd, the output format is modified by the proxystylesheet value as follows:

Proxystylesheet Value   Output Format
Omitted                 Results are in XML format.
Front End Name          Results are in Custom HTML format. The XSL stylesheet associated with the specified Front End is used to transform the output.

See Custom HTML for more details. Notice that a valid front end and the policies defined for it are determined by the client parameter. If the proxystylesheet value is an empty string (""), an error is returned.

Default value: N/A

q

Search query as entered by the user.

See Query Terms for additional query features.

Default value: N/A

rc

Requests an accurate result count for up to 1M documents. When rc=1, the search results include an accurate result count, which might introduce high latency. When rc=0, the result count is an estimate, as in the default search behavior described in Estimated vs. Actual Number of Results.

Default value: 0

requiredfields

Restricts the search results to documents that contain the exact meta tag names or name-value pairs. See Meta Tags for more information.

Meta tag names or values must be double URL-encoded (see URL Encoding).

Default value: Empty string

secure_estimates

Retrieves estimates for secure searches if Show Per-Query Estimates is enabled on the Search > Search Features > Query Settings page in the Admin Console and the secure_estimates search parameter is set to 1 in the request:

&secure_estimates=1

Default value: 0

site

Required parameter. Limits search results to the contents of the specified collection.

If this parameter does not have a valid value, other parameters in the query string do not work as expected. Omitting this parameter from a search query causes the entire search index to be queried instead of limiting search results.

If this parameter contains characters that are not allowed, the search appliance does not return any results for the query. This parameter allows . _ - and |.

You can search multiple collections by separating collection names with the OR character, which is notated as the pipe symbol, or the AND character, which is notated as a period.

The following example uses the AND character:

&site=col1.col2

The following example uses the OR character:

&site=col1|col2
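A value for the site parameter could be assembled and checked along the following lines. This is a sketch only: the function name site_value is hypothetical, and the check assumes that letters and digits are allowed in addition to the listed characters . _ - and |.

```python
import re

# Assumption: letters and digits are permitted alongside . _ - and |
ALLOWED_SITE_VALUE = re.compile(r"^[A-Za-z0-9._|-]+$")


def site_value(collections, operator="|"):
    """Join collection names with "|" (OR) or "." (AND)."""
    if operator not in ("|", "."):
        raise ValueError("operator must be '|' (OR) or '.' (AND)")
    value = operator.join(collections)
    if not ALLOWED_SITE_VALUE.match(value):
        raise ValueError(f"site value contains characters that are not allowed: {value!r}")
    return value


print("&site=" + site_value(["col1", "col2"], operator="."))
print("&site=" + site_value(["col1", "col2"], operator="|"))
```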

Query terms info, link and cache ignore collection restrictions that are specified by the site query parameter.

The site parameter is required for Advanced Search Reporting.

sitesearch

Limits search results to documents in the specified domain, host, or web directory. Has no effect if the q parameter is empty. This parameter has the same effect as the site special query term.

Unlike the as_sitesearch parameter, the sitesearch parameter is not affected by the as_dt parameter. The sitesearch and as_sitesearch parameters are handled differently in the XML results. The sitesearch parameter’s value is not appended to the search query in the results. The original query term is not modified when you use the sitesearch parameter. The specified value for this parameter must contain fewer than 125 characters.

Default value: Empty string

sort

Specifies a sorting method. Results can be sorted by date. (See Sorting for sort parameter format and details.)

Default value: Empty string

start

Specifies the index number of the first entry in the result set that is to be returned. Use this parameter and the num parameter (see num) to implement page navigation for search results. The index number of the results is 0-based. For example:

  • start=0, num=10 returns the first 10 results. These are returned by default if you do not specify values for start or num.
  • start=10, num=10 returns the next 10 results.

The maximum number of results available for a query is 1,000, i.e., the value of the start parameter added to the value of the num parameter cannot exceed 1,000.
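The page navigation arithmetic can be sketched in Python as follows. The function name page_window and the 1-based page number are assumptions made for this example; the zero-based indexes and the 1,000-result limit come from the descriptions of start and num.

```python
MAX_RESULTS = 1000  # start + num cannot exceed this value


def page_window(page, per_page=10):
    """Return start and num for a 1-based page number, with the index range."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    if start + per_page > MAX_RESULTS:
        raise ValueError("start + num cannot exceed 1,000")
    return {
        "start": start,
        "num": per_page,
        "first_index": start,                # zero-based
        "last_index": start + per_page - 1,  # start + num - 1
    }


print(page_window(1))
print(page_window(3, per_page=5))
```

With per_page=5, page 3 gives start=10 and num=5, which matches the request in Example 2 for results numbered 11-15.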

Default value: 0

tlen

Specifies the number of bytes that would be used to return the search results title. If titles contain characters that need more bytes per character, for example in utf-8, this parameter can be used to specify a higher number of bytes to get more characters for titles in the search results.

Default value: 70 bytes

ud

Specifies whether results include ud tags. A ud tag contains internationalized domain name (IDN) encoding for a result URL. IDN encoding is a mechanism for including non-ASCII characters. When a ud tag is present, the search appliance uses its value to display the result URL, including non-ASCII characters.

The value of the ud parameter can be zero (0) or one (1):

  • A value of 0 excludes ud tags from the results.
  • A value of 1 includes ud tags in the results.

As an example, if the result URLs contain files whose names are in Chinese characters and the ud parameter is set to 1, the Chinese characters appear. If the ud parameter is set to 0, the Chinese characters are escaped.

Default value:

  • When a search request includes the proxystylesheet parameter, the default value for ud is 1 and cannot be modified.
  • When the search request does not include the proxystylesheet parameter, the default value for ud is 0 and the value can be modified.

ulang

Gets the user's browser language. The user can specify this search parameter. If it is not specified, it takes the value from HTTP headers in the received search request. XSLT uses this parameter to translate titles and snippets into the user's browser language.

A similar parameter, inlang, is for GSA internal use only.

wc

Specifies the number of wildcard expansions for the wildcard expression. Takes values in the range of 0-1000, where 0 disables wildcard search.

For example, the wildcard term go* expands into any word that begins with the pattern "go." If wc=3, then the search expands to include at most 3 expanded terms.

Default value: 200

wc_mc

Specifies whether or not the search appliance considers all words with * as wildcard terms. Valid values are:

  • 1--Consider all words with * as wildcard terms.
  • 0--To use a wildcard term, the user must type the full wildcard expression: wildcard:pattern*.

Default value: 1

For more information, see Wildcard Search.

wc_p

Specifies whether the search appliance includes or excludes metadata from wildcard expansions. This parameter applies to wildcard expansions only. It does not apply when matching terms, for example, when using operators such as intext: or inmeta:. Supported in release 7.6.50 and later.

Values are:

  • 1--No restriction. Content and metadata expansions are returned in results.
  • 2--Only metadata expansions are returned in results.
  • 3--Metadata expansions are excluded from returned results.

Default value: 1

Custom Parameters

In addition to the Search Parameters, you can also define custom parameters in a search request. The search appliance returns custom parameters and their values in the search results.

For security reasons, all space characters in a custom parameter are replaced by an underscore (_). For example:


http://search.customer.com/search?q=customer+query
 &site=collection
 &client=collection
 &output=xml_no_dtd
 &myparam=test+this

This search request includes the custom parameter myparam with a value of test+this. The space character (represented as "+") in the custom parameter myparam is replaced by the underscore character (_) in the XML output.

The resulting XML output looks like this:

<param name="q" value="customer query" original_value="customer+query"/>
<param name="myparam" value="test_this" original_value="test+this" />

The unmodified value can be retrieved from the original_value attribute.
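As an illustration, the following Python sketch reads both attributes from the XML fragment shown above. The surrounding <results> element is a hypothetical wrapper added only so that the two-element fragment parses as a single document; it is not part of the search appliance's output format.

```python
import xml.etree.ElementTree as ET

fragment = """
<param name="q" value="customer query" original_value="customer+query"/>
<param name="myparam" value="test_this" original_value="test+this" />
"""

# Hypothetical wrapper element so the fragment has a single root.
root = ET.fromstring(f"<results>{fragment}</results>")

params = {
    param.get("name"): (param.get("value"), param.get("original_value"))
    for param in root.iter("param")
}

value, original = params["myparam"]
print("value:", value)
print("original_value:", original)
```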

Query Terms


By default, the Google Search Appliance returns only pages that include all of your search terms. You do not need to include “AND” between terms. The order of search terms affects the search results. To further restrict a search, just include more terms. To use keywords such as AND as regular search terms instead of as special keywords, enclose them in quotes.

The search appliance may ignore common words and characters, such as “where” and “how,” as well as certain single digits and letters, that slow down a search without improving the results.

If a common word is essential to getting the results you want, you can include the word by putting double quotes around it. For example, to ensure that Google includes the “I” in a search for “Star Wars Episode I”, enter the search query as follows:

Star Wars Episode “I”

Special Characters: Query Term Separators

By default, non-alphanumeric characters in a search query separate the query terms in the same way as space characters. For example, the following search term is not one query term, but six query terms:

3,6-DICHLORO-2-PYRIDINECARBOXYLIC ACID

The terms are:

3
6
DICHLORO
2
PYRIDINECARBOXYLIC
ACID
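The default separator rule can be approximated in Python as shown below. This sketch covers only the default behavior: it splits on every character that is not a letter, digit, or underscore, and it deliberately ignores the exceptions described in this section (for example, quotation marks, plus signs, and decimal points within numbers). It is not the search appliance's actual tokenizer.

```python
import re


def split_query_terms(query):
    """Split on runs of characters other than letters, digits, and underscores."""
    return [term for term in re.split(r"[^0-9A-Za-z_]+", query) if term]


terms = split_query_terms("3,6-DICHLORO-2-PYRIDINECARBOXYLIC ACID")
print(len(terms), terms)
```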

The following characters are exceptions:

Character                  Description
Double quote mark (")      Used as a special query term for phrase searches. Note that using double quotation marks for phrase search does not reduce the number of query terms. For example, the search term 3,6-DICHLORO-2-PYRIDINECARBOXYLIC ACID is six query terms whether or not it is enclosed in quotation marks.
Forward slash (/)          Used as a special query term for phrase searches.
Plus sign (+)              Treated as a Boolean AND.
Minus sign or hyphen (-)   Treated as part of a query term if there is no space preceding it. A hyphen that is preceded by a space is the Exclude Query Term operator. A hyphen after a parenthesis is treated as the Exclude Query Term operator. For example, the query Fmoc-Cys(Trt)-OH returns documents that contain Fmoc-Cys(Trt) and excludes documents that contain OH in addition to Fmoc-Cys(Trt).
Decimal point (.)          Treated as a query term separator unless it is part of a number (for example, 250.01). For example, dancing.parrot is equivalent to "dancing parrot" with quotes in the query. The term dancing.parrot is not equivalent to dancing parrot (without quotes).
Ampersand (&)              Treated as another character in the query term in which it is included.

If a document contains a number, with or without a decimal point, that has letters immediately before or after it, the letters are treated as a separate word or words. For example, the string 802.11a is indexed as two separate words, 802.11 and a.

An underscore (or under bar) is not a query term separator. For example, if you search for taino_the_parrot , the only valid search result is a document that contains the exact phrase, taino_the_parrot. A search for taino or parrot does not return the taino_the_parrot result.

Special Query Terms

Google search supports the following special query terms. A user or search administrator can use these terms to access additional search features.

All query terms must be correctly URL-encoded in a search request (see URL Encoding).

Anchor text search

Restricts the search to pages for which the anchor text in links to the page contains all of the specified search terms. For example, allinanchor:best museums sydney returns only pages in which the anchor text in links to the pages contains the words “best,” “museums,” and “sydney.” The following example shows an anchor tag:

<a href="http://foo.com"> museums </a>

allinanchor: evaluates the text between > and </a>. allinanchor: evaluates only <a href anchor tags. It does not evaluate <a name anchor tags.

An anchor is a marker inserted at a specific section of a page. It lets the writer of the document create links to these anchors, which, when clicked, quickly take the reader to the specified section of the same page or another page. The table of contents at the top of this document, for example, uses hyperlinks to anchors embedded throughout this document.

Do not include any other search operators with the allinanchor: operator.

Sample usage:

allinanchor:membership directory

Back Links

The query prefix link: lists web pages that have links to the specified web page. No spaces can come between link and the web page URL.

The URL pattern for the linked-to web page must appear in Follow and Crawl URL patterns on the Content Sources > Web Crawl > Start and Block URLs page in the Admin Console. Otherwise, the link query does not produce any search results. For example, consider the query link:http://www.example.com/child.html. For this query to return any results, www.example.com/ must appear in Follow and Crawl URL patterns.

No other query terms can be specified when using this special query term. Query terms info, link and cache ignore collection restrictions that are specified by the site parameter. The search request parameter as_lq (see as_lq) can also be used to submit a link request.

The query term link: returns 25 results by default, but you can configure this number on the Search > Search Features > Query Settings page in the Admin Console.

Sample usage:

link:www.google.com

Boolean OR Search

Google search supports the Boolean OR operator. To retrieve pages that include either word A or word B, use an uppercase OR between terms. The search request parameter as_oq (see as_oq) can also be used to submit a search for any term in a set of terms.

For additional information on the use of OR, see “Usage Notes” in Using inmeta to Filter by Meta Tags.

Sample usage:

vacation london OR paris

The OR operator takes precedence over the AND operator, so this example will be treated as:

vacation AND (london OR paris)

Cached Results Page

The query prefix cache: returns the cached HTML version of the specified web document that Google search crawled. There can be no space between cache: and the web page URL. Words that appear in the query are highlighted in the cached document.

To use Google’s default cached result display, omit the output parameter in the cache request. To customize the display of cached results, request XML or Custom HTML output as part of the cache request and ensure that your parser or stylesheet handles the incoming cache data. Query terms info, link and cache ignore collection restrictions that are specified by the site parameter. See also the site parameter (see site).

Sample usage:

cache:www.google.com web

Date Range Search

Restrict search to documents with modification dates that fall within a time frame. You can search any dates between 1900-01-01 and 2079-06-06. For a complete list of date formats, see Acceptable Date Formats.

Date range searches by themselves do not return results and must be accompanied by a search term.

Only documents that have a modification date are returned for a daterange query. Documents that do not have modification dates are excluded from the results.

To specify dates in ISO 8601 format (such as YYYY-MM-DD), use two dots (.. ) to separate dates in the date range. For example, to search for documents that contain the word parrot and were modified between August 1, 2008 and December 24, 2008, enter the following statement:

parrot daterange:2008-08-01..2008-12-24

You can specify that a search be for all modification dates before a date by preceding the date with the two dots. For example, to search for all documents containing parrot that were modified before August 8, 2008, specify the date range with the following statement:

parrot daterange:..2008-08-08

You can specify that a search be for all documents that were modified after a specific date by specifying a date followed by two dots. For example, to search for all documents that were modified after January 1, 2009 that contain parrot, specify the date range with the following statement:

parrot daterange:2009-01-01..
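
The three forms above (closed range, open start, open end) can be generated by one small helper. The following Python sketch is an illustration only; the function name daterange_term and the bounds check are assumptions based on the date limits stated at the start of this section.

```python
from datetime import date
from typing import Optional

# Searchable date limits as stated in this section.
MIN_DATE = date(1900, 1, 1)
MAX_DATE = date(2079, 6, 6)


def daterange_term(start: Optional[date] = None, end: Optional[date] = None) -> str:
    """Build a daterange: term in ISO 8601 form; either bound may be omitted."""
    if start is None and end is None:
        raise ValueError("at least one bound is required")
    for bound in (start, end):
        if bound is not None and not (MIN_DATE <= bound <= MAX_DATE):
            raise ValueError(f"{bound} is outside the searchable range")
    left = start.isoformat() if start else ""
    right = end.isoformat() if end else ""
    return f"daterange:{left}..{right}"


# A date range never stands alone; it accompanies a search term.
print("parrot " + daterange_term(date(2008, 8, 1), date(2008, 12, 24)))
```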

To specify how a search appliance sorts search results by document dates, use Index > Document Dates in the Admin Console. You can sort search results by the dates found in a document’s URL, a meta tag, the title, the body, or when the document was last modified. If you choose to sort by a meta tag, the meta tag that you specify can contain only a date.

Dates in Julian format can be treated as a date range only with the daterange keyword. Without the daterange keyword, Julian dates are considered a number range search. (A Julian date is an integer number of days that have elapsed since noon on January 1, 4713 BC. For example, August 1, 2008 at noon has a Julian date of 2454680.)
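
If you need to produce Julian dates programmatically, the conversion is a fixed offset from a calendar day count. The sketch below uses Python's datetime module, which counts days in the proleptic Gregorian calendar; the function names are hypothetical, and the hyphen separator follows the Julian sample usage in this section.

```python
from datetime import date

# Offset between Python's date ordinal (0001-01-01 is day 1) and the
# Julian date at noon of the same calendar day.
ORDINAL_TO_JULIAN = 1721425


def julian_date(d: date) -> int:
    """Return the integer Julian date for noon on the given day."""
    return d.toordinal() + ORDINAL_TO_JULIAN


def julian_daterange_term(start: date, end: date) -> str:
    """Build a daterange: term that uses Julian dates."""
    return f"daterange:{julian_date(start)}-{julian_date(end)}"


print(julian_date(date(2008, 8, 1)))  # 2454680, matching the example above
```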

For further options for searching dates in meta tags, see Using inmeta to Filter by Meta Tags.

Sample usage:

election daterange:2008-01-20..2009-01-20
election daterange:2008-01-20..
election daterange:..2009-01-20
parrot daterange:2452122-2452234

Directory Restricted Search

Restrict search to documents within a domain or directory. Enter the query followed by site: followed by the host name and path of the web directory. To limit the search to a domain, specify a string that matches a complete name-segment of the canonical host name.

To search a particular directory on a web server (including the root directory), specify a string that is the complete canonical name of the host server followed by the path of the directory. If the forward slash character (/) is at the end of the web directory path specified, then search is limited to the files within that directory. Files in sub-directories are not considered.

The URLs used with site must contain fewer than 119 characters. The exclusion operator (-) can be applied to this to remove a web directory from consideration in the search. You can specify one site term per search request or multiple site terms using the Boolean OR operator.

The search request parameters as_sitesearch (see as_sitesearch) and as_dt (see as_dt) can also be used to submit directory restricted searches. See also the site parameter (see site).

Sample usage:

  • Domain search examples:
site:www.google.com
site:google.com
site:com
  • Directory search examples:
admission site:www.stanford.edu/group/uga
site:www.google.com/enterprise/
site:www.google.com/about
gxp site:www.corp.google.com/eng/howto OR site:www.corp.google.com/eng/doc

Exclusion

Sometimes what you’re searching for has more than one meaning. For example, the term “bass” can refer to either fishing or music. You can exclude a word from your search by putting a minus sign (-) immediately in front of the term you want to exclude from the search results. Be sure to include a space before the minus character.

The search request parameter as_eq (see as_eq) can also be used to submit terms to exclude.

Sample usage:

bass -music

File Extension Filtering

The query prefix ext: filters the results to include only documents with the specified file extension. No spaces can come between ext: and the extension. For example, ext:pdf retrieves all documents with the pdf extension.

You can combine this prefix with the filetype prefix. For example, the query filetype:pdf AND ext:pdf retrieves all documents with the MIME type pdf and the pdf extension.

You can exclude file types by putting a minus sign before ext, such as -ext:pdf. For more information, see File Extension Exclusion.

Sample usage:

whitepaper ext:doc OR ext:pdf

File Extension Exclusion

The query prefix -ext: filters the results to exclude documents with the specified file extension. No spaces can come between -ext: and the specified extension.

You can exclude multiple file types by adding more -ext: terms to the search query.

Sample usage:

whitepaper -ext:pdf -ext:doc

File Type Filtering

The query prefix filetype: filters the results to include only documents with the specified MIME content type. No spaces can come between filetype: and the type.

You can exclude file types by putting a minus sign before filetype, such as -filetype:pdf. For more information, see File Type Exclusion.

See also as_filetype and as_ft for including and excluding documents from the search results.

You can specify multiple file types by adding filetype: terms to the search query, combined with the Boolean OR.

Sample usage:

whitepaper filetype:doc OR filetype:pdf

File Type Exclusion

The query prefix -filetype: filters the results to exclude documents with the specified file type. No spaces can come between -filetype: and the specified type.

You can exclude multiple file types by adding more -filetype: terms to the search query.

Sample usage:

whitepaper -filetype:doc -filetype:pdf

Meta Tag Search

You can filter results by meta tags and their values using inmeta. Used with the operators ~ or =, inmeta restricts results to required or partial meta tag values in the same way as the requiredfields and partialfields search parameters.

Sample usage:

inmeta:department=Human Resources

There is a 128-character limit for inmeta queries. The 128 characters include the inmeta term and the meta tag name and value:

inmeta:<name>=<value> (at most 128 characters in total)

This limit also applies to dynamic navigation. That is, the attribute values displayed in the sidebar cannot exceed 128 characters.

See Meta Tags for more details.
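
A client can check the inmeta length limit before it sends a request. The following Python sketch is an illustration under one stated assumption: the limit is counted on the unencoded term, which the text above does not specify. The function name inmeta_term is hypothetical.

```python
# Limit stated in this section; it covers the inmeta term, the name, and the value.
INMETA_MAX_CHARS = 128


def inmeta_term(name: str, value: str) -> str:
    """Build an inmeta:<name>=<value> term and reject terms over the limit."""
    term = f"inmeta:{name}={value}"
    # Assumption: the 128 characters are counted before URL encoding.
    if len(term) > INMETA_MAX_CHARS:
        raise ValueError(f"inmeta term is {len(term)} characters; the limit is {INMETA_MAX_CHARS}")
    return term


print(inmeta_term("department", "Human Resources"))
```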

Number Range Search

To search for documents or items that contain numbers within a range, type your search term and the range of numbers separated by two periods (.. ). You can set ranges for weights, dimensions, prices (dollar currencies only), and so on. Be sure to specify a unit of measurement or some other indicator of what the number range represents.

Sample usage:

pencils $1.50..$2.50

Phrase Search

Search for complete phrases by enclosing them in quotation marks or by connecting them with hyphens or colons. Words marked in this way appear together in all results, exactly as you enter them. Phrase searches are especially useful when searching for famous sayings or proper names.

You can also use the as_epq search request parameter (see as_epq) to submit a phrase search.

Sample usage:

"yellow pages", yellow-pages, yellow:pages
 

All of the above examples return results for yellow pages. Using the hyphen also returns results for yellowpages.

Text Search (one term)

If you precede a query term with intext:, the search appliance restricts the search to documents that contain the search word in the titles or body text of the documents. The search appliance does not search for the query word in the metadata, anchors, or URLs.

Sample usage:

intext:google

Text Search (all terms)

If you precede a query with allintext:, the search appliance restricts the search to documents whose titles or body text contain all of the search terms. The search appliance does not search for the query words in the metadata, anchors, or URLs.

Sample usage:

allintext:google search

Title Search (one term)

If you precede a query term with intitle:, Google search restricts the results to documents containing that word in the title.

Putting intitle: in front of every word in your query is equivalent to putting allintitle: at the front of your query.

For plain text files, the search appliance displays results using the first 70 KB of the file as the title. Because the document does not have a title, the intitle special query term does not work for plain text files.

Sample usage:

intitle:google

Title Search (all terms)

If you precede a query with allintitle:, Google search restricts the results to those with all of the query words in the result title.

For plain text files, the search appliance displays results using the first 70 KB of the file as the title. Because the document does not have a title, the allintitle special query term does not work for plain text files.

Sample usage:

allintitle:google search

URL Search (one term)

If you precede a query term with inurl:, Google search restricts the results to documents containing that word in the result URL. No spaces can come between the inurl: and the following word.

The term inurl works only on words, not on URL components. In particular, it ignores punctuation and uses only the first word following the inurl: operator. To find multiple words in a result URL, use the inurl: operator for each word. Preceding every word in your query with inurl: is equivalent to putting allinurl: at the front of your query.

Sample usage:

inurl:Google search

URL Search (all terms)

If you precede a query with allinurl:, Google search restricts the results to those with all of the query words in the result URL.

The term allinurl works only on words, not URL components. In particular, it ignores punctuation. Thus, allinurl: foo/bar restricts the results to pages with the words “foo” and “bar” in the URL, but doesn’t require that they be separated by a slash within that URL, that they be adjacent, or that they be in that particular word order. There is currently no way to enforce these constraints.

Sample usage:

allinurl: Google search

Web Document Info

The query prefix info: returns a single result for the specified URL if the URL exists in the index. No other query terms can be specified when using this special query term. Query terms info, link and cache ignore collection restrictions that are specified by the site parameter.

Sample usage:

info:www.google.com

Wildcard Search

If you precede a query with wildcard:, you can search by entering a wildcard pattern instead of the exact spelling of a term. By default, wildcard search is enabled for each front end of the search appliance. However, to use wildcard search, you must ensure that wildcard indexing is also enabled for your search appliance by using the Index > Index Settings page in the Admin Console.

If wildcard indexing is disabled, users will experience search issues. In that case, the search appliance administrator can disable implicit wildcard search on each front end.

The search appliance supports two wildcard: operators:

  • *--Matches zero or more characters
  • ?--Matches exactly 1 character

The search appliance is able to consider all words with * as wildcard terms, so users don't need to prepend the wildcard: special operator to a pattern that contains this operator. To enable the search appliance to do this, click the Consider words with * as wildcards by default checkbox on the Search > Search Features > Front Ends > Filters page.

Take note that words that have special characters, such as apostrophes, in them are not matched by wildcard search.

Using wildcards can simplify queries for long names, technical data, pharmaceutical information, or strings where the exact spelling varies or is unknown. A user can search for all words starting with a particular pattern, ending with a particular pattern, or having a particular substring pattern. A wildcard query term must satisfy at least one of the following conditions:

  • A sequence of at least 2 characters at the start of a word, for example: go*
  • A sequence of at least 2 characters at the end of a word, for example: *le
  • A sequence of at least 3 characters anywhere in the word, for example: *ear*

Sample usage:

wildcard:test*
wildcard:?nter
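
The three conditions for a valid wildcard term can be checked on the client side before a query is sent. The Python sketch below is one reading of those conditions, not the appliance's own validation: it treats every run of characters between wildcard operators as a "sequence", and the function name is hypothetical.

```python
import re


def is_valid_wildcard(pattern: str) -> bool:
    """Return True if the pattern meets at least one of the length conditions."""
    if not re.search(r"[*?]", pattern):
        return False  # no wildcard operator, so this is not a wildcard term
    # Literal runs between the * and ? operators; empty strings mark an
    # operator at the very start or end of the pattern.
    runs = re.split(r"[*?]", pattern)
    literal_prefix = len(runs[0]) >= 2        # e.g. go*
    literal_suffix = len(runs[-1]) >= 2       # e.g. *le
    long_run = any(len(run) >= 3 for run in runs)  # e.g. *ear*
    return literal_prefix or literal_suffix or long_run


for candidate in ("go*", "*le", "*ear*", "a*", "*ab*"):
    print(candidate, is_valid_wildcard(candidate))
```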

Wildcard search is also supported for metadata queries, but the wildcard: special operator is omitted. For example: inmeta:name*. Also, metadata queries are %-encoded, which affects the form of an inmeta: wildcard query.

Wildcard search is not supported for other common queries, including filetype, inurl, intext, and so on. Also, wildcard search is not supported with Chinese, Japanese, Korean, or Thai.

Filtering


Google search provides many ways for you to filter the results that are returned from your search query. In addition to the automatic filtering and language filtering described in this section, the search appliance provides filtering by query parameters (see Search Parameters), query terms (see Query Terms) and meta tags (see Meta Tags), which are documented in their respective sections.

Automatic Filtering

Google uses automatic filtering to ensure the highest quality search results.

Google search uses two types of automatic filters:

  • Duplicate Snippet Filter --If multiple documents contain identical titles as well as the same information in their snippets in response to a query, only the most relevant document of that set is displayed in the results.
  • Duplicate Directory Filter --If there are many results in a single web directory, then only the two most relevant results for that directory are displayed. An output flag indicates that more results are available from that directory.

By default, both of these filters are enabled. You can disable or enable the filters by using the filter parameter settings as shown in the table.

Filter value    Duplicate Snippet Filter    Duplicate Directory Filter
filter=1        Enabled (ON)                Enabled (ON)
filter=0        Disabled (OFF)              Disabled (OFF)
filter=s        Disabled (OFF)              Enabled (ON)
filter=p        Enabled (ON)                Disabled (OFF)

When a search filter is enabled and removes some results, the search results output indicates that results were filtered. See Estimated vs. Actual Number of Results for more information about how a filtered result set is identified and for recommendations for displaying the results.

Although the filter=0 option exists, Google recommends against setting filter=0 for typical search requests, because filtering significantly enhances the quality of most search results.
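
When a client exposes the two filters as separate switches, the mapping to the filter parameter is a simple lookup. The following Python sketch encodes the table above; the function and argument names are hypothetical.

```python
# (snippet filter on, directory filter on) -> value of the filter parameter,
# taken from the filter table in this section.
FILTER_VALUES = {
    (True, True): "1",
    (False, False): "0",
    (False, True): "s",
    (True, False): "p",
}


def filter_value(snippet_filter: bool, directory_filter: bool) -> str:
    """Return the filter parameter value for the requested combination."""
    return FILTER_VALUES[(snippet_filter, directory_filter)]


# Keep directory crowding control but show documents with duplicate snippets.
print("filter=" + filter_value(snippet_filter=False, directory_filter=True))
```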

For queries that contain the site special query term or the as_sitesearch query parameter, automatic filtering does not take place.

When the Google Search Appliance filters results, the top 1000 most relevant URLs are found before the filters are applied. A URL that is beyond the top 1000 most relevant results is not affected if you change the filter settings.

Language Filters

Language filters limit a search to pages in the specified languages. The Google Search Appliance has built-in language filters that detect the language of a query and return appropriate results. You can combine language filters to further restrict search results.

When the search appliance receives a language-restricted search request for which there are no results in the languages specified by a filter, it displays search results in all languages.

Automatic Language Filters

The Google Search Appliance automatically detects the language of each search query and returns results in that language. For example, if a user submits a search query in Hungarian (lang_hu), results are automatically returned in Hungarian.

The algorithm for automatically determining the language of a web document is not customizable. The language of a document is determined primarily by the language used for the majority of the text in the body of the document.

Encoding schemes for the input and output of search requests are also important when you provide international search. For more information on encoding, see Internationalization. For more information on how language filtering works with Simplified Chinese and Traditional Chinese, see Language Filtering for Traditional and Simplified Chinese.

The automatic language filters are:

Language

Automatic Language Filter Name

Arabic

lang_ar

Chinese (Simplified)

lang_zh-CN

Chinese (Traditional)

lang_zh-TW

Czech

lang_cs

Danish

lang_da

Dutch

lang_nl

English

lang_en

Estonian

lang_et

Finnish

lang_fi

French

lang_fr

German

lang_de

Greek

lang_el

Hebrew

lang_iw

Hungarian

lang_hu

Icelandic

lang_is

Italian

lang_it

Japanese

lang_ja

Korean

lang_ko

Latvian

lang_lv

Lithuanian

lang_lt

Norwegian

lang_no

Portuguese

lang_pt

Polish

lang_pl

Romanian

lang_ro

Russian

lang_ru

Spanish

lang_es

Swedish

lang_sv

Turkish

lang_tr

If you want to filter languages other than the above, obtain the language code from ISO 639 (see http://www.loc.gov/standards/iso639-2/php/code_list.php), index a document corpus containing the desired languages, and run tests to determine that the search results are as expected.

Language Filtering for Traditional and Simplified Chinese

The search appliance determines the encoding of a search query and uses that encoding to return search results. For example, if a user enters a search query using Traditional Chinese, the search results are returned in Traditional Chinese. If a query is entered using Simplified Chinese, the results are also in Simplified Chinese. The original encoding of the documents does not affect what is returned. If documents encoded in Traditional Chinese are crawled and a Simplified Chinese query is entered, the documents returned are encoded in Simplified Chinese.

However, if a search query uses characters that are common to both Simplified and Traditional Chinese, the search appliance’s behavior is indeterminate. In some cases, the search appliance detects such queries as Simplified Chinese, but in other cases, the language is detected as Traditional Chinese. One example of a query that returns indeterminate results is the term Hong Kong. To resolve this issue, use the lr parameter to specify whether you want to enforce Traditional Chinese (lang_zh-TW) or Simplified Chinese (lang_zh-CN).

Combining Language Filters

Search requests that use the lr parameter support the Boolean operators identified in the following table in order of precedence.

Boolean Operator

Sample Usage

Description

Boolean NOT [ - ]

-lang_fr

Removes all results that are defined as part of the Language Filter immediately following the - operator. The example lr value would remove all results in French.

Boolean AND [ . ]

gloves.hats

Returns results that are in the intersection of the results returned by the collection to either side of the dot operator. The example restrict value returns results which are in both the “hats” and “gloves” custom collections.

Boolean OR [ | ]

lang_en|lang_fr

Returns results that are in either of the results returned by the collection to either side of the pipe operator (|). The example lr value returns results matching the query that are in either French or English.

Parentheses [ ( ) ]

(gloves).(-(lang_hu|lang_cs))

All terms within the innermost set of parentheses are evaluated before terms outside the parentheses are evaluated. Use parentheses to adjust the order of term evaluation. The example lr value returns all results in the “gloves” custom collection that are not in either the Hungarian or Czech collections.

Spaces are not valid characters in the collection string.
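
The operators in the table can be composed with a few string helpers. This Python sketch is illustrative only and all function names are hypothetical. It wraps each AND operand in parentheses so that the result does not depend on operator precedence, which means it produces (gloves).(hats) rather than the shorter gloves.hats form.

```python
def lr_not(expr: str) -> str:
    """Boolean NOT: exclude results matching the expression."""
    return f"-{expr}"


def lr_or(*exprs: str) -> str:
    """Boolean OR: results matching any of the expressions."""
    return "|".join(exprs)


def lr_group(expr: str) -> str:
    """Parenthesize an expression so it is evaluated first."""
    return f"({expr})"


def lr_and(*exprs: str) -> str:
    """Boolean AND: results matching every expression; operands are grouped."""
    return ".".join(lr_group(e) for e in exprs)


def check_lr(expr: str) -> str:
    """Reject strings that contain spaces, which are not valid here."""
    if " " in expr:
        raise ValueError("spaces are not valid in the filter string")
    return expr


# Rebuilds the parenthesized sample value from the table above.
print(check_lr(lr_and("gloves", lr_not(lr_group(lr_or("lang_hu", "lang_cs"))))))
```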

Internationalization


To support searching documents in multiple languages and character encodings, Google provides the ie and oe parameters. The ie parameter indicates how to interpret characters in the search request. The oe parameter indicates how to encode characters in the search results. To appropriately decode the search query and correctly encode the search results, supply the correct ie and oe parameters, respectively, in the search request.

When you provide search for multiple languages, Google recommends using the utf8 encoding value for the ie and oe parameters.

Examples

Example 1. The following search request interprets the search query “gloves” using latin1 encoding, searches for English or French results, and returns results using latin1 encoding:

GET /search?q=gloves&client=test&site=test&lr=lang_en|lang_fr&ie=latin1&oe=latin1

Example 2. This request interprets the search query “gloves” using latin2 encoding, searches for results which are not in Hungarian or Czech, and returns results using latin2 encoding:

GET /search?q=gloves&client=test&site=test&lr=(-lang_hu).(-lang_cs)&ie=latin2&oe=latin2

Example 3. This request interprets the search query “gloves” using utf8 encoding, searches for results which are in Simplified or Traditional Chinese, and returns results using utf8 encoding:

GET /search?q=gloves&client=test&site=test&lr=lang_zh-CN|lang_zh-TW&ie=utf8&oe=utf8

For information on language-specific searches that use the lr parameter, see Language Filters.
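
Requests like the examples above can be assembled with Python's standard library, which also makes the role of ie concrete: the query text must be percent-encoded using the same character encoding that ie declares. This is a minimal sketch; the host name gsa.example.com, the function name, and the small codec table are assumptions for illustration.

```python
from urllib.parse import urlencode

# Hypothetical mapping from a few ie/oe values to Python codec names.
PYTHON_CODECS = {"utf8": "utf-8", "latin1": "iso-8859-1", "latin2": "iso-8859-2"}


def build_search_url(host: str, query: str, ie: str = "utf8", oe: str = "utf8", **params: str) -> str:
    """Build a /search URL whose query string is encoded as declared by ie."""
    pairs = {"q": query, **params, "ie": ie, "oe": oe}
    # urlencode percent-encodes reserved characters such as | and applies
    # the chosen codec to non-ASCII text.
    return f"http://{host}/search?" + urlencode(pairs, encoding=PYTHON_CODECS[ie])


print(build_search_url("gsa.example.com", "gloves", ie="latin1", oe="latin1",
                       client="test", site="test", lr="lang_en|lang_fr"))
```

Note that urlencode writes the pipe character as %7C, which is the URL-encoded form of the lr value shown in Example 1.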

Character Encoding Values

Here is a list of encoding values that can be used with the parameters ie and oe :

Language

Encoding Value

Alternate Encoding Value

Chinese (Simplified)

gb

GB2312

Chinese (Traditional)

big5

Big5

Czech

latin2

ISO-8859-2

Danish

latin1

ISO-8859-1

Dutch

latin1

ISO-8859-1

English

latin1

ISO-8859-1

Estonian

latin4

ISO-8859-4

Finnish

latin1

ISO-8859-1

French

latin1

ISO-8859-1

German

latin1

ISO-8859-1

Greek

greek

ISO-8859-7

Hebrew

hebrew

ISO-8859-8

Hungarian

latin2

ISO-8859-2

Icelandic

latin1

ISO-8859-1

Italian

latin1

ISO-8859-1

Japanese

sjis

Shift_JIS

Japanese

jis

ISO-2022-JP

Japanese

euc-jp

EUC-JP

Korean

euc-kr

EUC-KR

Latvian

latin4

ISO-8859-4

Lithuanian

latin4

ISO-8859-4

Norwegian

latin1

ISO-8859-1

Portuguese

latin1

ISO-8859-1

Polish

latin2

ISO-8859-2

Romanian

latin2

ISO-8859-2

Russian

cyrillic

ISO-8859-5

Spanish

latin1

ISO-8859-1

Swedish

latin1

ISO-8859-1

Turkish

latin3

ISO-8859-3

Turkish

latin5

ISO-8859-9

Unicode (All Languages)

utf8

UTF-8

Sorting


Google search provides three sorting options for search results:

Sort By Relevance (Default)

By default, Google combines hypertext-matching analysis and PageRank technologies to provide users with highly relevant results. Hypertext-matching analysis uses the design of the page, examining over 100 factors to determine the best result for your query term. PageRank considers the link structure of the entire index to understand how each page links to the other pages in the index.

Sort By Date

The Google search engine can order search results by date in ascending or descending order. The date of a web document is defined by parameters configured by the search administrator. When a search request uses the sort-by-date feature, the date associated with each result document is used to determine the order of the results. Note that the search appliance ignores the time of day for sorting, even if it is given by the “last-modified” date or other attributes.

When using the sort-by-date feature, the built-in filter of duplicate directories and duplicate snippets will group the highest result (newest or oldest depending on the sort order) with similar results regardless of their date. This can be disabled by adding the filter=0 parameter to the search request when performing search by date.

When sorting by date, the order of the results can also be affected by any relevant result biasing policies that are being used. See Using Result Biasing to Influence Result Ranking in Creating the Search Experience.

Example

The following request returns the first 10 results that match the query “chicken teriyaki” in the “test” collection:

GET /search?q=chicken+teriyaki&output=xml&client=test&site=test&sort=date:D:S:d1

Results are sorted by date and relevancy.

Details

To sort the results by date, include the sort parameter in the search request, formatted as follows:

date:<direction>:<mode>:<format>

The following tables show the possible values for <direction>, <mode>, and <format>.

<direction> Value

Description

A

Sort results in ascending order.

D

Sort results in descending order.

<mode> Value

Description

S

Return the 1000 most relevant results, sorted by date.

R

Get all results, sort by date, and return the first 1000 results. You can use this option when freshness is more important than relevancy. Do not use this mode if your collection contains more than 50,000 documents.

L

Return the date information for each result. No sorting is done.

<format> Value

Description

d1

The format of the value returned for each search result is set to YYYY-MM-DD.
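
The sort value for date sorting can be assembled and validated against the three tables above. The Python sketch below is an illustration; the function name and the default arguments are assumptions, chosen to reproduce the value used in the example request.

```python
# Allowed values, taken from the tables above.
DIRECTIONS = {"A", "D"}
MODES = {"S", "R", "L"}
FORMATS = {"d1"}


def date_sort_value(direction: str = "D", mode: str = "S", fmt: str = "d1") -> str:
    """Build the value of the sort parameter for sorting by date."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return f"date:{direction}:{mode}:{fmt}"


print("sort=" + date_sort_value())
```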

Sort by Metadata

The Google Search Appliance can order search results by values that are included in individual documents. This makes it possible to sort documents by prices, dates, authors, or any other value that is relevant for your documents. The sorting occurs only on the 1000 most relevant results for the specific query.

When sorting by metadata, the total length of the metadata attr:value pair cannot exceed 121 characters. Exceeding the maximum character limit causes results to be unsorted. For more information, see Meta Tags.

When using the sort-by-metadata feature, the automatic quality filter sometimes re-orders results when performing result grouping. This can be disabled by adding the filter=0 parameter to the search request when sorting by metadata.

Using this feature decreases search performance. The decrease depends on, but is not limited to, the following factors:

  • How many results are returned
  • How much metadata exists for each document
  • The sorting options specified

The value used to sort each document is available in the XML output in the FS tag.

How Sorting Works

When a search request is submitted with the sorting parameter specified as described in the following sections, the Google Search Appliance retrieves the value corresponding to the given meta tag name for each search result. In some instances, as described below, some processing of the value will occur. These values will then be sorted according to the specific parameter specified. If two documents have the same value, they will be ordered according to their original relevance.

  • Multivalued metadata --If you have specified certain meta tags as multivalued in Index > Index Settings, then only the first value in those tags is used for sorting.
  • Multiple meta tags with the given name --If a document has multiple meta tags with the given name, all of which have the exact same value, then that value will be used for sorting. If the multiple meta tags have different values, then none will be used and the document will behave as if it has no meta tag with the given name.
  • No meta tag with the given name for a document --If a document does not contain a meta tag with the given name, then it will be placed after documents that do have meta tags with the given name. All documents without such a meta tag will be ordered by their original relevance ranks.
  • Date --If a meta tag has been specified as a date in Index > Index Settings, then the value will be normalized into a YYYY-MM-DD format before being sorted.
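
The ordering rules above can be modeled in a short Python sketch. It is a simplified illustration, not the appliance's implementation: documents are plain dictionaries listed in relevance order, values are compared as strings, and the multivalued and date cases are left out. All names are hypothetical.

```python
from typing import Optional


def sort_value(values: list[str]) -> Optional[str]:
    """Return the value used for sorting, or None if the tag is missing
    or occurs several times with different values."""
    return values[0] if len(set(values)) == 1 else None


def sort_by_meta(docs: list[dict], name: str) -> list[dict]:
    """Order docs (given in relevance order) by the named meta tag."""
    keyed = [(sort_value(doc["meta"].get(name, [])), doc) for doc in docs]
    with_value = [pair for pair in keyed if pair[0] is not None]
    without_value = [doc for value, doc in keyed if value is None]
    # sort() is stable, so documents with equal values keep relevance order.
    with_value.sort(key=lambda pair: pair[0])
    # Documents without a usable value follow, still in relevance order.
    return [doc for _, doc in with_value] + without_value


docs = [
    {"id": "a", "meta": {"price": ["20"]}},
    {"id": "b", "meta": {}},
    {"id": "c", "meta": {"price": ["10", "10"]}},
    {"id": "d", "meta": {"price": ["5", "7"]}},
    {"id": "e", "meta": {"price": ["10"]}},
]
print([doc["id"] for doc in sort_by_meta(docs, "price")])
```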

Details

To sort results by metadata, include the sort parameter in the search request, formatted as follows:

meta:<name>:<direction>:<mode>:<language>:<case>:<numeric>

All values after the name are optional and can be left blank. For instance, if you want to specify only the name and the language, you could use either of the following formats:

meta:<name>:::<language>
meta:<name>:::<language>::
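
A helper that fills the six positions and leaves unspecified ones blank might look like the following Python sketch. The function name is hypothetical, and the sketch assumes that the returned string is placed in the request URL as-is, so the meta tag name is percent-encoded twice here to satisfy the double-URL-encoding requirement described in the tables below. The language code is not validated.

```python
from urllib.parse import quote

# Allowed option values from the tables in this section; "" means "left blank".
DIRECTIONS = {"", "A", "D"}
MODES = {"", "E", "ED", "S", "SD"}
CASES = {"", "D", "U", "L"}
NUMERICS = {"", "D", "Y", "F", "N"}


def meta_sort_value(name: str, direction: str = "", mode: str = "",
                    language: str = "", case: str = "", numeric: str = "") -> str:
    """Build the value of the sort parameter for sorting by metadata."""
    for value, allowed in ((direction, DIRECTIONS), (mode, MODES),
                           (case, CASES), (numeric, NUMERICS)):
        if value not in allowed:
            raise ValueError(f"unsupported option value: {value!r}")
    # Assumption: the result is not URL-encoded again by the caller.
    encoded_name = quote(quote(name, safe=""), safe="")
    return ":".join(["meta", encoded_name, direction, mode, language, case, numeric])


print(meta_sort_value("price", language="fr"))
```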

The following tables show the possible values for the options.

<name> Value

Description

any string

The name of the meta tag that should be used to sort by. This string must be double-URL-encoded.

<direction> Value

Description

A

Sort results in ascending order. Default.

D

Sort results in descending order.

<mode> Value

Description

E

Return the 1000 most relevant results, then sort by metadata. Default.

ED

Same as mode E, but also sort dates chronologically. Supported in GSA version 7.2.0.G.230 and later.

S

Return the 1000 most relevant results, then sort by metadata, then apply Advanced Score Reporting, Unification biasing, and filtering.

SD

Same as mode S, but also sort dates chronologically. Supported in GSA version 7.2.0.G.230 and later.

<language> Value

Description

any ISO 639-1 code

A 2-character language code indicating the language rules to use to sort. en is the default.

<case> Value

Description

D

Do not consider case when sorting. Default.

U

Sort uppercase version of a letter before the lowercase version of that letter. Note that this does not sort all uppercase characters before all lowercase characters.

L

Sort lowercase version of a letter before the uppercase version of that letter.

<numeric> Value

Description

D

Numeric sorting is disabled, so 123 comes before 2 because 1 is less than 2. Default. The order is ascending (the default).

Examples:

123
2
34

ABC123XYZ
ABC2XYZ
ABC34XYZ

Y

Numeric sorting is enabled, so 2 comes before 123 because 2 is less than 123. This only sorts positive integers, but does classify numbers inside longer alphanumeric strings. So ABC2XYZ will come before ABC123XYZ.

Examples:

2
34
123

ABC2XYZ
ABC34XYZ
ABC123XYZ

F

This is similar to Y, but also identifies and sorts negative and floating-point numbers. It will identify proper punctuation based on the language specified, so a decimal point is a . for English, but a , for German.

N

Can be used to sort pure English numbers (only containing digits and +-. punctuation) faster than using Y or F. But values like ABC2XYZ will not be sorted.

For more information about the language options, case options, and numeric options D and Y, see the ICU Collation documentation.
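The difference between numeric options D and Y can be illustrated with ordinary Python sorting. This is a simplified sketch that assumes plain character comparison for the non-numeric parts; the appliance itself relies on ICU collation, which this code does not use.

```python
import re


def numeric_key(value):
    """Sort key that compares runs of digits as integers (similar to option Y)."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd positions always hold digit runs.
    parts = re.split(r"(\d+)", value)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


values = ["ABC34XYZ", "ABC123XYZ", "ABC2XYZ"]

# Option D: character-by-character comparison, so "1" sorts before "2".
print(sorted(values))                   # ['ABC123XYZ', 'ABC2XYZ', 'ABC34XYZ']

# Option Y: embedded positive integers are compared by value.
print(sorted(values, key=numeric_key))  # ['ABC2XYZ', 'ABC34XYZ', 'ABC123XYZ']
```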

Examples

The following examples show sort parameters and how values would be sorted.

Sorting authors or other alphabetic values

sort=meta:<name>

Agatha Christie
C. S. Lewis
Henry David Thoreau

sort=meta:<name>:D

Henry David Thoreau
C. S. Lewis
Agatha Christie

Change case sorting

sort=meta:<name>:A:E:en:D

aa
aA
Aa
Ab

sort=meta:<name>:A:E:en:U

Aa
aA
aa
Ab

sort=meta:<name>:A:E:en:L

aa
aA
Aa
Ab

sort=meta:<name>:D:E:en:U

Ab
aa
aA
Aa

Sort numbers that have the same format

If all the numbers have the same format (for instance, if every number has a dollar sign, then two digits, then a comma, then three digits, then a period, then two digits: $xx,xxx.xx), you can sort them with numeric option Y.

sort=meta:<name>:::::Y

$34,827.45
$84,671.11
$93,243.55

Sort currency

sort=meta:<name>:::::F

$0.01
$1.00
$34
$1,234.56

Sort English-looking numbers

This is a very fast option, if all the numbers are in the following formats.

sort=meta:<name>:::::N

-12345
-1234.56
-3
0
34.9
+35.172
+321
16003.58

Sort dates

sort=meta:<name>::ED

January 30, 2012
February 18, 2013
October 2, 2013

Sort date-like words

Dates in word format are sorted alphabetically.

sort=meta:<name>::E

February 18, 2013
January 30, 2012
October 2, 2013
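The two date examples above differ in the same way as the following Python sketch, which compares an alphabetical sort with a chronological one. The parsing format is an assumption made for this illustration (English month names in the default locale); it does not describe how the appliance parses dates.

```python
from datetime import datetime

dates = ["October 2, 2013", "January 30, 2012", "February 18, 2013"]


def as_date(text):
    # Assumes values such as "January 30, 2012" with English month names.
    return datetime.strptime(text, "%B %d, %Y")


alphabetical = sorted(dates)               # like mode E on date-like words
chronological = sorted(dates, key=as_date)  # like mode ED

print(alphabetical)   # ['February 18, 2013', 'January 30, 2012', 'October 2, 2013']
print(chronological)  # ['January 30, 2012', 'February 18, 2013', 'October 2, 2013']
```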

Meta Tags


The Google Search Appliance provides search parameters and special query terms that enable you to leverage the meta tags that are available in your content. These make it possible to find matches specifically in meta data content, rather than content occurring anywhere in the document.

There are no restrictions on the number of meta tag matches a results page can display.

The maximum number of characters for a metadata attr:value pair for the search request parameters requiredfields and partialfields and the special query term inmeta is 121. If the metadata attr:value pair is longer than 121 characters, the meta tag contents are still visible at serving time but are not searchable via the search request. They are also not visible in dynamic navigation.

Note that if the metadata attr:value pair contains terms that are shorter than 121 characters, those terms may still be searchable via the partialfields request parameter and the inmeta special query term using the ~ operator (see the example below).

At search time, if the encoded value of the search attribute (requiredfields, partialfields, inmeta ) plus the attr:value is greater than 121, then the search won't produce any results. For example, the following meta tag value is not searchable because the encoded value of the search attribute plus the combination of attr:value is greater than 121 characters:

<meta name="gamc" content="123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"/>

In the following example, the term 1234 would be searchable using a partialfields or inmeta request, as the attr:value meets the 121 character limit:

<meta name="gamc2" content="1234 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"/>
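A quick way to reason about the limit is to measure the attr:value pair before indexing or querying. The sketch below is an approximation under stated assumptions: it counts unencoded characters of "name:value" only, whereas the appliance also takes the encoded search attribute into account, and the helper names are hypothetical.

```python
MAX_PAIR_LENGTH = 121  # limit stated in this section


def pair_is_searchable(name, value):
    """Approximate check: is 'name:value' within the 121-character limit?"""
    return len(f"{name}:{value}") <= MAX_PAIR_LENGTH


def searchable_terms(name, value):
    """Return the individual terms of a value that stay within the limit."""
    return [term for term in value.split() if pair_is_searchable(name, term)]


long_value = "1234567890" * 12  # 120 characters, as in the examples above

print(pair_is_searchable("gamc", long_value))             # False
print(searchable_terms("gamc2", "1234 " + long_value))    # ['1234']
```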

This section describes the following methods of using meta data:

  • Requesting Meta Tag Values
  • Filtering by Meta Tags
  • Using inmeta to Filter by Meta Tags

Requesting Meta Tag Values

Use the getfields parameter in a search request to specify meta tag values to return with the search results. The search engine returns only meta tag information for results that actually contain the meta tags. The search for meta tags is case-insensitive. Use only whole words in the getfields parameter, not partial words or word “stems.” There are limits to the number of characters returned for each meta tag when using getfields. The character limits include the meta tag name and content. These are the limits:

  • For Latin characters: name + value = 1500 characters; chars_AND_name <= 1500/2
  • For characters in multibyte languages (Japanese, Chinese, and Korean): 500 characters

Usage

GET /search?q=[search term]&output=xml&client=test&site=test&getfields=[meta tag name]

Example

The following search request returns the first 10 results that match the query “books” in the “test” collection:

GET /search?q=books&output=xml&client=test&site=test&getfields=author.title.keywords

If any of the results contain the author, title or keywords meta tags, then the values of those meta tags are returned with the respective results. For example, the following tags could be returned with this search request:


<meta name="author" content="Jakob Nielsen">
<meta name="title" content="Usability Engineering">
<meta name="keywords" content="Usability, User Interface, User Feedback">

Details

To specify multiple meta tag values to be returned, list all meta tag names separated by a period (.) as in the first example. To request all available meta tags for each search result, specify an asterisk (*) as the value for the getfields parameter.

When meta tag values are requested, they are not displayed in results requested in the default HTML format. You can use the custom HTML or XML output options, or set the XSLT variable show_meta_tags to display meta tags in results. For more information, see Advanced Customization Topics in Creating the Search Experience.

All specified meta tag names and values must be double URL-encoded (see URL Encoding).
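Double URL-encoding means encoding a string once and then encoding the result again. A minimal Python sketch using the standard library follows; the helper names double_encode and build_getfields are hypothetical.

```python
from urllib.parse import quote


def double_encode(text):
    """URL-encode twice, so a space becomes %20 and then %2520."""
    return quote(quote(text, safe=""), safe="")


def build_getfields(names):
    """Join double-encoded meta tag names with the period separator."""
    return ".".join(double_encode(name) for name in names)


print(double_encode("Human Resources"))                     # Human%2520Resources
print(build_getfields(["author", "title", "last saved by"]))  # author.title.last%2520saved%2520by
```

Note that quote leaves unreserved characters such as the period and underscore unchanged, so a literal period inside a name or value has to be replaced by its encoded form separately.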

Filtering by Meta Tags

The search appliance can filter results by the values of the results’ meta tags. This section describes how to use the requiredfields and partialfields input parameters to filter results using meta tag values. You can use these parameters to include only search results that contain specified meta tag values.

The term partialfields refers to part of the meta tag content, rather than part of a word. For information on other filtering techniques, see Filtering.

You can use the operators in the following table when filtering by meta tags.

Operator

Description

AND (. )

Include results when both filters are true.

OR (| )

Include results when at least one filter is true.

NOT (Exclusion) (- )

Exclude from the result set any results that contain the specified meta tag condition.

A search can be performed to find all documents containing a set of words and/or metadata, such as A AND B AND C. These terms can also be negated, such as A AND B AND NOT C. The search appliance can also use OR conditions for querying.

Usage


GET /search?q=[search term]&output=xml
                           &client=test
                           &site=test
                           &requiredfields=[meta tag name]:[meta tag content]

The q= parameter is optional when using the requiredfields or partialfields parameters. However, the whole query needs to have at least one positive term, either in the query itself or in the metadata restricts.

Examples

Example 1:

The following search request returns the first 10 results that match the query “checks” in the “test” collection and also contain either of the following meta tags. The %2520 sequence in the GET statement is a double-encoded space: the space is first encoded as %20, and the % character (hexadecimal 25) is then encoded again as %25, giving %2520:


<META NAME="department" CONTENT="Human Resources">
<META NAME="department" CONTENT="Finance">

GET /search?q=checks&output=xml&client=test
                              &site=test
                              &requiredfields=department:Human%2520Resources|department:Finance

Example 2:

The following search returns the first 10 results that match the query “checks” in the “test” collection that do NOT contain the following meta tag:

<META NAME="department" CONTENT="Engineering">

GET /search?q=checks&output=xml&client=test
                              &site=test
                              &requiredfields=-department:Engineering

Example 3:

The following search request returns the first 10 results that match the query “books” in the “test” collection, and also contain the word “Scott” somewhere in the “author” meta tag. Some example meta tags that satisfy this search request are:


<META NAME="author" CONTENT="Sir Walter Scott">
<META NAME="author" CONTENT="F. Scott Fitzgerald">

GET /search?q=books&output=xml
                  &client=test
                  &site=test
                  &partialfields=author:Scott

Details

Multiple meta tag constraints can be specified using the requiredfields and partialfields parameters. To filter for the presence of a meta tag, indicate the name of the meta tag to be found. To filter on a specific meta tag value, indicate the name of the meta tag followed by the colon “: ” character and then the specific value. The partialfields parameter matches complete words, not parts of words.

To combine multiple name-value pairs, use the following Boolean operators.

  • Boolean OR [ | ]

Returns results that satisfy either meta tag constraint.

Example: department:Sales|department:Finance

  • Boolean AND [ . ]

Returns results that satisfy both meta tag constraints.

Example: author:William.author:Jones

  • Combined OR and AND with [ ( ) ]

Evaluates conditions in parentheses first: (department=Sales OR department=Finance) AND (author=William OR author=Jones).

Example: (department:Sales|department:Finance).(author:William|author:Jones)

Boolean operators are left associative with equal precedence. You can use parentheses to change the order of precedence. For example, A . (B | C | D) evaluates the OR (|) operators in the parentheses before the AND (.) operator. It is advisable to use brackets, braces, or parentheses to clarify the precedence in complex queries.
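The operators described above can be combined programmatically. The following sketch builds requiredfields or partialfields values from small helper functions; all function names are hypothetical, and names and values are double URL-encoded as this document requires.

```python
from urllib.parse import quote


def _enc(text):
    # Double URL-encode a meta tag name or value.
    return quote(quote(text, safe=""), safe="")


def field(name, value=None):
    """'name' tests for presence of the tag; 'name:value' tests its content."""
    return _enc(name) if value is None else f"{_enc(name)}:{_enc(value)}"


def negate(term):
    return f"-{term}"          # NOT (exclusion)


def any_of(*terms):
    return "|".join(terms)     # Boolean OR


def all_of(*terms):
    return ".".join(terms)     # Boolean AND


def group(expression):
    return f"({expression})"   # parentheses make precedence explicit


filters = all_of(
    group(any_of(field("department", "Sales"), field("department", "Finance"))),
    group(any_of(field("author", "William"), field("author", "Jones"))),
)
print(filters)  # (department:Sales|department:Finance).(author:William|author:Jones)
```

Because the operators have equal precedence and are left associative, grouping every mixed AND/OR expression explicitly avoids surprises.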

Nested Boolean Filtering Using Meta Tags

Using the Google Search Appliance, the user can search over the meta tags in documents by writing complex queries using AND, OR, NOT operators nested within each other. Using nested metadata queries gives the user more power with the expressive capabilities of search requests.

Arbitrarily nested boolean queries can be written using requiredfields and partialfields in conjunction with AND (.), OR (|), and NOT (-) operators. However, there is no way to specify range search with requiredfields and partialfields, as noted. Nested boolean queries cannot be used with inmeta. Because precedence cannot be specified in the search box, when you use inmeta, the normal precedence operators take over and the query is executed. However, a single query can include both inmeta for range search and a nested boolean statement using requiredfields and partialfields.

Before executing a search, the search appliance simplifies the search query by pushing NOTs down the query tree. This process is an application of De Morgan’s Laws and Double Negation Elimination. As a result of this process, if there are any NOT nodes in the query, they are just above the leaf nodes.

For example, consider the following query:

NOT (a OR b)

The following simplified query is the result of pushing NOTs down the query tree. The NOT nodes are above the leaf nodes:

(NOT a) AND (NOT b)

Not all combinations of operators are valid for searches. The following queries in a simplified query tree are invalid:

  • A query in which there is any OR node with even a single child as a NOT node.
  • A query in which there is any AND node with all children as NOT nodes.

The following table contains examples of invalid queries.

Query | Simplified Query | Reason
NOT(a OR b OR c) | (NOT a) AND (NOT b) AND (NOT c) | Invalid because the AND has 3 children (NOT a, NOT b, NOT c) and all are NOT’ed
a OR b OR NOT (NOT (NOT c)) | a OR b OR (NOT c) | Invalid because one child of the OR is a NOT

Searches with unsupported expressions are not performed and do not return results.
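The simplification and the two validity rules can be modeled with a small query tree. This is an illustrative sketch, not the appliance's code: the tuple-based tree representation is an assumption, leaves are plain strings, and a bare top-level NOT (which the two rules above do not cover) is not rejected here.

```python
def push_not(node, negated=False):
    """Push NOTs down to the leaves (De Morgan plus double negation)."""
    if isinstance(node, str):                      # leaf term
        return ("not", node) if negated else node
    operator = node[0]
    if operator == "not":
        return push_not(node[1], not negated)      # NOT NOT x becomes x
    children = [push_not(child, negated) for child in node[1]]
    if negated:                                    # De Morgan: swap AND and OR
        operator = "or" if operator == "and" else "and"
    return (operator, children)


def is_valid(node):
    """Apply the two rules to an already simplified tree."""
    if isinstance(node, str) or node[0] == "not":
        return True
    operator, children = node
    negated = [isinstance(c, tuple) and c[0] == "not" for c in children]
    if operator == "or" and any(negated):
        return False       # an OR with even one NOT child is invalid
    if operator == "and" and all(negated):
        return False       # an AND whose children are all NOT is invalid
    return all(is_valid(child) for child in children)


query = ("not", ("or", ["a", "b", "c"]))
simplified = push_not(query)
print(simplified)            # ('and', [('not', 'a'), ('not', 'b'), ('not', 'c')])
print(is_valid(simplified))  # False
```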

Non-Alphanumeric Characters

By default, non-alphanumeric characters in a partialfields query separate the query terms in the same way as space characters. In general, use spaces as separators even when the original content used a different character as a separator. For example, if you were trying to do a partialfields query for the following meta tag:

<meta name="part" content="aaa-bbb+ccc*ddd-fff">

You should use queries like:

partialfields=part:aaa%20bbb
partialfields=part:bbb%20ccc

The following non-alphanumeric characters are exceptions:

Character

Description

Decimal point (.)

A double URL-encoded (see URL Encoding) decimal point can act as a decimal point in a number (for example, 250.01). For example to query for a meta tag like:

<meta name="number" content="1.1222">

Use a partialfields query like:

partialfields=number:1%252E1222

When a meta tag contains a decimal point with no numbers use the space as a separator as previously mentioned. For example for a meta tag like this:

<meta name="pet" content="dancing.parrot">

Use a partialfields query like (%2520 is a double URL-encoded space character):

partialfields=pet:dancing%2520parrot

If a meta tag contains a number that has letters immediately before or after it, a space should be used as a separator. For example, in the meta tag:

<meta name="serialnumber" content="A1.2">

Use a partialfields query like:


partialfields=serialnumber:A1%202

Ampersand (&)

Not treated as a separator. For example for the meta tag:


<meta name="letters" content="a&b">

Use a partialfields query like this (%2526 is a double URL-encoded ampersand character):


partialfields=letters:a%2526b

Underscore (_)

Not treated as a separator. For example for the meta tag:


<meta name="letters" content="a_b">

Use a partialfields query like this:


partialfields=letters:a_b
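The separator rules in this section can be approximated with a small tokenizer that turns meta tag content into the terms to use in a partialfields query. This is a sketch under assumptions: it handles ASCII content only, treats a period as a decimal point only when a token consists of digits around a single period, and the function name is hypothetical.

```python
import re


def partialfields_terms(content):
    """Split meta tag content into query terms, following the rules above."""
    # Everything except letters, digits, &, _ and . acts like a space.
    rough = re.sub(r"[^0-9A-Za-z&_.]+", " ", content)
    terms = []
    for token in rough.split():
        if re.fullmatch(r"\d+\.\d+", token):
            terms.append(token)                          # real decimal number
        else:
            terms.extend(token.replace(".", " ").split())  # period separates
    return terms


print(partialfields_terms("aaa-bbb+ccc*ddd-fff"))  # ['aaa', 'bbb', 'ccc', 'ddd', 'fff']
print(partialfields_terms("1.1222"))               # ['1.1222']
print(partialfields_terms("A1.2"))                 # ['A1', '2']
```

The resulting terms still need to be joined with an encoded space and encoded as described in URL Encoding before they are placed in a request.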

Using inmeta to Filter by Meta Tags

The special query term inmeta provides meta tag filtering directly from the search box. In combination with simple operators, inmeta filters by meta tags in the same way as the requiredfields or partialfields search parameters. You can further refine inmeta filtering using the double-period (..) separator and the daterange query term to search by number and date range. (For more information, see Query Terms.)

The special query term inmeta and relevant search parameters map to each other in this way:

inmeta Syntax

Search Parameter Syntax

Description

inmeta:[meta tag name]

&requiredfields=[meta tag name]

Returns results that contain the specified meta tag.

inmeta:[meta tag name]=[meta tag content]

&requiredfields=[meta tag name]:[meta tag content]

Returns only results that match the exact meta tag content value specified.

inmeta:[meta tag name]~[meta tag content]

&partialfields=[meta tag name]:[meta tag content]

&requiredfields=[meta tag name]?[meta tag content]

Returns results that have the specified meta tag with a value that matches some or all of the specified meta tag content (that is, the partial value).

inmeta:[meta tag name]~[partial value]*

&requiredfields=[meta tag name]:[partial value]*

Returns results that have the specified meta tag name with a value that matches the wildcard search for all of the specified meta tag content (that is, content starting with a match on the partial value).

inmeta:[meta tag name]~*[partial value]*

&requiredfields=[meta tag name]?*[partial value]*

Returns results that have the specified meta tag name with a value that matches the wildcard search for some or all of the specified meta tag content (that is, content that includes a match on the partial value).
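The mapping in the table above can be expressed as a small translation function. This sketch is illustrative only: the function name is hypothetical, no URL encoding is applied, and range terms such as inmeta:size:30..50 are rejected because they have no equivalent search parameter.

```python
import re

_INMETA = re.compile(r"^inmeta:(?P<name>[^=~:]+)(?:(?P<op>[=~])(?P<value>.+))?$")


def inmeta_to_parameter(term):
    """Translate a single inmeta term into the equivalent search parameter."""
    match = _INMETA.match(term)
    if match is None:
        raise ValueError(f"no equivalent search parameter for {term!r}")
    name, op, value = match.group("name", "op", "value")
    if op is None:
        return f"requiredfields={name}"            # tag must be present
    if op == "=":
        return f"requiredfields={name}:{value}"    # exact content match
    if value.startswith("*"):
        return f"requiredfields={name}?{value}"    # wildcard, partial match
    if value.endswith("*"):
        return f"requiredfields={name}:{value}"    # wildcard, prefix match
    return f"partialfields={name}:{value}"         # partial content match


print(inmeta_to_parameter("inmeta:author~Scott"))  # partialfields=author:Scott
```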

Usage Notes:

  1. By default, documents that contain ALL query terms are returned. This behavior is similar to a boolean AND. Note, though, that there is no AND query term; it is the default way of processing query terms. The default behavior can be changed by using the boolean OR query term or the boolean NOT query term ‘-’. Also note that it is not possible to use the NOT operator in an OR statement, for example test OR -test1. Also, there is no way to nest boolean logic using parentheses.
  2. The OR keyword separating query terms in which a date or numeric range appears returns inconsistent results.

    Examples:

    The following example returns one result when each portion of the query already returns one different result:

    inmeta:TainoParrot6:1..244227 OR inmeta:TainoParrot6=244228

    The following example returns two results and both are correct:

    inmeta:TainoParrot6=244227 OR inmeta:TainoParrot6=244228

    The following example returns 112 results when empty alone returns 112 results and the number range query returns 3 results:

    empty OR inmeta:TainoParrot6:244227..244229

    The following example returns 113 results and is correct:

    empty OR inmeta:TainoParrot6=244228

    The following example returns three results when yvette alone returns the same results and no results from the date range query appear:

    yvette OR inmeta:TainoParrot6:1..244228
  3. An OR of two inmeta range terms does not return results.

    If a set of documents each contain a meta tag with numerical content declared, whether in fixed point notation (with a period) or integer notation, two inmeta range searches that return results in isolation do not return results when combined with an OR. For example, if one document contains <META NAME="price" VALUE="20.00"> and another contains <META NAME="price" VALUE="40.00">, the search inmeta:price:15..25 returns the first document, while the search inmeta:price:35..45 returns the second document. However, the search inmeta:price:15..25 OR inmeta:price:35..45 does not return results.

    If you have two inmeta range searches that in isolation return result sets that overlap, combining them with AND returns the intersection of those sets correctly, regardless of the notation used for the meta tag content or the range search itself.

  4. An inmeta search for a number range returns results only when a number contains six or fewer digits.

    For example, if a document contains a meta tag of <meta name="NumDateRange" content="20081230">, then a search query of inmeta:NumDateRange=20081230 works correctly, as does a search that respects the six-digit limit, such as inmeta:NumDateRange=1..101230. You can use a six-digit number for dates with two digits for the year, two digits for the month, and two digits for the day. If the range includes numbers with more than six digits, no results are returned, as with inmeta:NumDateRange=20081201..20101231.

  5. An inmeta search for a number range is unable to handle negative numbers and ignores them.
  6. An inmeta search is unable to search by multiple keywords or perform phrase searches.

    For example, consider the following meta tags:

    <meta name="department" content="Human Resources">
    <meta name="department" content="Finance">

    The following query does not work correctly:

    checks inmeta:department=Human+Resources+OR+checks inmeta:department=Finance

    Instead, use multiple inmeta query terms, for example:

    inmeta:department~Human OR inmeta:department~Finance
  7. Special characters in metadata names must be escaped for use in inmeta.

    For example, to match metadata tag <meta name="s.pos" content="orange" />, the following query will not work:

    inmeta:s.pos~orange

    You must use the following query, in which the special character is escaped:

    inmeta:s%2Epos~orange

  8. An inmeta search of meta text with special characters, such as “.” and using the operator “~” doesn’t work, but using operator “=” with the full meta text does work.
  9. When using daterange or inmeta queries, spelling suggestions are not returned.

    To view spelling suggestions, use the requiredfields parameter instead of inmeta.

  10. When the search appliance indexes a Microsoft Office 2007 Word document, the following metadata in meta tags becomes available for inmeta search queries:
    
    <meta name="Author" content="Polly Hedra"></meta>
    <meta name="Keywords" content="Resume"></meta>
    <meta name="last saved by" content="Ray Polanco"></meta>
    <meta name="revision number" content="1"></meta>
    <meta name="last print date" content="5/27/2009 14:03:00"></meta>
    <meta name="creation date" content="4/27/2009 13:15:00"></meta>
    <meta name="Last Saved Date" content="4/27/2009 13:44:00"></meta>
    <meta name="template" content="Taino Parrot Resume Template.dotx"></meta>
    <meta name="edit minutes" content="23"></meta>
    <meta name="page count" content="3"></meta>
    <meta name="word count" content="220"></meta>
    <meta name="character count" content="1512"></meta>
    <meta name="source" content="Microsoft Office Word"></meta>
    <meta name="security" content="0"></meta>
    <meta name="Count Lines" content="12"></meta>
    <meta name="Count Paragraphs" content="3"></meta>
    <meta name="Scale Crop" content="no"></meta>
    <meta name="company" content="Coqui Parrot Inc."></meta>
    <meta name="links up to date" content="no"></meta>
    <meta name="Count Characters with Space" content="1729"></meta>
    <meta name="shared doc" content="no"></meta>
    <meta name="Links Dirty" content="no"></meta>
    <meta name="Application Version" content="12.0000"></meta>
    
  11. Metadata can have multiple attributes with the same name. For example:
    
    <metadata>
      <meta name="Name" content="Jenny Wong"/>
      <meta name="Phone" content="x12345"/>
      <meta name="Phone" content="x789"/>
      <meta name="Floor" content="3"/>
    </metadata>
    

    If multiple values are available and if any of the attribute values match the search query, a link to the document appears in the search results.

  12. While inmeta supports wildcard search, it does not support boolean logic. Use requiredfields instead to combine wildcard search and boolean logic.
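Several of the usage notes above (six-digit limit, no negative numbers, escaped special characters in names) can be checked before a range query is sent. The helper below is a sketch with hypothetical names; it accepts integer bounds only and escapes only the period, which is the one special character shown in the notes.

```python
def escape_meta_name(name):
    # Only the period is shown as an example of a special character above.
    return name.replace(".", "%2E")


def inmeta_range(name, low=None, high=None):
    """Build 'inmeta:<name>:<low>..<high>'; either bound may be omitted."""
    if low is None and high is None:
        raise ValueError("at least one bound is required")
    for bound in (low, high):
        if bound is None:
            continue
        if bound < 0:
            raise ValueError("negative numbers are ignored by range searches")
        if len(str(bound)) > 6:
            raise ValueError("range searches support at most six digits")
    low_text = "" if low is None else str(low)
    high_text = "" if high is None else str(high)
    return f"inmeta:{escape_meta_name(name)}:{low_text}..{high_text}"


print(inmeta_range("size", 30, 50))   # inmeta:size:30..50
print(inmeta_range("price", low=15))  # inmeta:price:15..
```

Failing early on the client side makes it easier to tell an unsupported query apart from a query that simply has no matching documents.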

Examples

Example 1. These first query examples show how search requests are related to meta tags in the following example of a web page.


<html>
   <head>
      <title>My Title</title>
      <meta name="myFloat" content="1.23456">
      <meta name="myInteger" content="8" />
      <meta name="myCurrency" content="123.45" />
      <meta name="myDate" content="2011-03-05" />
   </head>
   <body>
      Hello world.
   </body>
</html>

The following search request uses an open-ended range that specifies only the lower bound. It matches this page because the myCurrency value (123.45) is greater than the lower bound:

inmeta:mycurrency:60.00999..

The following search request is for an exact match to the float value:

inmeta:myfloat:1.23456..1.23456

The following search request is for an exact match to a date or date range:


inmeta:mydate:2011-03-05..2011-03-05
- or -
inmeta:mydate:daterange:2011-03-05..2011-03-05

Example 2. The following search request returns results that contain the word “Scott” somewhere in the “author” meta tag. Some example meta tags that satisfy this search request are:


<meta name="author" content="Sir Walter Scott">
<meta name="author" content="F. Scott Fitzgerald">

books inmeta:author~Scott

Example 3. The following search request returns results that contain “size” meta tag values between 30 and 50 inches:

flat+panel+TV inmeta:size:30..50

Example 4. The following is an open-ended date range search request that returns results containing “date” meta tag values later than 2007-01-01:

Monica inmeta:date:daterange:2007-01-01..

Date meta tags must contain only the date information. If you want to filter by date meta tags, make sure the meta tag content fields do not contain any information other than a date.

Limitations

For information about search request limitations, see Specifications and Usage Limits.
