Search Protocol Reference

Results Format


Custom HTML


This section describes the custom HTML results.

Custom HTML Output Overview

Google Search Appliance has a built-in XSLT (eXtensible Stylesheet Language Transformation) server, and can generate custom HTML using your XSL stylesheet. Search requests that include the output parameter set to xml_no_dtd and a valid proxystylesheet parameter value are automatically processed by the XSLT server as requests for custom HTML output.
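As a rough illustration, a request for custom HTML output might be assembled as shown here. This is a minimal Python sketch, not part of the appliance: the host name gsa.example.com, the /search path, and the q query parameter are assumptions made for the example, while output, proxystylesheet, and proxyreload are the parameters described in this section.

```python
from urllib.parse import urlencode


def build_custom_html_url(host, query, stylesheet, reload_stylesheet=False):
    """Build a search URL that asks the XSLT server for custom HTML output.

    host, the /search path, and the q parameter are illustrative assumptions.
    """
    params = {
        "q": query,
        "output": "xml_no_dtd",         # required for custom HTML output
        "proxystylesheet": stylesheet,  # stylesheet the XSLT server applies
    }
    if reload_stylesheet:
        # Ask the XSLT server to bypass its stylesheet cache.
        params["proxyreload"] = "1"
    return f"http://{host}/search?{urlencode(params)}"


url = build_custom_html_url("gsa.example.com", "token ring", "default_frontend")
print(url)
```

Omitting proxystylesheet from the same request would return the raw XML results instead of transformed output.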

Using the XSL stylesheet specified by the proxystylesheet parameter, the XSLT server applies the transformation rules found in the XSL stylesheet to the standard Google XML results. Although this document assumes that the output generated by applying the XSL stylesheet is HTML, almost any output format can be generated by using appropriate XSL stylesheet rules. For any front end, the default XSL stylesheet can be customized or replaced by the search administrator.

Before you customize the XSL stylesheet used to generate custom HTML output, see XML Output for the XML tags that a customized XSL stylesheet can transform.

Additionally, you can use the proxycustom parameter to pass custom XML tags to the XSLT server. Because including custom XML does not generate search results, this feature is useful for implementing additional static search pages, such as an advanced search page.

Customizations to XSLT stylesheets may introduce cross-site scripting (XSS) vulnerabilities. Google recommends that you run XSS tests after customizing an XSLT stylesheet.

Notes:

  • XSL stylesheets used by the XSLT server are cached for 15 minutes. To force the XSLT server to use the latest version of an XSL stylesheet, set the proxyreload input parameter to a value of 1 in your search request.
  • XSL stylesheets that include other files cannot be used with the Google search engine. An XSL stylesheet that contains any of the following constructs generates an error result:
    • <xsl:import>
    • <xsl:include>
    • xmlns:
    • document()
  • When you request cached results in custom HTML output, the BLOB XML tag and associated value are automatically converted to the original text before the XSL stylesheet rules are applied. When using an XSL stylesheet that customizes cache results, use the values of the CACHE_LEGEND_TEXT, CACHE_LEGEND_NOTFOUND, and CACHE_HTML XML tags directly instead of applying a rule on the BLOB subtag.
  • If you use input or output encodings other than latin1, see Internationalization for more details.
  • More information about XSL and XSLT can be found on the W3C web site (http://www.w3.org/Style/XSL/).

Internationalization

The Google Search Appliance handles over 20 character encoding schemes. This section discusses special considerations for the custom HTML output format with encoding schemes other than latin1.

To support all the encoding schemes supported by Google, the XSLT server follows a process to ensure that the results are returned in the correct encoding scheme. When search results are requested through the XSLT server, the server translates the results to the UTF8 encoding scheme before applying the selected XSL stylesheet. After the XSL stylesheet rules are applied to generate the results, the results are converted to the encoding scheme that is specified by the output encoding parameter, oe. If the oe parameter is missing, the default output encoding is ISO-8859-1. The one exception to this rule is cached result pages, which are converted to the encoding scheme of the cached document after XSLT processing.
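The order of the conversions described above can be sketched in Python. This is only an illustration of the sequence (decode, transform in UTF-8, re-encode to oe); the function names are hypothetical, and apply_stylesheet is a placeholder that returns its input unchanged rather than a real XSLT step.

```python
def apply_stylesheet(utf8_xml: bytes) -> bytes:
    """Placeholder for the XSLT step; returns its input unchanged."""
    return utf8_xml


def render(raw_results: bytes, source_encoding: str, oe: str = "ISO-8859-1") -> bytes:
    """Mimic the conversion order: source encoding -> UTF-8 -> XSLT -> oe."""
    # 1. Results are translated to UTF-8 before the stylesheet is applied.
    utf8_xml = raw_results.decode(source_encoding).encode("utf-8")
    # 2. The stylesheet rules run on the UTF-8 data.
    transformed = apply_stylesheet(utf8_xml)
    # 3. The output is converted to the encoding named by oe
    #    (ISO-8859-1 when oe is missing).
    return transformed.decode("utf-8").encode(oe)


latin1_page = render("caf\u00e9".encode("utf-8"), "utf-8")
print(latin1_page)
```

In this sketch a character that cannot be represented in the requested output encoding raises UnicodeEncodeError; how the appliance handles that case is not stated here.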

Each front end for your search appliance is associated with an underlying stylesheet. All XSL stylesheets must be in latin1 or UTF8 formats.


XML Output


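The description of the XML results format contains the following sections:

  • XML Output Overview
  • Character Encoding Conventions
  • Google XML Results DTD
  • Google XML Tag Definitions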


XML Output Overview

For maximum flexibility, Google provides search results in XML format. Using the Google XML results, you can use your own XML parser to customize the display for your search users. If you are using an XSL stylesheet to transform the XML results instead of developing your own XML parser, proceed to Custom HTML.

Notes:

  • Element values are valid HTML and are suitable for display, unless otherwise noted in the XML tag definitions. Some values are URLs and must be HTML-encoded to be displayed.
  • To remain forward-compatible, your XML parser that parses Google search results should ignore attributes or tags that are not documented. By ignoring unknown tags, your custom XML parser can continue working without modification when Google adds more features to the XML output in the future.
  • For custom parameters that contain spaces, each space is replaced with “_”. You can still retrieve the unmodified value from the original_value attribute. For example:
<PARAM name="temp" value="token_ring" original_value="token+ring" />
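A forward-compatible reader can be sketched with Python's standard xml.etree.ElementTree module: it looks up only the tags it knows and never fails on tags it does not recognize. The sample response below is hand-written for illustration, and FUTURE_TAG is a made-up element standing in for a tag added in a later release.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus

# Hand-written sample; FUTURE_TAG is hypothetical and must be ignored.
SAMPLE = """<GSP VER="3.2">
  <TM>0.05</TM>
  <Q>token ring</Q>
  <PARAM name="temp" value="token_ring" original_value="token+ring"/>
  <FUTURE_TAG unknown="1">ignored</FUTURE_TAG>
</GSP>"""


def read_params(xml_text):
    """Return request parameters, preferring the unmodified original_value."""
    root = ET.fromstring(xml_text)
    params = {}
    for param in root.findall("PARAM"):  # unknown siblings are simply skipped
        raw = param.get("original_value", param.get("value", ""))
        params[param.get("name")] = unquote_plus(raw)  # original_value is URL-encoded
    return params


params = read_params(SAMPLE)
print(params)
```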

Character Encoding Conventions

The first line of the XML results indicates which character encoding is used. For information about character encoding, see the XML standard (http://www.w3.org/TR/1998/REC-xml-19980210#charencoding).

Certain characters must be escaped when they are included as values in XML tags. These characters are documented in the XML standard (http://www.w3.org/TR/1998/REC-xml-19980210#dt-escape), and are shown in the table that follows. All other characters in the XML results are presented without modification.

Character    Escaped Form
<            either &lt; or &#60;
&            either &amp; or &#38;
>            either &gt; or &#62;
'            either &apos; or &#39;
"            either &quot; or &#34;
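The escaping rules in the table can be exercised with Python's standard library. This is a minimal sketch: escape_value and unescape_value are hypothetical helper names, and the numeric forms are decoded with html.unescape, which is a convenience chosen for the example rather than anything the appliance prescribes.

```python
import html
from xml.sax.saxutils import escape, unescape

# saxutils handles &, <, and > itself; quotes are supplied as extra entities.
_QUOTES = {"'": "&apos;", '"': "&quot;"}
_UNQUOTES = {"&apos;": "'", "&quot;": '"'}


def escape_value(text):
    """Escape the five characters listed in the table (named forms)."""
    return escape(text, _QUOTES)


def unescape_value(text):
    """Reverse escape_value for the named forms."""
    return unescape(text, _UNQUOTES)


original = 'a < b & "c"'
escaped = escape_value(original)
print(escaped)
# html.unescape also understands the numeric forms such as &#60;
print(html.unescape("&#60;&#38;&#62;&#39;&#34;"))
```

An XML parser performs the unescaping automatically; helpers like these matter only when element values are handled as raw strings.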

Google XML Results DTD

Google XML results can be returned with or without a reference to the most recent DTD (Document Type Definition) describing Google’s XML format. The DTD is a guide to help search administrators and XML parsers understand the XML results output. Because Google’s XML grammar may change from time to time, do not configure your parser to use the DTD to validate the XML results.

XML parsers should not be configured to fetch the DTD every time a search request is performed. Because the DTD is updated infrequently, these fetches create unnecessary delay and bandwidth requirements.

To get results in XML output format, use one of the following parameters in the search request:

  • output=xml_no_dtd (recommended), or
  • output=xml

When you use the xml output format, the XML results include the line:

<!DOCTYPE GSP SYSTEM "google.dtd">

The DTD is available on the Google Search Appliance at http://<appliance_hostname>/google.dtd.
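As an illustration of parsing without validating against or fetching the DTD, the sketch below uses Python's xml.etree.ElementTree, whose default parser reads the DOCTYPE line but does not retrieve the external google.dtd file. The sample document is hand-written for the example.

```python
import xml.etree.ElementTree as ET

# Hand-written sample of output=xml results, including the DOCTYPE line.
WITH_DTD = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    b'<!DOCTYPE GSP SYSTEM "google.dtd">\n'
    b'<GSP VER="3.2"><TM>0.12</TM><Q>test</Q></GSP>'
)

# The default parser does not fetch google.dtd and does not validate.
root = ET.fromstring(WITH_DTD)
print(root.tag, root.get("VER"), root.findtext("Q"))
```

Requesting output=xml_no_dtd avoids the DOCTYPE line altogether, which sidesteps the question for parsers that would otherwise try to resolve it.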

Google XML Tag Definitions

This section contains an index of Google’s XML tags.

Subtags legend:

Symbol    Meaning
?         zero or one instance of the subtag
*         zero or more instances of the subtag
+         one or more instances of the subtag
|         Boolean OR

BLOB

Format/Parent

Text (See Definition)

CACHE_HTML, CACHE_LEGEND_NOTFOUND, CACHE_LEGEND_TEXT

Subtags

None

Definition

This tag contains HTML data in the encoding scheme that is specified in the encoding attribute. The data is Base64 encoded to preserve the integrity of cached results whose encoding scheme differs from that of the requested results.

Attributes

Name        Format                    Description
encoding    Text (Encoding Scheme)    The encoding scheme of the HTML data (see "Internationalization" for a list of common encoding values)
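Decoding a BLOB value in an external parser can be sketched as two steps: Base64-decode the text, then interpret the bytes using the encoding attribute. The sample element is built in the code for illustration, and the sketch assumes the encoding attribute holds a name that Python's codecs recognize.

```python
import base64
import xml.etree.ElementTree as ET


def decode_blob(blob):
    """Return the original text held in a BLOB element."""
    raw = base64.b64decode(blob.text or "")
    # Assumption: the encoding attribute is a codec name Python understands.
    return raw.decode(blob.get("encoding", "utf-8"))


# Build a hand-made CACHE_HTML element whose BLOB holds ISO-8859-1 HTML.
payload = base64.b64encode("<b>caf\u00e9</b>".encode("ISO-8859-1")).decode("ascii")
sample = f'<CACHE_HTML><BLOB encoding="ISO-8859-1">{payload}</BLOB></CACHE_HTML>'

html_text = decode_blob(ET.fromstring(sample).find("BLOB"))
print(html_text)
```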

C

Format/Parent

HAS

Subtags

None

Definition

Indicates that the “cache:” special query term is supported for this search result URL.

Cached results are suppressed and this element is not returned if the <head> tag of the document contains the following <meta> tag: <meta name="ROBOTS" content="noarchive">.

Attributes

Name    Format                  Description
SZ      Text (Integer + “k”)    Provides the size of the cached version of the search result in kilobytes (“k”). This field is not populated if no cached version of a document is available, which can be the case if robots “noarchive” meta tags are used.
CID     Text                    Identifier of a document in the Google Search Appliance cache. To fetch the document from the cache, send a search term of the form "cache:" + CID text + ":" + encoded URL. The encoded URL is available in the UE tag. Send this search term normally, as you would type it into the search form.
ENC     Text                    The encoding of the document in the cache. See Internationalization for a list of common values.
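The "cache:" search term described for the CID attribute can be assembled from a parsed result as sketched below. The sample R element and the CID value abc123 are invented for the example; only the tag and attribute names come from this reference.

```python
import xml.etree.ElementTree as ET

# Hand-written result; the CID value is hypothetical.
SAMPLE_R = (
    '<R N="1">'
    "<U>http://www.example.com/a b</U>"
    "<UE>http://www.example.com/a%20b</UE>"
    '<HAS><L/><C SZ="7k" CID="abc123" ENC="ISO-8859-1"/></HAS>'
    "</R>"
)


def cache_term_for_result(result):
    """Return the cache: query term for a result, or None if it has no cached copy."""
    cached = result.find("HAS/C")
    if cached is None or cached.get("CID") is None:
        return None
    # "cache:" + CID text + ":" + encoded URL (taken from the UE tag)
    return "cache:" + cached.get("CID") + ":" + result.findtext("UE", "")


term = cache_term_for_result(ET.fromstring(SAMPLE_R))
print(term)
```

The resulting string is then sent as the search term, in the same way as a query typed into the search form.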

CACHE

Format/Parent

GSP

Subtags

CACHE_URL, CACHE_REDIR_URL, CACHE_LAST_MODIFIED, CACHE_LEGEND_FOUND?, CACHE_LEGEND_NOTFOUND?, CACHE_CONTENT_TYPE, CACHE_LANGUAGE, CACHE_ENCODING, CACHE_HTML

Definition

Encapsulates the cached version of a search result.

Attributes

None

CACHE_CONTENT_TYPE

Format/Parent

Text (MIME type)

CACHE

Subtags

None

Definition

MIME type of the cached result, as specified in the HTTP header that is returned when the document is crawled.

Attributes

None

CACHE_HTML

Format/Parent

Text (HTML) (Custom HTML output only)

CACHE

Subtags

BLOB? (XML output only)

Definition

The cached version of the search result. All search results are stored in HTML format.

Attributes

None

CACHE_ENCODING

Format/Parent

Text

CACHE

Subtags

None

Definition

The encoding scheme of the cached result, as specified in the HTTP header that is returned when the document is crawled. (See Internationalization for a list of common values.)

Attributes

None

CACHE_LANGUAGE

Format/Parent

Text (Google language tag)

CACHE

Subtags

None

Definition

The language of the cached result as determined by Google’s automatic language classification algorithm. The value of this tag is the same as the values used for the automatic language collections without the “lang_” prefix (see "Automatic Language Filters").

Attributes

None

CACHE_LAST_MODIFIED

Format/Parent

Text

CACHE

Subtags

None

Definition

Date that the document was crawled, as specified in the Date HTTP header when the document was crawled for this index. The crawler fetches documents from its cache if the web server responds with a 304 (not modified) status code to an if-modified-since request. In this case, the CACHE_LAST_MODIFIED is the date when the document was originally crawled and not the date of the if-modified-since request.

Attributes

None

CACHE_LEGEND_FOUND

Format/Parent

CACHE

Subtags

CACHE_LEGEND_TEXT*

Definition

Encapsulates query terms that are found in the visible text of the cached result returned.

Attributes

None

CACHE_LEGEND_NOTFOUND

Format/Parent

Text (Custom HTML output only)

CACHE

Subtags

BLOB? (XML output only)

Definition

Details of any query terms that are not visible in the cached result returned.

Attributes

None

CACHE_LEGEND_TEXT

Format/Parent

Text (Custom HTML output only)

CACHE_LEGEND_FOUND

Subtags

BLOB (XML output only)

Definition

Details of a query term that is visible in the cached result. Query terms found in the cached result are automatically highlighted using the colors described in the attributes of this tag.

Attributes

Name       Format             Description
fgcolor    Color attribute    The foreground color of the query term in the cached result. This value can be used directly in a color attribute for HTML tags.
bgcolor    Color attribute    The background color of the query term in the cached result. This value can be used directly in a color attribute for HTML tags.

CACHE_REDIR_URL

Format/Parent

Text (Absolute URL)

CACHE

Subtags

None

Definition

Final URL of cached result after all redirects are resolved.

Attributes

None

CACHE_URL

Format/Parent

Text (Absolute URL)

CACHE

Subtags

None

Definition

Initial URL of cached result.

Attributes

None

CRAWLDATE

Format/Parent

Text

R

Subtags

None

Definition

An optional element that shows the date when the page was crawled. It is shown only for pages that have been crawled within the past two days.

Attributes

None

CT

Format/Parent

HTML

GSP

Subtags

None

Definition

Search comments.

Example comment: Sorry, no content found for this URL

Attributes

None

CUSTOM

Format/Parent

GSP

Subtags

(Custom XML specified in the search request)

Definition

Encapsulates custom XML tags that are specified in the proxycustom input parameter.

Attributes

None

ENT_SOURCE

Format/Parent

R

Subtags

None

Definition

Identifies the appliance ID (serial number) of the search appliance that contributes to a result.

Example:

<ENT_SOURCE>T5-KUB000F0ADETLA</ENT_SOURCE>

Attributes

None

ENTOBRESULTS

Format/Parent

GSP

Subtags

OBRES

Definition

Encapsulates the results returned by OneBox modules.

Attributes

None

FI

Format/Parent

RES

Subtags

None

Definition

Indicates that document filtering was performed during this search.

See "Automatic Filtering" for more details.

Attributes

None

FS

Format/Parent

R

Subtags

None

Definition

Additional details about the search result.

Attributes

Name     Format    Description
NAME     Text      Name of the result descriptor
VALUE    Text      Value of the result descriptor

GD

Format/Parent

Text (HTML)

GM

Subtags

None

Definition

Contains the description of a KeyMatch result.

Attributes

None

GL

Format/Parent

Text (URL)

GM

Subtags

None

Definition

Contains the URL of a KeyMatch result.

Attributes

None

GM

Format/Parent

GSP

Subtags

GL, GD?

Definition

Encapsulates a single KeyMatch result.

Attributes

None

GSP

Format/Parent

This is the root element.

Subtags

(CT?, CUSTOM?, ENTOBRESULTS, GM*, PARAM+, Q, RES?, Spelling?, Synonyms?, TM) | CACHE

Definition

GSP = “Google Search Protocol”

Encapsulates all data that is returned in the Google XML search results.

Attributes

Name    Format    Description
VER     Text      Indicates version of the search results output. The current output version is “3.2”.

HAS

Format/Parent

R

Subtags

L?, C?

Definition

Encapsulates special features that are included for this search result.

Attributes

None

HN

Format/Parent

Text (URL-encoded web directory, see "Appendix B: URL Encoding")

R

Subtags

None

Definition

Indicates that filtering has occurred and that additional results are available from the directory where this search result was found. The value of this tag is ready to be used with the site: query term (see "Directory Restricted Search").

Attributes

Name    Format    Description
U       Text      Server and path components of the directory’s URL.

L

Format/Parent

HAS

Subtags

None

Definition

Indicates that the “link:” special query term is supported for this search result URL.

Attributes

None

LANG

Format/Parent

Text

R

Subtags

None

Definition

Indicates the language of the search result. The LANG element contains a two-letter language code. See "Automatic Language Filters" for language codes.

Attributes

None

M

Format/Parent

Text (Integer)

RES

Subtags

None

Definition

The estimated total number of results for the search.

The estimate of the total number of results for a search can be too high or too low. See "Appendix A: Estimated vs. Actual Number of Results".

Attributes

None

MT

Format/Parent

R

Subtags

None

Definition

Meta tag name and value pairs obtained from the search result.

Only meta tags (see "Meta Tags") that are requested in the search request are returned.

Attributes

Name    Format    Description
N       Text      Name of the meta tag
V       Text      Value of the meta tag

NB

Format/Parent

RES

Subtags

PU?, NU?

Definition

Encapsulates the navigation information for the result set.

The NB tag is present only if previous or additional results are available.

Attributes

None

NU

Format/Parent

Text (Relative URL)

NB

Subtags

None

Definition

Contains a relative URL pointing to the next results page.

The NU tag is present only when more results are available.

Attributes

None

OBRES

Format/Parent

ENTOBRESULTS

Subtags

The contents of the OBRES element are provided by the OneBox module, and must conform to the OneBox Results Schema. See the specific OneBox module’s documentation for details. See also the Google OneBox for Enterprise Developer’s Guide.

Definition

Encapsulates a result returned by a OneBox module.

Attributes

None

OneSynonym

Format/Parent

HTML

Synonyms

Subtags

None

Definition

A related query for the submitted query, in HTML format.

Attributes

Name    Format    Description
q       Text      The URL-encoded version of the related query (see "Appendix B: URL Encoding")

PARAM

Format/Parent

GSP

Subtags

None

Definition

The search request parameters that were submitted to the Google Search Appliance to generate these results.

Attributes

Name              Format    Description
name              Text      Name of the input parameter
value             HTML      HTML-formatted version of the input parameter value
original_value    Text      Original URL-encoded version of the input parameter value (see "Appendix B: URL Encoding")

PARM

Format/Parent

RES

Subtags

PC, PMT*

Definition

Encapsulates all dynamic navigation results.

Attributes

None

PC

Format/Parent

Text (Integer 0 or 1)

PARM

Subtags

None

Definition

Indicates whether the counts are exact (0) or partial (1).

By default, the search appliance verifies the relevance of up to 30,000 documents for public searches and verifies the relevance and authorization of up to 10,000 documents for secure searches (for the purpose of creating the facets). If the search appliance detects that there are more documents in the index, the value is 1.

PMT

Format/Parent

PARM

Subtags

PV+

Definition

Encapsulates results for one attribute. All values are sorted by count or by value, as configured; a maximum of 5k values (PV) is returned and the rest are discarded.

Attributes

Name    Format            Description
NM      Text              Metatag name
DN      Text              Display name
IR      Text (Integer)    Attribute is range type (1) or not (0)
T       Text (Integer)    Attribute type: 0-String, 1-Integer, 2-Float, 3-Currency, 4-Date

PU

Format/Parent

Text (Relative URL)

NB

Subtags

None

Definition

Contains relative URL to the previous results page.

The PU tag is present only if previous results are available.

Attributes

None

PV

Format/Parent

PMT

Subtags

None

Definition

Encapsulates the count information for one value.

Attributes

Name    Format            Description
V       Text              Value (empty for range attributes)
L       Text              Contains low range value (empty for non-range attribute)
H       Text              Contains high range value (empty for non-range attribute)
C       Text (Integer)    Doc count matching this value or under this range
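The PARM, PC, PMT, and PV tags fit together as sketched below. The sample fragment is hand-written, with invented attribute names (dept, price) and counts; the function name read_facets is hypothetical. Range attributes (IR="1") are labeled by their L and H values, all others by V.

```python
import xml.etree.ElementTree as ET

# Hand-written dynamic navigation fragment with invented values.
SAMPLE = """<PARM>
  <PC>0</PC>
  <PMT NM="dept" DN="Department" IR="0" T="0">
    <PV V="sales" L="" H="" C="12"/>
    <PV V="hr" L="" H="" C="3"/>
  </PMT>
  <PMT NM="price" DN="Price" IR="1" T="2">
    <PV V="" L="0" H="100" C="5"/>
  </PMT>
</PARM>"""


def read_facets(parm):
    """Map each attribute's display name to a list of (label, count) pairs."""
    facets = {}
    for pmt in parm.findall("PMT"):
        is_range = pmt.get("IR") == "1"
        values = []
        for pv in pmt.findall("PV"):
            # Range attributes carry L/H; plain attributes carry V.
            label = (pv.get("L"), pv.get("H")) if is_range else pv.get("V")
            values.append((label, int(pv.get("C"))))
        facets[pmt.get("DN")] = values
    return facets


parm = ET.fromstring(SAMPLE)
counts_are_exact = parm.findtext("PC") == "0"  # 0 = exact, 1 = partial
facets = read_facets(parm)
print(counts_are_exact, facets)
```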

Q

Format/Parent

HTML

GSP

Subtags

None

Definition

The search query terms submitted to the Google search appliance to generate these results.

Attributes

None

R

Format/Parent

RES

Subtags

CRAWLDATE, FS?, HAS, HN?, LANG, MT*, RK, S?, T?, U, UD, UE

Definition

Encapsulates the details of an individual search result.

Attributes

Name    Format            Description
N       Text (Integer)    The index number (1-based) of this search result.
L       Text (Integer)    The recommended indentation level of the results. This value is 1 unless Duplicate Directory Filtering occurs (see "Automatic Filtering"). In this case, the second directory result has a value of 2.
MIME    Text              The MIME type of the search result.

RES

Format/Parent

GSP

Subtags

FI?, M, NB?, PARM?, R*, XT?

Definition

Encapsulates the set of all search results.

Attributes

Name    Format            Description
SN      Text (Integer)    The index (1-based) of the first search result returned in this result set.
EN      Text (Integer)    Indicates the index (1-based) of the last search result returned in this result set.
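Reading a result set with the RES, M, NB, NU, and R tags might look like the following sketch. The sample response is hand-written and deliberately small; the start parameter inside the NU value is an assumption, and summarize is a hypothetical helper name. Optional subtags such as T and S are read with findtext so that a missing tag yields None instead of an error.

```python
import xml.etree.ElementTree as ET

# Hand-written response; real output contains more tags per result.
SAMPLE = """<GSP VER="3.2">
<TM>0.03</TM><Q>ring</Q>
<RES SN="1" EN="2">
  <M>25</M>
  <NB><NU>/search?q=ring&amp;start=2</NU></NB>
  <R N="1"><U>http://a.example.com/</U><T>First page</T><RK>9</RK><S>A snippet</S></R>
  <R N="2" L="2"><U>http://a.example.com/x</U><RK>7</RK></R>
</RES>
</GSP>"""


def summarize(xml_text):
    """Collect paging information and per-result details from a response."""
    res = ET.fromstring(xml_text).find("RES")
    if res is None:  # RES is optional in GSP (no results)
        return {"first": None, "last": None, "estimated_total": 0,
                "next_url": None, "results": []}
    results = [
        {
            "n": int(r.get("N")),
            "indent": int(r.get("L", "1")),  # treated as 1 when absent
            "url": r.findtext("U"),
            "title": r.findtext("T"),        # T? may be absent
            "snippet": r.findtext("S"),      # S? may be absent
        }
        for r in res.findall("R")
    ]
    return {
        "first": int(res.get("SN")),
        "last": int(res.get("EN")),
        "estimated_total": int(res.findtext("M")),
        "next_url": res.findtext("NB/NU"),   # None when no further page exists
        "results": results,
    }


summary = summarize(SAMPLE)
print(summary["first"], summary["last"], summary["next_url"])
```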

RK

Format/Parent

Text (Integer in the range 0-10)

R

Subtags

None

Definition

The RK parameter assigns a ranking score to each page on a scale from 0 (least important) to 10 (most important) based on how well the result matches the query. When search results are sorted by relevancy, the RK value is in decreasing order (highest to lowest).

To see the RK values, you must view search results in raw XML, as described in the following steps:

  1. On the search page, enter a query and get results.
  2. If not already selected, click Sort by relevance.
  3. On the Advanced Search page, edit the query parameters:
    • Change the output parameter to &output=xml
    • Remove &proxystylesheet=default_frontend
    • Add &getfield=*
  4. Re-enter the query.

The XML results show the RK parameter for each result, for example: <RK>10</RK>.

Attributes

None
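The parameter edits in the steps above can also be applied to a search URL programmatically. This is a sketch using Python's urllib.parse; the host name, the /search path, and the q parameter in the example URL are assumptions, and raw_xml_url is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def raw_xml_url(search_url):
    """Rewrite a search URL so that it returns raw XML including RK values."""
    parts = urlsplit(search_url)
    # Drop the stylesheet and any existing output/getfield settings.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ("proxystylesheet", "output", "getfield")
    ]
    params += [("output", "xml"), ("getfield", "*")]
    # safe="*" keeps the asterisk literal instead of encoding it as %2A.
    return urlunsplit(parts._replace(query=urlencode(params, safe="*")))


xml_url = raw_xml_url(
    "http://gsa.example.com/search?q=ring&output=xml_no_dtd"
    "&proxystylesheet=default_frontend"
)
print(xml_url)
```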

S

Format/Parent

Text (HTML)

R

Subtags

None

Definition

The snippet for the search result.

Query terms appear in bold in the results. Line breaks are included for proper text wrapping.

In documents larger than 300KB, snippets may not contain query terms that occur beyond the first 300KB of the document. For non-HTML documents, the 300KB limit applies to the converted version, not the original document.

Attributes

None

SCOREBIAS

Format/Parent

Text (XML)

R

Subtags

None

Definition

The SCOREBIAS tag can appear zero or more times as a child of the R tag (see R) for each result. The SCOREBIAS tag appears for each result biaser that is applied.

The NAME attribute is the name of the result biaser.

The VALUE attribute indicates the effect of the biaser. For biasers where the strength is expressed symbolically, such as source or collection biasing and metadata biasing, VALUE is an integer from -3 to 3, as listed in the attributes table for this tag.

The search appliance does not include any information about the exact change in score or rank, or the weight of the result biaser.

The following example indicates a medium increase in the PatternScorer result biaser:

<SCOREBIAS NAME="PatternScorer" VALUE="2">

Attributes

Attribute    Value            Format            Description
NAME         PatternScorer    Text              Used for both source biasing and collection biasing.
NAME         DateBias         Text              Used for date biasing.
NAME         Metadata         Text              Used for metadata biasing.
VALUE        3                Text (integer)    For a strong increase.
VALUE        2                Text (integer)    For a medium increase.
VALUE        1                Text (integer)    For a weak increase.
VALUE        0                Text (integer)    For no change.
VALUE        -3               Text (integer)    For a strong decrease.
VALUE        -2               Text (integer)    For a medium decrease.
VALUE        -1               Text (integer)    For a weak decrease.

For biasers that do not use a symbolic change, such as date biasing, VALUE has these numerical values:

  • 1 for an increase in score
  • -1 for a decrease in score
  • 0 for no change (unlikely to occur in practice)
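Interpreting SCOREBIAS values in a parser can be sketched as a lookup. The sample R fragment is hand-written and writes SCOREBIAS as an empty element, which is an assumption about the exact serialization; describe_bias is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Symbolic strengths used by source, collection, and metadata biasing.
SYMBOLIC = {
    3: "strong increase", 2: "medium increase", 1: "weak increase",
    0: "no change",
    -1: "weak decrease", -2: "medium decrease", -3: "strong decrease",
}


def describe_bias(elem):
    """Turn one SCOREBIAS element into a readable description."""
    name = elem.get("NAME")
    value = int(elem.get("VALUE"))
    if name == "DateBias":
        # Date biasing reports only the direction of the change.
        effect = "increase" if value > 0 else "decrease" if value < 0 else "no change"
    else:
        effect = SYMBOLIC.get(value, "unknown")
    return f"{name}: {effect}"


# Hand-written result fragment.
sample = ET.fromstring(
    '<R N="1">'
    '<SCOREBIAS NAME="PatternScorer" VALUE="2"/>'
    '<SCOREBIAS NAME="DateBias" VALUE="-1"/>'
    "</R>"
)
descriptions = [describe_bias(b) for b in sample.findall("SCOREBIAS")]
print(descriptions)
```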

Spelling

Format/Parent

GSP

Subtags

Suggestion+

Definition

Encapsulates alternate spelling suggestions for the submitted query. Only one spelling suggestion is returned at this time.

Attributes

None

Suggestion

Format/Parent

HTML

Spelling

Subtags

None

Definition

An alternate spelling suggestion for the submitted query, in HTML format.

Attributes

Name    Format    Description
q       Text      The spelling suggestion.
qe      Text      Internal-only attribute of the spelling suggestion. This attribute works when the search results are transformed on the search appliance, but not on external parsers.

Synonyms

Format/Parent

GSP

Subtags

OneSynonym+

Definition

Encapsulates the related queries for the submitted query. Up to 20 related queries may be returned, depending on the related queries list that is associated with the front end.

Attributes

None

T

Format/Parent

Text (HTML)

R

Subtags

None

Definition

The title of the search result.

Attributes

None

TM

Format/Parent

Text (Floating-point number)

GSP

Subtags

None

Definition

Total server time to return search results, measured in seconds.

Attributes

None

U

Format/Parent

Text (Absolute URL)

R

Subtags

None

Definition

The URL of the search result.

Attributes

None

UD

Format/Parent

Text (URL to display for non-ASCII URLs)

R

Subtags

None

Definition

The URL string to display when the URL that is in the U parameter is non-ASCII. Displays UTF-8 characters and IDNA domain names properly.

Attributes

None

UE

Format/Parent

Text (URL-encoded version of the URL)

R

Subtags

None

Definition

The URL-encoded version of the URL that is in the U parameter.

Attributes

None

WWN

Format/Parent

Text (Integer)

R

Subtags

None

Definition

Present only if the search is a wildcard search. The value indicates the outcome of the wildcard expansion.

Attributes

Value    Format            Description
0        Text (integer)    The wildcard search completed correctly.
1        Text (integer)    The initial query matched too many documents (10k+).
2        Text (integer)    The expansions found for the requested wildcard included too many terms.
3        Text (integer)    No expansions were found.

XT

Format/Parent

RES

Subtags

None

Definition

Indicates that the estimated total number of results specified in this search result is exact.

See "Automatic Filtering" for more details.

Attributes

None
