Deployment Scenario Handbook

Relevancy Testing

Scenario overview

Acme Inc. has deployed its GSA and integrated the following content sources:

  • Livelink content
  • Crawled intranet site
  • People directory application

In the use case for this scenario, they want to conduct relevancy testing to make sure their users are satisfied with the search results being returned by the GSA before rolling out the search solution to production.


Goal

Ensure that search results returned to users are relevant to their search terms.


Assumptions

  • Content has already been integrated into the GSA and is available in the index.
  • Test planners know the business context of the content in the GSA’s index.

Key considerations

  • Relevancy is hard to frame in terms of an absolute scientific measurement—it may mean different things to different people.
  • The GSA’s out-of-the-box relevancy algorithms have been shown to return highly relevant search results without performing any tweaks or modifications.

Recommended approach

Google’s recommended approach for relevancy testing covers the following areas:

Test case preparation

  • Identify the different user groups in the organization that will use search.
  • Based on what is in the GSA index, determine some business context about the type of searches different users would perform and what documents they would expect to be returned.
  • Develop a list of predetermined queries for users to execute and comment on the relevancy of the results. In addition to this fixed set, ask users to execute three or so of their own queries during the testing rounds, to account for context not considered in test case preparation.
  • Identify the set of documents you deem most relevant for each query and user; these documents will be used for scoring.
  • Develop a scale testers can use to gauge how relevant the returned results are, and communicate the scale and its basis to the users performing the testing. For example, consider a 1-5 scale where:
    • 1—relevancy is great. The first page returns all extremely relevant results, and the identified document for this query appears on the first page.
    • 5—relevancy is very poor. The expected results don’t come back in the first couple of pages; the identified document for this query doesn’t appear until the 60th result, and one content source crowds out all other results.
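To keep scoring consistent across testers, the check behind the example scale—whether the identified relevant document appears on the first page of results—can be automated. The helpers below are a hypothetical sketch, not part of the GSA; the page size of 10 is an assumption.

```python
def rank_of(target_url, result_urls):
    """Return the 1-based rank of target_url in result_urls, or None if absent."""
    for rank, url in enumerate(result_urls, start=1):
        if url == target_url:
            return rank
    return None

def first_page_hit(target_url, result_urls, page_size=10):
    """True when the identified relevant document appears on the first results page,
    which supports a score of 1 on the example 1-5 scale."""
    rank = rank_of(target_url, result_urls)
    return rank is not None and rank <= page_size
```

A rank of 60, as in the example of a score of 5, would make `first_page_hit` return False and flag the query for tuning.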

Test case execution

  • Before making any relevancy tweaks, establish a relevancy benchmark: have an identified beta user set, spanning the departments and business units that will eventually use the search solution in production, execute the fixed set of predetermined queries.
  • In a spreadsheet, have users rate the results of each executed query according to the predetermined scale, and record their general comments on each search.
  • After benchmarking the GSA’s default relevancy configuration (no synonyms for query expansion, no biasing policy, and so on), tweak the relevancy configuration systematically based on user feedback and comments, then have users re-test and re-score after each round of changes to see how each change affected their perception of relevancy.
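The spreadsheet tally can be summarized per query and per testing round so that the benchmark and each subsequent round are directly comparable. A minimal sketch (the row layout is an assumption, not a prescribed format):

```python
from collections import defaultdict

def average_scores(rows):
    """rows: iterable of (round_label, query, tester, score) tuples, where
    score is the tester's 1-5 relevancy rating.
    Returns {(round_label, query): mean score} so rounds can be compared."""
    totals = defaultdict(lambda: [0, 0])  # (round, query) -> [sum, count]
    for round_label, query, _tester, score in rows:
        bucket = totals[(round_label, query)]
        bucket[0] += score
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

A drop in the mean score for a query between the benchmark and a later round indicates the last configuration change improved perceived relevancy for that query (lower is better on the example scale).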

Features to consider for relevancy tuning

The following GSA features are worth considering for relevancy tuning:

  • Source biasing: Using pattern matching, bias one content source over another.
  • Date biasing and metadata biasing: Bias newer documents (date biasing) or documents that have specific metadata attached (metadata biasing).
  • KeyMatches: Use KeyMatches to promote specific documents for certain queries.
  • Query expansion: Use a query expansion policy to expand query terms into related terms (synonyms).
  • Self-learning scorer: When Advanced Search Reporting is enabled, the GSA analyzes clickstream data and promotes certain search results over time. For example, if users consistently click the second result for a given query instead of the first, that result will eventually overtake the first position on the page.
  • Host crowding/filtering: The GSA filters out any combination of results from the same path and results with duplicate titles and snippets.
  • Ranking framework: Specify per-URL biasing. Note that this is a very complex solution to manage and should only be tried as a last resort.
  • Stopwords (introduced in GSA 6.10): Use stopwords to prevent certain query terms from being used in the search. Take care with this feature, as it can have wide-ranging implications if used as the solution to a narrow problem.
  • Collections: Break content into different collections to restrict the document corpus available to a search query.
  • Dynamic Navigation (exposing metadata and/or entities): Instead of tuning relevancy, consider enriching the user experience by adding Dynamic Navigation categories for metadata sources or for entities defined in Entity Recognition. Although not strictly a relevancy tuning option, Dynamic Navigation lets users drill down into the result set to find the results they are looking for.
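Several of these settings can be observed directly through the GSA’s XML search interface during testing. The sketch below builds a search URL using the documented `q`, `site`, `client`, `output`, and `filter` parameters; setting `filter` to `0` disables duplicate filtering, which helps reveal whether host crowding is hiding expected results. The host name and collection/frontend names are placeholder assumptions.

```python
from urllib.parse import urlencode

def build_search_url(query, host="gsa.example.com", collection="default_collection",
                     frontend="default_frontend", filtering="1"):
    """Build a GSA XML search URL.
    filtering: '1' = full duplicate filtering, '0' = no filtering."""
    params = {
        "q": query,
        "site": collection,      # collection to search (restricts the corpus)
        "client": frontend,
        "output": "xml_no_dtd",  # machine-readable XML results
        "filter": filtering,     # '0' disables duplicate snippet/directory filtering
    }
    return f"http://{host}/search?{urlencode(params)}"
```

Comparing the result sets returned with `filter=1` and `filter=0` for the same query shows exactly which documents host crowding and duplicate filtering are suppressing.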

Alternative approach

Address biasing at index time: use a content feed and specify the pagerank of individual documents. The pagerank attribute allows you to set a document’s PageRank manually, as high as 99 for a very high value. The default for all content-fed documents is 96.
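A content feed is an XML document posted to the appliance’s feed port. As a hedged sketch of the feed structure (the datasource name and URLs are placeholders; consult the GSA feed DTD for the full set of record attributes), the standard library can build a minimal full feed in which each record carries an explicit pagerank attribute:

```python
import xml.etree.ElementTree as ET

def build_feed(datasource, docs):
    """docs: iterable of (url, pagerank) pairs.
    Returns a minimal GSA full content feed as an XML string."""
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "full"
    group = ET.SubElement(feed, "group")
    for url, pagerank in docs:
        # pagerank above the content-feed default biases this document upward
        ET.SubElement(group, "record", url=url, mimetype="text/html",
                      pagerank=str(pagerank))
    return ET.tostring(feed, encoding="unicode")
```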

Project task overview

The project tasks and activities for relevancy testing are:

  • Plan test
    • Develop a list of queries to be executed by each user as part of testing
    • Identify a set of “relevant” documents for each query and user
    • Develop a relevancy scale for gauging the quality of search results
  • Execute test
    • Instruct users to execute the tests, and tally the results and feedback
  • Iterate and re-test
    • Refine and repeat until you are satisfied with the results

Long term enhancement

Develop a process and mechanism for gathering user feedback and continuing refinement of search relevancy in production.
