How the Google Search Appliance uses Forms to authenticate users and to authorize users to see content

Scope

This document applies to the Google Search Appliance versions 5.2 and 6.0 with the following settings in the admin console:

  1. Under "Crawl and Index > Forms Authentication", one or more rules are configured. No other authentication method is used.
  2. Under "Serving > Forms Authentication", "Login against a sample protected URL" is selected, and a sample URL is provided.
  3. The Google Search Appliance is in the same cookie domain as the content and login servers.

Introduction

This document describes the interaction between the Google Search Appliance, the search client (a web application or a search user's web browser), and the content and login servers during secure searches against documents that are protected by Forms Authentication. Forms Authentication, or form-based authentication, is a technique widely used by SSO (single sign-on) systems. The username and password are collected via an HTML form. Once the user is authenticated, the server sets one or more HTTP cookies in the browser to be used as credentials in subsequent communication. We refer to these cookies as SSO cookies in the rest of this document. HTTP cookies are specified in RFC 2965.

The Google Search Appliance keeps a session for each secure search user. A session is a memory object on the appliance that stores some user information, including SSO cookies. The Google Search Appliance associates a secure search query with the user's session by means of a cookie named GSA_SESSION_ID. The GSA_SESSION_ID cookie is set without an expiration time, which means the browser normally deletes it on exit. In this document, unless specified otherwise, a "session" refers to a session with the Google Search Appliance.
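
The association between a request and a session can be illustrated with a minimal Python sketch. It is not the appliance's actual implementation: the function names, the in-memory dictionary, and the way session IDs are generated are all assumptions made for illustration. Only the cookie name GSA_SESSION_ID and the absence of an expiration time come from the description above.

```python
from http.cookies import SimpleCookie
import secrets

SESSION_COOKIE = "GSA_SESSION_ID"  # cookie name used by the appliance


def find_session(cookie_header, sessions):
    """Return (session_id, session) for a request, or (None, None).

    `sessions` is a hypothetical in-memory dict: session ID -> session data.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    morsel = jar.get(SESSION_COOKIE)
    if morsel is None:
        return None, None  # the request carries no session cookie
    session = sessions.get(morsel.value)
    if session is None:
        return None, None  # cookie present, but the session no longer exists
    return morsel.value, session


def new_session_cookie(sessions):
    """Create an empty session and return (session_id, Set-Cookie value).

    No Expires or Max-Age attribute is added, so the browser normally
    discards the cookie on exit. The ID format here is an assumption.
    """
    session_id = secrets.token_hex(16)
    sessions[session_id] = {"sso_cookies": {}}
    return session_id, f"{SESSION_COOKIE}={session_id}"
```

For example, find_session("GSA_SESSION_ID=abc123; SMSESSION=xyz", sessions) returns the session stored under "abc123" if there is one, and (None, None) otherwise.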

Overview of a secure search

When the Google Search Appliance receives a secure search query, it authenticates the search user, gets a list of the documents matching the query, and authorizes every secure URL in the search results.

If a search user is already signed in to the SSO system or has a valid session that contains valid SSO cookies, the authentication process is seamless to the user. The Google Search Appliance returns search results that the user is authorized to view without prompting for username or password. Otherwise, the Google Search Appliance prompts for username and password before searching its index. The exact sequence of events depends on the state of the user's session with the Google Search Appliance and whether the user is signed in to the SSO system.

If the secure search query comes in with a GSA_SESSION_ID cookie, the Google Search Appliance verifies the validity of the corresponding session. A session is in one of three possible states: (1) it does not exist on the appliance, which indicates that the session has expired; (2) it exists but stores expired SSO cookies; (3) it is valid, which means that it exists and contains valid SSO cookies. If a user's session exists, the Google Search Appliance tests it by fetching the sample URL (configured in "Serving > Forms Authentication") with the cookies stored in it. If the content server doesn't redirect the appliance to the login page, authentication is successful and the user's session is kept. Otherwise, the user's session is deleted and a new session is created.
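
The three session states can be sketched in Python as follows. This is an illustration only: SessionState, classify_session, and the cookies_are_valid callback (which stands in for the sample URL fetch) are hypothetical names, not part of the appliance.

```python
from enum import Enum


class SessionState(Enum):
    MISSING = 1          # (1) not on the appliance: the session has expired
    EXPIRED_COOKIES = 2  # (2) exists, but its SSO cookies have expired
    VALID = 3            # (3) exists and contains valid SSO cookies


def classify_session(sessions, session_id, cookies_are_valid):
    """Classify the session named by the GSA_SESSION_ID cookie.

    `sessions` is a hypothetical dict of session ID -> session data.
    `cookies_are_valid` is a hypothetical callback that fetches the sample
    URL with the stored cookies and reports whether the content server
    served it without redirecting to the login page.
    """
    session = sessions.get(session_id) if session_id else None
    if session is None:
        return SessionState.MISSING
    if cookies_are_valid(session["sso_cookies"]):
        return SessionState.VALID  # the session is kept
    # The stored cookies failed the test: the session is deleted, and the
    # caller is expected to create a new one.
    del sessions[session_id]
    return SessionState.EXPIRED_COOKIES
```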

The Google Search Appliance determines whether the user is signed in to the SSO system by fetching the sample URL (configured in "Serving > Forms Authentication") with the cookies in the HTTP request of the search query. The Google Search Appliance considers the user as signed in if the content server doesn't redirect it to the login page.
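
A minimal Python sketch of this test follows, including the redirect comparison that the step-by-step section describes (fetch with cookies, then fetch again without cookies and compare the redirect URLs). The fetch callback and its return shape are assumptions made so that the example is self-contained. The handling of response codes other than 200 and 302 is also an assumption, because this document does not describe it.

```python
def cookies_are_valid(fetch, sample_url, cookies):
    """Return True if `cookies` give access to the sample URL.

    `fetch(url, cookies)` is a hypothetical callback returning
    (status_code, redirect_location_or_None) without following redirects.
    """
    status, location = fetch(sample_url, cookies)
    if status == 200:
        return True  # content served directly: the cookies are valid
    if status != 302:
        return False  # assumption: treat any other response as a failure
    # A 302 is ambiguous: either the cookies have expired, or the URL has
    # simply moved. Fetch again without cookies to learn where an
    # unauthenticated client is sent, which should be the login page.
    _, login_location = fetch(sample_url, {})
    # Different redirect targets mean the first redirect was not the login
    # page, so the cookies are valid.
    return location != login_location
```

Passing the cookies stored in the session tests the session, and passing the cookies parsed from the search request tests whether the user is signed in to the SSO system.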

The search operation is conducted inside the Google Search Appliance and is not discussed in this document.

The Google Search Appliance checks whether the user is authorized to view every secure URL in the search results. The URLs that the user is not authorized to access are removed before the search results are returned.
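
The filtering can be sketched in Python as follows, using the rules given in the authorization steps later in this document: the request carries the SSO cookies and a "Range: bytes=0-0" header, a 2xx response keeps the URL, and a rejected URL is replaced by the next most relevant one. The function names, the is_secure and check_status callbacks, and the exclusion of every non-2xx code (the document only mentions 3xx and 4xx) are assumptions for illustration.

```python
def authorization_headers(sso_cookies):
    """Build the headers of an authorization check request."""
    cookie = "; ".join(f"{name}={value}" for name, value in sso_cookies.items())
    return {
        "User-Agent": "gsa-crawler",
        "Cookie": cookie,
        "Range": "bytes=0-0",  # saves bandwidth: the content is not needed
    }


def authorized_results(ranked_urls, is_secure, check_status, sso_cookies, num=10):
    """Return up to `num` URLs that the user may see, in relevance order.

    `ranked_urls` lists matching URLs from most to least relevant.
    `is_secure(url)` and `check_status(url, headers)` are hypothetical
    callbacks: the first says whether a URL needs an authorization check,
    the second sends the GET request and returns the HTTP status code.
    """
    headers = authorization_headers(sso_cookies)
    results = []
    for url in ranked_urls:
        if len(results) == num:
            break
        # Public URLs need no check. Secure URLs are kept on a 2xx response.
        # Assumption: every other status code removes the URL.
        if not is_secure(url) or 200 <= check_status(url, headers) < 300:
            results.append(url)
    return results
```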

Step-by-step diagram and descriptions of various scenarios

The following search scenarios are discussed in detail:

  1. The user has an existing session that stores expired SSO cookies and is not signed in
  2. The user does not have an existing session and is not signed in
  3. The user does not have an existing session but is signed in
  4. The user has a valid session

The user has an existing session that stores expired SSO cookies and is not signed in

This search scenario is not very common. We discuss it first because all the potential steps take place. The steps in other search scenarios are a subset of this scenario. In this scenario, the HTTP request of the client contains a GSA_SESSION_ID and the corresponding session exists. However, the session stores expired cookies. The browser doesn't have valid SSO cookies either.

 

  1. The user submits a secure search to the Google Search Appliance. A search query is secure if it contains access=a or access=s. For example:
    https://gsa.company.com/search?q=test&access=a&rest_of_the_search_query
    
     

    Steps 2-5 take place only if the secure search query comes in with a GSA_SESSION_ID cookie and the corresponding session exists on the appliance. In these 4 steps, the Google Search Appliance tests whether the SSO cookies stored in the session are still valid.

  2. The Google Search Appliance fetches the sample URL specified in the admin console under "Serving > Forms Authentication" with the cookies stored in the session.
  3. If the content server returns a 200 OK, the Google Search Appliance determines that the cookies stored in the session are still valid. Steps 4-25 are skipped. If the content server returns a 302 response, the Google Search Appliance saves the redirect URL. At this point, the reason for the redirect is not clear. It is possible that the cookies have expired. It is also possible that the URL has indeed moved temporarily.
  4. The Google Search Appliance fetches the sample URL again without any cookies.
  5. The HTTP response should be a 302 redirect to the login page. If the redirect URL is different from what was saved in step 3, the Google Search Appliance determines that the redirect URL in step 3 is not the login page, thus the cookies used in step 2 are valid. It skips to step 26. If the redirect URL is the same as what was saved in step 3, the Google Search Appliance determines that the cookies stored in the session are expired. The session is invalidated.

    In steps 6-9, the Google Search Appliance tests the user's cookies to see if they are still valid.

  6. The Google Search Appliance parses the "Cookie" header in the HTTP GET request of the search query to extract cookies. It then fetches the sample URL specified in the admin console under "Serving > Forms Authentication" with the cookies.
  7. If the content server returns a 200 OK, the Google Search Appliance assumes that the user is signed in to the SSO system. Steps 8-25 are skipped. The cookies captured in the previous step are stored on the appliance in the user's session. If a session doesn't exist, a new session is created. If the content server returns a 302, the Google Search Appliance saves the redirect URL. At this point, the reason for the redirect is not clear. It's possible that the cookies have expired. It's also possible that the URL has indeed moved temporarily.
  8. The Google Search Appliance fetches the sample URL again without any cookies.
  9. The HTTP response should be a 302 redirect to the login page. If the redirect URL is different from what was saved in step 7, the Google Search Appliance determines that the redirect URL in step 7 is not the login page, thus the cookies used in step 6 are valid. It skips to step 26. If the redirect URL is the same as what was saved in step 7, the Google Search Appliance determines that the user is not signed in to the SSO system. The Google Search Appliance will prompt for username and password.

    In steps 10-21, the Google Search Appliance collects the user's credentials and uses them to log in. Upon successful login, the Google Search Appliance expects to receive one or more SSO cookies, which it stores in the user's session. Depending on the login server configuration, it may take more than 4 steps to get the login form or to authenticate. The Google Search Appliance follows up to 10 redirects. The redirect types it follows are HTTP 3xx redirects, the Refresh header in an HTTP 200 response, and the Refresh meta tag in an HTTP 200 response.

  10. The Google Search Appliance sends a 302 response to redirect the browser to an SSO login page on the Google Search Appliance. A typical response looks like:

    Note: A GSA_SESSION_ID cookie is set. The target Location always uses the HTTPS protocol.

    HTTP/1.x 302 Found
    Connection: Close
    Set-Cookie: GSA_SESSION_ID=ac7564b00ei9684dei870ec31f9cd39c
    Location: https://gsa.company.com:443/ssoLogin?%2Fsearch%3Fq%3Dtest%26rest_of_the_search_query
    Content-Type: text/html
    Content-Length: 0
    
  11. The browser follows the 302 redirect.
  12. The Google Search Appliance fetches the sample URL, which is configured in "Login against a sample protected URL" under "Serving > Forms Authentication".
  13. The content server redirects the Google Search Appliance to the login page.
  14. The Google Search Appliance fetches the login page.
  15. The login page is returned.
  16. Before the Google Search Appliance passes the login page to the browser, it modifies the "action" attribute of the form to post to itself and adds a few hidden fields.
  17. The browser sends a POST request to submit the login form to the Google Search Appliance.

    Note: In steps 18-21, the Google Search Appliance authenticates using the search user's username/password and saves the session cookies set by the login server.

  18. The Google Search Appliance takes the data in the posted form, then sends an HTTP POST request to submit the form to the login server. The appliance does not check whether the authentication is successful in this step.
  19. Upon successful authentication, the login server sets one or more SSO cookies and redirects the Google Search Appliance to the target URL. The response of a successful authentication normally looks like:
    HTTP/1.1 302 Found
    Date: Tue, 25 Aug 2009 18:43:03 GMT
    Server: Apache/2.2....
    Location: http://sample_url
    Set-Cookie: SMSESSION=FAWEFAEWFAWF....
    Content-Type: text/html
    
  20. The Google Search Appliance saves the SSO cookies and follows the redirect to fetch the URL.
  21. If the content server returns a 200 response, the authentication steps end here. If the content server returns a 3xx response, the Google Search Appliance follows the redirect (step 20) up to 10 times.
  22. The Google Search Appliance redirects the browser to the original search query. The Google Search Appliance also sets the SSO cookies acquired in steps 18-21 in the browser.
    HTTP/1.x 302 Found
    Connection: Close
    Set-Cookie: SMSESSION=FAWEFAEWFAWF...
    Location: https://gsa.company.com:443/search?q=test&rest_of_the_search_query
    Content-Type: text/html
    
  23. The browser follows the redirect and submits the query again. Both the GSA_SESSION_ID cookie and the SSO cookies set in the previous step should be present.
    GET /search?q=test&access=a&rest_of_the_search_query
    ...
    Cookie: GSA_SESSION_ID=ac7564b00ei9684dei870ec31f9cd39c; SMSESSION=FAWEFAEWFAWF...
    ...
    
  24. (Same as step 2) The Google Search Appliance fetches the sample URL specified in the admin console under "Serving > Forms Authentication" with the cookies stored in the session.
  25. If the content server returns a 200 OK, the Google Search Appliance determines that the user is authenticated. Otherwise, the appliance behaves the same way as in step 3 and proceeds to step 4.

    Note: Steps 26 and 27 are the authorization check steps. The Google Search Appliance searches its index and prepares a search result set. By default, a search result set contains 10 URLs. The number of results in a search result set can be altered by the "num" parameter in the search query. If there are secure URLs in the search result set, the Google Search Appliance checks authorization for each secure URL by sending a GET request with the SSO cookies stored in the user's session. To save bandwidth, the "Range" header is set to "bytes=0-0" so that the content server doesn't have to return the content. The Google Search Appliance tries to prepare a result set that includes the requested number of URLs. If a URL is not authorized, it finds the next most relevant URL.

  26. This is the authorization check request for URLs that are protected by Forms Authentication. It normally looks like:
    GET /search_result_url HTTP/1.0
    Host: content_server_hostname
    Connection: Keep-Alive
    User-Agent: gsa-crawler
    Cookie: SMSESSION=FAWEFAEWFAWF...; SOME_OTHER_COOKIE=...
    Range: bytes=0-0
    
  27. This is the authorization check response. If the HTTP response code is 2xx, the corresponding URL is included in the search result set. If the HTTP response code is 3xx or 4xx, the corresponding URL is removed from the search result set.
  28. The search result set is returned to the user. A GSA_SESSION_ID cookie is set in this step if the user did not have a valid session with the Google Search Appliance but was already signed in to the content servers.
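
The redirect handling mentioned for steps 10-21 (up to 10 redirects, of three types) can be sketched in Python as follows. This is a simplified illustration, not the appliance's code: the fetch callback, the function names, and the regular expressions are assumptions. The sketch expects header names spelled exactly "Location" and "Refresh", and a meta tag in which http-equiv appears before content.

```python
import re
from urllib.parse import urljoin

MAX_REDIRECTS = 10  # the appliance follows up to 10 redirects

# Simplified patterns for "0; url=..." in a Refresh header or meta tag.
_REFRESH_HEADER = re.compile(r'\s*\d+\s*;\s*url=(\S+)', re.IGNORECASE)
_META_REFRESH = re.compile(
    r'<meta[^>]*http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)


def next_hop(url, status, headers, body):
    """Return the absolute URL that a response redirects to, or None."""
    if 300 <= status < 400 and "Location" in headers:
        return urljoin(url, headers["Location"])  # HTTP 3xx redirect
    if status == 200:
        match = _REFRESH_HEADER.match(headers.get("Refresh", ""))
        if match is None:
            match = _META_REFRESH.search(body)  # Refresh meta tag
        if match is not None:
            return urljoin(url, match.group(1))
    return None


def follow(fetch, url):
    """Fetch `url`, following redirects; return (final_url, status, body).

    `fetch(url)` is a hypothetical callback returning
    (status_code, headers_dict, body_text) without following redirects.
    """
    for _ in range(MAX_REDIRECTS + 1):
        status, headers, body = fetch(url)
        target = next_hop(url, status, headers, body)
        if target is None:
            return url, status, body
        url = target
    raise RuntimeError("gave up after %d redirects" % MAX_REDIRECTS)
```

In steps 12-15, such a loop would end at the login page. In steps 20-21, it would end at the content that the login server redirects to after a successful login.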

The user does not have an existing session and is not signed in

The user doesn't have a valid session if (1) the HTTP request of the search query doesn't contain a GSA_SESSION_ID cookie; or (2) the session specified by the GSA_SESSION_ID cookie has expired (doesn't exist). The Google Search Appliance deletes a session if it is inactive for more than 30 minutes. If the user doesn't have a valid session and is not already signed in, the Google Search Appliance prompts for username and password. Since the user doesn't have a valid session, steps 2-5 in the diagram above are skipped. The Google Search Appliance also creates a new session for the user and sets a GSA_SESSION_ID cookie (step 10). It then logs in to the content server with the credentials it has acquired. If the Google Search Appliance logs in successfully, it stores the cookies that are set by the login server in the session. If the Google Search Appliance fails to log in with the user's credentials, it prompts the user again.
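
The 30-minute inactivity rule can be illustrated with a small Python sketch. The SessionStore class, its methods, and the choice to expire sessions lazily on lookup are assumptions. This document only states that a session is deleted after more than 30 minutes of inactivity.

```python
import time

SESSION_TIMEOUT_SECONDS = 30 * 60  # sessions expire after 30 inactive minutes


class SessionStore:
    """Hypothetical in-memory session store with inactivity expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the behavior can be tested
        self._sessions = {}  # session ID -> (last_used, data)

    def put(self, session_id, data):
        self._sessions[session_id] = (self._clock(), data)

    def get(self, session_id):
        """Return the session data, or None if it is missing or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        last_used, data = entry
        now = self._clock()
        if now - last_used > SESSION_TIMEOUT_SECONDS:
            del self._sessions[session_id]  # inactive for too long
            return None
        self._sessions[session_id] = (now, data)  # activity resets the timer
        return data
```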

The call flow diagram is the same as in "The user has an existing session that stores expired SSO cookies and is not signed in", except that steps 2-5 are skipped. The steps that take place are 1 and 6-28.

The user does not have an existing session but is signed in

If the GSA_SESSION_ID cookie is expired or the HTTP request of the secure search doesn't contain a GSA_SESSION_ID cookie, the Google Search Appliance tries to verify whether the user is signed in to the SSO system (steps 2 & 3). If the user is signed in, the Google Search Appliance starts searching, checks whether the user is authorized to see each result, then returns search results.

Note: It is possible that a search user is prompted by the Google Search Appliance for username and password even if the user is already signed in to the SSO system. For example, if the SSO cookies are IP restricted, the Google Search Appliance would fail to fetch the sample URL in step 2. In situations like this, the authentication and authorization steps are the same as those in the section "The user does not have an existing session and is not signed in".

 

The user has a valid session

If the user has a valid session with the Google Search Appliance, the appliance tries to verify whether the SSO cookies stored in the session are still valid (steps 2 and 3). If the SSO cookies are valid, the Google Search Appliance starts searching and authorizing search results. The description of steps 23-28 in the section "The user has an existing session that stores expired SSO cookies and is not signed in" applies.

 


 