Deployment Scenario Handbook

Basic Internal Search

Scenario overview


Acme Inc. has a large internal web presence that spans several regions of the globe. In this scenario, they want to consolidate search across all their internal websites and pages in one place, so employees do not have to visit different websites to find information. Although all users can access the Acme Inc. intranet, not all of them have access to all the information on the various sites in the corporate domain. For example, Human Resources information should be searchable, but only by authorized users, so securing personal information is an important requirement.

Requirements


  • Index the following content:
    • Corporate file shares
    • Internal web pages
    • HR information
  • Provide a general search box, which would return a results page across all indexed content, including content across all product units.
  • Provide specific search boxes, which would return results specific to a particular product unit.
  • Style the search box and result page according to Acme Inc. corporate branding standards.
  • Present search results for secure content only to users authorized to see the content.
  • Provide failover capability in case of a GSA outage/issue.

Assumption


There is a mechanism in place to authenticate a user.

Key considerations


  • Decide whether to present results directly from the search appliance or by means of a web application presentation layer.
  • Decide whether to manage security by using the search appliance or by means of the application fronting the GSA.
  • Decide how to configure the Google Search Appliance Connector for File Systems to index file share content.
  • Use reporting or analytics to gauge user interaction with search materials.

Recommended approach


Google’s recommended approach for implementing basic internal search covers the following areas:

Deployment architecture

To provide failover, Acme Inc. will use two GSAs in production in an active-passive configuration: one primary appliance and one hot backup. Acme Inc. will configure both search appliances for mirroring, with the primary being the appliance on which all configuration changes are made. To achieve the active-passive configuration, they will deploy a load balancer in front of the GSAs. The load balancer pings the active GSA and fails over to the hot backup unit if the active unit does not return the expected response.
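
The failover decision made by the load balancer can be illustrated with a minimal Python sketch. It is not a load balancer configuration and not part of the GSA. The host names, the probe URL, and the "HTTP 200 means healthy" rule are assumptions chosen for illustration; a real deployment would use the health check supported by the chosen load balancer.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Sequence


def http_probe(host: str, timeout: float = 2.0) -> bool:
    """Return True if the host answers an HTTP GET with status 200.

    Assumption: a plain GET on "/" is an acceptable health check.
    """
    url = f"http://{host}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or HTTP error status.
        return False


def pick_backend(
    backends: Sequence[str],
    is_healthy: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Active-passive selection: return the first healthy host in priority order.

    The primary appliance is listed first, so the hot backup is chosen
    only when the primary fails its probe. Returns None if every host fails.
    """
    for host in backends:
        if is_healthy(host):
            return host
    return None
```

Calling pick_backend(["gsa-primary.example.com", "gsa-backup.example.com"]) with these hypothetical host names returns the primary while it is healthy and the backup otherwise. Passing the probe as a parameter keeps the selection logic testable without network access.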

Because the GSA is being deployed internally, serving results directly from the GSA is recommended, with the results styled by the GSA stylesheet. In this case, Acme Inc. can modify the stylesheet by using the Page Layout Helper, an XSLT wizard on the GSA, to add certain features to the display quickly. If further modifications are needed, they can edit the stylesheet manually. Note that Google Support does not support custom XSLT modifications.

A reverse proxy is needed in the architecture if query fidelity is required, that is, if search query parameters must not be tampered with or submitted ad hoc to the GSA. If secure content is marked in the index as “public,” with security applied by an application layer based on metadata, a reverse proxy should front the GSAs and filter search queries so that no one can access the appliances directly to submit their own queries. This prevents a user from manipulating the URL to see items that carry metadata they are not allowed to see or that come from a collection they are not entitled to see.
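
The filtering step performed by such a reverse proxy can be sketched as follows. This is a minimal illustration, not a complete proxy: it only shows how an incoming query string could be reduced to an allow-list of user-controlled parameters, with the collection (site) and front end (client) set by the proxy rather than by the user. The allow-list contents and the function name sanitize_query are assumptions made for this example.

```python
from urllib.parse import parse_qs, urlencode

# Parameters the end user is allowed to control (assumption for this sketch):
# the query text, the result offset, and the number of results per page.
ALLOWED_PARAMS = {"q", "start", "num"}


def sanitize_query(raw_query: str, collection: str, frontend: str) -> str:
    """Rebuild a search query string that is safe to forward to the GSA.

    Every parameter not on the allow-list is dropped, including any
    user-supplied collection or metadata filter. The collection and
    front end are then set from values the application trusts.
    """
    params = parse_qs(raw_query)
    safe = {key: values[0] for key, values in params.items() if key in ALLOWED_PARAMS}
    safe["site"] = collection    # collection chosen by the application
    safe["client"] = frontend    # front end chosen by the application
    return urlencode(sorted(safe.items()))
```

For example, sanitize_query("q=salary&site=hr_all&requiredfields=dept:hr", "engineering", "default_frontend") returns "client=default_frontend&q=salary&site=engineering": the user-supplied collection and metadata filter are discarded, and the collection the user is entitled to is applied instead.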

The Google Search Appliance Connector for File Systems, used to index file share content, should be hosted on an external server in a production environment. The connector runs in a JVM and comes with Tomcat built in.

Crawl and index configuration

Acme Inc. will configure start URLs for top-level pages. For distinguishing content based on Acme Inc.’s departments, collections can be established for each department.

The Google Search Appliance Connector for File Systems should be used to index file shares. The connector supports:

  • Authorization by early binding (ACLs)
  • Maintaining last access dates on files and directories that are being traversed
  • Non-HTTP-exposed Windows DFS domain root shares

Secure search configuration

Acme can use one of the following strategies to secure content, depending on whether authorization is required or not:

Only authentication is required

If authentication is required, but not authorization:

  • Crawl content with an admin account and mark the content as “public.”
  • Place this crawled content into a collection.
  • The application tier above the GSA handles authentication. Once a user has been authenticated to a page with a search box on it, a search is executed on the collection where the content was placed.
  • Because this strategy mixes public and secure content, use a reverse proxy in front of the GSA if certain users must be restricted from seeing secure content. The reverse proxy ensures that only proper queries are sent to the GSA and that the GSA is not directly exposed to unauthenticated users, who could otherwise add their own search parameters to queries.

Take note that a reverse proxy will add another component to the architecture. For more information, see Implementing a Reverse Proxy for Perimeter Security and Other Reasons.

Authentication and authorization are required

If both authentication and authorization are required:

  • Crawl content with the account of a user who has access and do not mark the content as “public.”
  • Users may need to submit their credentials when executing a search. Results are then authorized against the service endpoints by using HEAD request checks.
  • Determine how to integrate with the authentication mechanism that is available. The possibilities include:
    • Kerberos (for more information, see the Kerberos scenario described in Silent Authentication)
    • Integrated Windows Authentication (NTLM) by using the SAML Bridge
    • LDAP or Basic prompt for username/password by the GSA
    • Cookie translation to integrate with forms authentication and provide a verified username back to the GSA
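
The idea behind a HEAD request check can be sketched in Python. This is an illustration of the concept only and does not reproduce the search appliance's internal implementation: a HEAD request is sent to the content server on behalf of the user, and the HTTP status code decides whether the result is shown. The decision labels, the function names, and the use of a caller-supplied Authorization header are assumptions made for this example.

```python
import urllib.error
import urllib.request


def decision_from_status(status: int) -> str:
    """Map an HTTP status code to an authorization decision."""
    if 200 <= status < 300:
        return "PERMIT"          # the content server served the document
    if status in (401, 403):
        return "DENY"            # the user is not allowed to see the document
    return "INDETERMINATE"       # e.g. 404 or 500: no reliable answer


def head_check(url: str, auth_header: str, timeout: float = 5.0) -> str:
    """Send a HEAD request with the user's credentials and return the decision.

    Assumption: the content server accepts the given Authorization header.
    Only headers are transferred, so the check is cheaper than a full GET.
    """
    request = urllib.request.Request(
        url, method="HEAD", headers={"Authorization": auth_header}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return decision_from_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx and 5xx responses; the status is in err.code.
        return decision_from_status(err.code)
    except (urllib.error.URLError, OSError):
        return "INDETERMINATE"
```

Because one such request is needed for each candidate result, this style of authorization adds latency in proportion to the number of secure results checked, which is one reason the choice of authentication mechanism matters.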

Front end configuration

Each search box deployed on the web properties will have a set of query parameters tied to it. These parameters will be sent down with the query to the GSA to shape the type of results that appear in the search results page.

For example, a search box deployed on the HR department page should pass down the collection parameters for that specific department. Google recommends that Acme Inc. style the results by using the Page Layout Helper. This way, certain features can be turned on or off. Another advantage of using this XSLT wizard is the increased chance of compatibility with future versions of the XSLT.
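
As a minimal sketch of how a search box could pass its parameters to the GSA, the following Python function builds a search URL that restricts results to one collection and renders them with one front end. The host name, collection name, and front end name are hypothetical, and the sketch assumes the same front end name is used for both the client and proxystylesheet parameters.

```python
from urllib.parse import urlencode


def build_search_url(host: str, query: str, collection: str, frontend: str) -> str:
    """Build a GSA search URL for a department-specific search box."""
    params = [
        ("q", query),                   # the user's search terms
        ("site", collection),           # collection to search
        ("client", frontend),           # front end to apply
        ("proxystylesheet", frontend),  # render results with the front end's XSLT
        ("output", "xml_no_dtd"),       # XML output, transformed by the stylesheet
    ]
    return f"http://{host}/search?" + urlencode(params)
```

For example, build_search_url("gsa.example.com", "parental leave", "hr", "hr_frontend") returns "http://gsa.example.com/search?q=parental+leave&site=hr&client=hr_frontend&proxystylesheet=hr_frontend&output=xml_no_dtd". A general search box would use the same function with a collection that spans all indexed content.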

Administrative items

Acme Inc. will use the Advanced Search Reporting feature to create reports about what users are searching for and what they are clicking in search results pages. These reports should be generated and analyzed frequently, as they are a good indicator of general search satisfaction.

Alternative approaches


For the secure search configuration “Only authentication is required,” instead of using an application fronting the GSA to perform authentication, use the Perimeter Security feature on the GSA, which ensures that the search appliance doesn't serve any results without user authentication. When perimeter security is enabled, the search appliance must authenticate a user with one of the configured authentication mechanisms before serving any results. If authentication fails, the GSA will not serve any results, even if they are public.

Use policy ACLs for content that only authenticated users can access. With this approach, an “everyone” group can be used to govern access to this content. This approach requires that the “everyone” group be resolved at authentication time.

Project task overview


The following table lists the project tasks and activities for implementing basic internal search.

Task: Plan deployment architecture
Activities:
  • Configure search appliances and set up mirroring
  • Configure load balancer in front of the GSAs
  • Set up perimeter security around GSAs
  • Procure server to host File System Connector

Task: Configure crawl and index
Activities:
  • Configure collections identified for departments
  • Install and configure File System Connector
  • Identify security mechanisms and configure crawler for access

Task: Configure front end
Activities:
  • Configure XSLT modifications per front end using the onboard wizard

Long term enhancements


  • Tweak search and features based on reports showing user search patterns.
  • Identify content for KeyMatches.
  • Enable more complex synonym lists.
  • Enable Entity Recognition to automatically enrich documents with metadata using
  • Enable Dynamic Navigation for metadata-driven facet navigation.
  • Enable Expert Search for office and/or department listings.
  • Identify areas for which OneBoxes can be of value.