Security

Designing Security in the GSA

Overview


Enterprise search projects integrate data from different sources to enable users to find information easily. In most cases, especially in intranet projects, access to documents in source applications is protected. To provide relevant and secure results to users, the corporate search engine must apply the same authorization policies as the sources where documents are stored.

The search appliance acts as a hub, where content coming from different sources is indexed to facilitate access to the information. The search appliance must rely on the same security protocols as those that the applications use. If your enterprise search project includes indexing protected content, you need to invest time during the design phase to model the security relationships between your content sources and the Google Search Appliance.

Before implementing security in the GSA, take time to understand the overall integration scenario and reference architecture. Your organization probably already has internal security policies and protocols in place, so you need to explore the best options for implementing security in the search environment within those constraints. You also need to design a security model for the search appliance that stays consistent across all project phases.

This chapter explains the key processes of GSA's secure search, and how you should approach the overall design.

Secure search can be divided into three distinct but related processes:

  • Secure Content Acquisition: The mechanism the GSA uses to acquire content from a secured content source. The GSA has to pass through the protection that the content source puts in place in order to gain access. This is an integral part of content acquisition, but it has to be considered as part of the security design.
  • Serve Time Authentication: The mechanism the GSA uses to identify end users. It can be one or more of the Internet authentication protocols. Note that this is the communication between the GSA and the client (browser).
  • Serve Time Authorization: The process the GSA uses to check whether a search user has access to each search result.

Content Acquisition

Once you have modeled the information about your content sources, you can design the authentication mechanism(s) the GSA will use to integrate with each secure source. This part of the design phase models the integration between the search appliance and your organization's systems. The search appliance allows several authentication mechanisms to be used simultaneously, to accommodate different applications when acquiring content. The process generally involves using a system or super user account with broad access to the content source, so that the GSA can index all the documents.

Serve time authentication

Serve time authentication is the integration between the search appliance and the end user. It can be the same authentication protocol as used by one of the content sources, but it doesn't have to be. Sometimes multiple authentication protocols are required in order to support the authorization of different content sources. However, you should always ask yourself the following questions:

  • What authentication protocols are available in the customer's environment?
  • How can I minimize the authentication mechanisms used during serving? Can I reduce it to just one?
  • How can I minimize the impact on end users? Can authentication be silent?

Serve time authorization

Each content source uses its own security policies and infrastructure to authorize access to its information. Using the information you gathered about the content sources, select the authorization mechanisms by answering the following questions:

  • What authorization mechanisms are possible for the given content source?
  • Which mechanism gives the best performance?
  • What needs to be implemented for the content acquisition process in order to support this mechanism?

Although serve time authentication happens before authorization during serving, you should evaluate the authorization options FIRST. What is required for authorization generally decides what authentication mechanisms you should consider. In any case, these three processes are interrelated and you have to consider the implications of every decision.

Information Gathering


Google recommends that you take the following actions during initial analysis:

  • Clarify all requirements related to security, even potential future needs that are not currently part of the scope of the project, but might be considered for a future phase.
  • If one of the requirements is silent authentication, make sure that it is feasible to provide it before committing to it.
  • Identify the security mechanisms certified by your organization. Is there a single sign-on (SSO) system? Is Kerberos enabled?

Use the following table to model each content source. Record the security information in the serve time authentication and serve time authorization fields.

System Info: Name of the system, and underlying product name.
Description: A more detailed description of the system.
Content Type: For example, are they office documents, web pages, or database records?
Content Size: Document count. A smaller content size could mean that late binding for serve time authorization might be acceptable.
Serve time authentication:
  • Is the content server using Windows Integrated Authentication?
  • Is the content server integrated with an SSO system?
  • If the content server has its own user directory in either a database or LDAP, are the user names synced with the company-wide directory?
Serve time authorization:
  • How responsive is the content server? If it's not responsive, late binding is probably out of the question.
  • Are the permissions on the documents fairly open, or very restrictive? If very restrictive, late binding is probably out of the question.
  • Is there an API that can tell whether a user has access to a list of documents, given the user name and document IDs? If there is such an API, connector or SAML authorization will be possible.
  • Is there a way to find out which groups, roles, or users have access to each document? If the answer is "Yes," then ACL authorization is most likely the preferred approach.
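When many content sources have to be modeled, the authorization questions in the table can be encoded as a rough heuristic. The following Python sketch is illustrative only: the `ContentSource` fields and the `candidate_authz` function are hypothetical names, not part of any GSA tool, and content size is left out because the table treats it as a judgment call rather than a fixed threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContentSource:
    """One row of the content-source model (field names are hypothetical)."""
    name: str
    server_is_responsive: bool
    permissions_are_restrictive: bool
    has_batch_authz_api: bool     # API answers "can user U see documents D1..Dn?"
    can_read_document_acls: bool  # users/groups/roles per document are retrievable


def candidate_authz(source: ContentSource) -> List[str]:
    """Return candidate serve-time authorization approaches, most preferred first.

    A rough encoding of the table's questions, not an official rule set.
    """
    candidates = []
    if source.can_read_document_acls:
        # Early binding: ACL authorization is most likely the preferred approach.
        candidates.append("ACL")
    # Late binding is probably out of the question if the server is slow
    # or if permissions are very restrictive.
    late_binding_ok = (source.server_is_responsive
                       and not source.permissions_are_restrictive)
    if late_binding_ok:
        if source.has_batch_authz_api:
            candidates.append("Connector or SAML authorization")
        candidates.append("Head Request")  # needs no API, but performs worst
    return candidates
```

An empty result signals a source that needs a closer look, for example by building an ACL extraction step during content acquisition.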

Content Acquisition


Content acquisition generally takes one of the following forms. The authentication protocol used for acquisition has to be one that the content source supports. The acquisition method, however, usually leaves a choice of authorization mechanisms for serving.

For each form, the possible authorization mechanisms for serving, and notes:
  • Direct Crawl: ACL, Head Request, SAML Authorization. May involve developing a custom proxy server for extra processing.
  • Feeds: ACL, Head Request, SAML Authorization. A simple, custom one-off implementation.
  • Connectors: ACL, Connector Authorization. Either an off-the-shelf connector, or a custom connector to be developed.

Single vs. Multiple identities


By examining all the content sources, you should be able to answer a very important question: Is one (default) Credential Group enough? The answer determines the model for serve time authentication. When users have multiple identities, you probably need to define multiple Credential Groups. Note that different authentication protocols for different content sources do not imply multiple identities. For example, one content source could use forms-based authentication while another uses Kerberos; if both systems use the same Active Directory as their user directory, there is still only one identity per user. The GSA may need multiple identities only when user information is stored in different repositories, and even then there are two exceptions:

  • If one user directory is a replica of another, there is still one identity per user, so one Credential Group is enough. For example, when Documentum is integrated with Active Directory, one approach is to replicate all the users in the Documentum database.
  • If there is a strict mapping between user names in the two user directories, one Credential Group is enough as long as you put in place a user identity translation service, possibly as part of the authorization process.
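When the naming convention is strict, a user identity translation service can be very small. This Python sketch assumes, purely for illustration, that the GSA verifies Windows-style IDs such as `CORP\jdoe` and that the second directory uses `jdoe@example.com`; the function name, both formats, and the domain are hypothetical.

```python
def translate_identity(verified_id: str, target_domain: str = "example.com") -> str:
    """Map a verified ID such as 'CORP\\jdoe' to 'jdoe@example.com'.

    Assumes a strict 1:1 naming convention between the two directories;
    both formats and the default domain are hypothetical.
    """
    # Strip an optional NetBIOS domain prefix ("CORP\jdoe" -> "jdoe").
    _, _, account = verified_id.rpartition("\\")
    # Strip an existing UPN-style suffix too ("jdoe@corp.local" -> "jdoe").
    account = account.split("@", 1)[0]
    if not account:
        raise ValueError(f"cannot translate empty identity: {verified_id!r}")
    # Assume the target directory compares user names case-insensitively.
    return f"{account.lower()}@{target_domain}"
```

If the mapping is not strictly 1:1, a lookup table or directory query is needed instead, and a separate Credential Group is probably the safer design.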

Selecting an authorization mechanism


Serve time authentication and authorization are tightly connected. As mentioned previously, although serve time authentication happens before authorization during serving, you should evaluate the authorization options FIRST. This point is important enough to repeat here. This chapter describes the connections between these two processes in detail.

Authorization is always considered on a per-content-source basis. The purpose of authorization is to make sure that users see only what they are entitled to see in the search results. Beyond this ultimate goal, the most important criterion in selecting an authorization mechanism is performance. This implies that:

  • Search results need to come back as fast as possible to give end users the best experience. Based on usability studies, if search is too slow, many people simply give up, and search usage decreases.
  • Performance needs to be good enough that relevant results are not missing because of timeouts. If the authorization check for a result times out, the result has an indeterminate authorization decision and therefore isn't displayed in the search results list.
  • When late binding authorization is used, you need to minimize the performance impact on the content server.
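The timeout behavior described above can be illustrated with a small simulation. This Python sketch is not how the appliance is implemented: `filter_results` and the `check` callback are hypothetical, and the sketch only shows why a result whose authorization check does not finish before the deadline ends up INDETERMINATE and is dropped from the list.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

PERMIT, DENY, INDETERMINATE = "PERMIT", "DENY", "INDETERMINATE"


def filter_results(urls, check, timeout_s):
    """Authorize urls in parallel; anything undecided after timeout_s is dropped.

    check(url) returns PERMIT or DENY. A check that times out (or fails)
    counts as INDETERMINATE and, like DENY, the result is not displayed.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(urls)))
    futures = {pool.submit(check, url): url for url in urls}
    done, _ = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)  # don't block on stragglers
    decisions = {}
    for future, url in futures.items():
        if future in done and future.exception() is None:
            decisions[url] = future.result()
        else:
            decisions[url] = INDETERMINATE
    shown = [url for url in urls if decisions[url] == PERMIT]
    return shown, decisions


if __name__ == "__main__":
    def demo_check(url):
        # Simulate one slow content server.
        if url.endswith("slow"):
            time.sleep(0.5)
        return DENY if url.endswith("secret") else PERMIT

    print(filter_results(["a", "b-secret", "c-slow"], demo_check, timeout_s=0.1)[0])
```

In the demo, "c-slow" would be a permitted document, yet it is missing from the output because its decision arrived too late.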

For deployment projects, if there is an existing connector provided by either Google or one of Google's partners, the authorization is already decided for you by the design of the connector. You have to select an authorization mechanism only under these circumstances:

  • Multiple connectors offered by different parties use different authorization mechanisms. Many factors, including cost, determine which connector to use, and the authorization mechanism is only one of them.
  • A connector sometimes supports multiple authorization mechanisms. For example, the Google Search Appliance Connector for SharePoint supports three mechanisms: Per-URL ACL, Connector, and Head Requests.
  • When there is no existing connector, you have to develop custom code to integrate the secure content. This is when you have to consider all options.

The GSA processes authorization using two main approaches, early binding and late binding. Below we discuss the authorization mechanisms for each approach in order of performance preference.

Generally speaking, early binding makes authorization in the GSA faster than late binding, but that doesn't necessarily mean early binding should always be used for every content source.

Early binding authorization

With early binding, authorization is fully managed by the search appliance itself. Early binding requires the authorization rules to be known to the GSA, so at serve time the GSA doesn't have to contact an external security component, such as the content source, to validate whether the user has the right to access a document.

The GSA supports the following two types of ACLs:

Per-URL ACLs

With Per-URL ACLs, each document in the index can have its own authorization rules. A Per-URL ACL can be added to a document through Feeds, metadata in the HTML body, or custom HTTP headers. Per-URL ACLs can include both users and groups. Per-URL ACLs are generally preferred because they scale much better with the number of documents and offer better performance.
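For feeds, a Per-URL ACL travels with the record it protects. The Python sketch below builds one feed `<record>` carrying an `<acl>` of `<principal>` entries, using the standard library's ElementTree. The element and attribute names (`scope`, `access`) reflect our understanding of the GSA feed format and should be checked against the Feeds Protocol documentation for your software version; the surrounding feed header and the document content are omitted, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def feed_record_with_acl(url, users=(), groups=(), denied_groups=()):
    """Build a feed <record> element carrying a Per-URL ACL (hypothetical helper).

    Verify element/attribute names against the Feeds Protocol documentation.
    """
    record = ET.Element("record", url=url, mimetype="text/html")
    acl = ET.SubElement(record, "acl")

    def add(name, scope, access):
        # One <principal> per user or group, with its scope and permit/deny.
        principal = ET.SubElement(acl, "principal", scope=scope, access=access)
        principal.text = name

    for user in users:
        add(user, "user", "permit")
    for group in groups:
        add(group, "group", "permit")
    for group in denied_groups:
        add(group, "group", "deny")
    return record
```

`ET.tostring(record, encoding="unicode")` produces the XML text to place in the feed.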

Considerations for using Per-URL ACLs:

  • This approach is very useful when you have fine-grained authorization rules and you want to have quick authorization responses. Fast authorization with ACLs is critical for such GSA features as Dynamic Navigation, duplicate directory filtering, and Dynamic Result Clusters.
  • This approach introduces some complexity in resolving group membership in the search appliance. The GSA can manage this resolution in some cases, for example, when the groups are in an LDAP directory such as Active Directory. You can also create your own custom processes to pass groups to the search appliance. Starting with version 7.4, an onboard Groups Database offers even tighter integration.
  • There is also a delay between when a security setting is changed in the source platform and when the search appliance is notified of this.
  • The maximum number of principals that can be attached to a document is configurable, with a default of 10,000. The maximum is 100,000.
    • The following worst-case scenario has been tested with good (sub-second) ACL filtering performance:
      • 10k URLs to be filtered
      • Each URL has 10k items in the ACL
      • Search user belongs to 1k groups, but doesn't have access to any documents so the GSA has to exhaustively filter every URL that matches the search term.
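Because early binding needs only data the appliance already holds, the per-result check reduces to set operations over the user's resolved groups. The following Python sketch shows the idea under simplifying assumptions that are ours, not documented GSA semantics: deny entries override permit entries, a document with no ACL yields INDETERMINATE, and an ACL with no matching principal yields DENY. `Acl`, `evaluate_acl`, and `filter_by_acl` are hypothetical names.

```python
from dataclasses import dataclass, field

PERMIT, DENY, INDETERMINATE = "PERMIT", "DENY", "INDETERMINATE"


@dataclass
class Acl:
    permit_users: set = field(default_factory=set)
    permit_groups: set = field(default_factory=set)
    deny_users: set = field(default_factory=set)
    deny_groups: set = field(default_factory=set)


def evaluate_acl(acl, user, user_groups):
    """Early-binding check using only data held by the appliance (simplified)."""
    if acl is None:
        return INDETERMINATE  # no ACL: another authorization rule must decide
    groups = set(user_groups)  # resolved membership, e.g. from LDAP or a groups database
    if user in acl.deny_users or groups & acl.deny_groups:
        return DENY  # assumption: deny wins over permit
    if user in acl.permit_users or groups & acl.permit_groups:
        return PERMIT
    return DENY


def filter_by_acl(results, acls, user, user_groups):
    """Keep only the results the user is permitted to see."""
    return [url for url in results
            if evaluate_acl(acls.get(url), user, user_groups) == PERMIT]
```

Set intersection keeps each check cheap even when a user belongs to many groups, which is consistent with the worst-case scenario listed above.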

Policy ACLs

A policy ACL protects URL patterns rather than individual URLs, so a single policy ACL can cover many documents. You can configure policy ACLs based on URL patterns by using the GSA Admin Console or the Policy ACL API. Use Policy ACLs when the number of authorization rules is low and a single authorization rule can cover multiple URLs.

Although not as commonly used as Per-URL ACLs, Policy ACLs are a very flexible tool that can come in handy in unusual situations. For example, if a globally defined group should be denied access to an easily identifiable content source, a single Policy ACL entry could be the simplest option. Another case is when the content system uses coarse-grained permission rules. For example, CA SiteMinder allows access control to be defined based on URL patterns, and those rules can be easily translated to Policy ACLs.
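The cases above, where one rule covers a whole URL space, can be sketched as a list of prefix policies. In this Python sketch the policies, hostnames, and group names are hypothetical, only plain prefix matching is modeled (GSA URL patterns support more forms), and letting the first matching policy decide is an assumption of the sketch rather than documented appliance behavior.

```python
PERMIT, DENY, INDETERMINATE = "PERMIT", "DENY", "INDETERMINATE"

# Hypothetical policy ACLs: (URL prefix, permitted groups, denied groups).
POLICY_ACLS = [
    ("http://hr.example.com/", {"hr"}, set()),
    ("http://intranet.example.com/", {"employees"}, {"contractors"}),
]


def policy_decision(url, user_groups, policies=POLICY_ACLS):
    """Return the decision of the first policy whose prefix matches url."""
    groups = set(user_groups)
    for prefix, permit_groups, deny_groups in policies:
        if url.startswith(prefix):
            if groups & deny_groups:
                return DENY  # e.g. a global group denied a whole content source
            return PERMIT if groups & permit_groups else DENY
    return INDETERMINATE  # no policy covers this URL; another rule must decide
```

Two entries protect every document on two hosts, which is the scaling property that makes Policy ACLs attractive when rules are few and coarse.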

Late binding authorization

With late binding, the search appliance doesn't have authorization information for secure content (that is, ACLs) itself. Before the GSA returns search results to the user, it has to check security by contacting a third-party component to validate if the user is able to read each protected document that is part of the results. In response, the third-party component returns the authorization decision to the search appliance. The third-party component could either be the content source itself or an authorization server that centralizes that decision.

The GSA supports the following late binding approaches:

Connectors

Google provides some connectors, including ones for SharePoint and Documentum, that you can use in your projects to integrate the search appliance with third-party sources, and these connectors are fully supported by Google. They run on a connector framework created by Google that you can also use to create your own connectors. The main advantage of building your own connectors on this platform is that it provides tight integration with the search appliance, from configuration and indexing to security.

The connector framework provides an authorization SPI that any connector can implement. The interface works in batch mode (multiple documents in one call), so it can return answers without too many round trips. Other Google partner-provided connectors are also based on the framework and use this approach.
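Batch mode pays off in round trips: checking N documents in batches of B takes ceil(N/B) calls instead of N. The Python sketch below illustrates this with a hypothetical `authorize_batch` callback standing in for the connector's batch call (in the 3.x Java SPI, to our knowledge, the corresponding method is `AuthorizationManager.authorizeDocids`). Leaving documents the repository doesn't answer for as INDETERMINATE is a choice made for the sketch.

```python
from typing import Callable, Dict, Iterable, List

PERMIT, DENY, INDETERMINATE = "PERMIT", "DENY", "INDETERMINATE"


def authorize_in_batches(
    docids: List[str],
    user: str,
    authorize_batch: Callable[[Iterable[str], str], Dict[str, str]],
    batch_size: int = 100,
) -> Dict[str, str]:
    """Ask the repository about many documents per call instead of one per call."""
    decisions: Dict[str, str] = {}
    for start in range(0, len(docids), batch_size):
        batch = docids[start:start + batch_size]
        answers = authorize_batch(batch, user)  # one round trip per batch
        for docid in batch:
            decisions[docid] = answers.get(docid, INDETERMINATE)
    return decisions
```

With 1,000 results and a batch size of 100, the repository is called 10 times rather than 1,000.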

SAML Authorization

SAML is an XML-based framework for communicating user authentication, entitlement, and attribute information. It is a standard that can be used for authentication and, optionally, for authorization. Authorization SPI explains how SAML can be used for authorization. Using SAML for authorization doesn't require using it for authentication, and vice versa; the two are completely independent. When SAML is used for authorization, the search appliance sends SAML authorization requests in XML format to the external service you have configured, and that service responds with the authorization decision for each document.
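Because a SAML authorization provider is usually custom code, it helps to see the shape of the exchange. The following Python sketch answers a single SAML 2.0 `AuthzDecisionQuery` with an `AuthzDecisionStatement` whose `Decision` is `Permit`, `Deny`, or `Indeterminate`, using the standard SAML 2.0 namespaces. It is a sketch only: `decide` and the `can_read` hook are hypothetical, and it omits the SOAP binding, signatures, IDs, timestamps, the enclosing assertion and response, and batching. The Authorization SPI documentation defines the exact messages the appliance sends and expects.

```python
import xml.etree.ElementTree as ET

SAMLP = "urn:oasis:names:tc:SAML:2.0:protocol"
SAML = "urn:oasis:names:tc:SAML:2.0:assertion"
NS = {"samlp": SAMLP, "saml": SAML}


def decide(query_xml, can_read):
    """Answer one AuthzDecisionQuery with an AuthzDecisionStatement element.

    can_read(user, resource) is a hypothetical hook into the content source.
    """
    query = ET.fromstring(query_xml)
    if query.tag != f"{{{SAMLP}}}AuthzDecisionQuery":
        raise ValueError("not an AuthzDecisionQuery")
    resource = query.get("Resource")
    name_id = query.find("saml:Subject/saml:NameID", NS)
    user = name_id.text.strip() if name_id is not None and name_id.text else None
    if user is None or resource is None:
        decision = "Indeterminate"  # cannot decide without a subject and resource
    else:
        decision = "Permit" if can_read(user, resource) else "Deny"
    statement = ET.Element(f"{{{SAML}}}AuthzDecisionStatement",
                           Resource=resource or "", Decision=decision)
    # The statement echoes the action(s) being authorized (for example GET).
    statement.extend(query.findall("saml:Action", NS))
    return statement
```

The `can_read` hook is where the content source's own permission API would be called; batching support means accepting several queries per request, which this sketch does not attempt.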

Off-the-shelf SAML authentication products (IdPs) are quite common, but SAML authorization service providers are not. The SAML Bridge provides such functionality, using Kerberos impersonation to authorize through batched Head Requests, but it is considered a legacy feature from a time when connector and ACL authorization were not available. As a result, a SAML authorization provider will most likely be a custom project that you develop rather than an existing product.

SAML authorizations can be managed in batches, so that the search appliance can send a list of URLs for authorization per request, which can speed up the process. You can activate this option in the GSA Admin Console, but your SAML authorization provider has to support it.

Head requests

Finally, it's also possible to send an HTTP HEAD request to the content source to validate authorization. The GSA sends an HTTP request using the document URL and reads the HTTP response from the source to determine authorization based on the HTTP status code:

  • 200: The user is able to access the document, so the search appliance treats the response as a permit. You can also define exclusion rules in the search appliance, because some content sources, such as some web portal solutions, return a 200 status code together with a "no access permitted" message.
  • Any other status code means the user is not able to access that particular document.

To verify permissions for the results, Head Requests are sent sequentially, one per document, until enough permitted documents are found to fill at least one search results page. That's why Head Requests are the worst-performing authorization mechanism. They are generally used when there is no way to extract the ACLs or to verify permissions through an API.
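The sequential behavior, and the cost it implies, can be sketched in Python with only the standard library. `head_status` sends a HEAD request carrying the user's cookies, where only 200 counts as a permit, and `authorize_until_page_full` stops as soon as a page's worth of permitted results is found. Both functions are hypothetical illustrations of the process described above, not appliance code.

```python
import urllib.error
import urllib.request


def head_status(url, cookie_header, timeout_s=5):
    """Send a HEAD request with the user's cookies; return the HTTP status (or None)."""
    request = urllib.request.Request(url, method="HEAD",
                                     headers={"Cookie": cookie_header})
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 401, 403, 404, ... all mean "no access"
    except urllib.error.URLError:
        return None  # unreachable server: also no access


def authorize_until_page_full(urls, status_of, page_size=10):
    """Check ranked results one at a time until page_size permitted ones are found.

    Returns (permitted urls, number of requests sent).
    """
    permitted, requests_sent = [], 0
    for url in urls:
        if len(permitted) >= page_size:
            break
        requests_sent += 1
        if status_of(url) == 200:
            permitted.append(url)
    return permitted, requests_sent
```

A typical call is `authorize_until_page_full(ranked_urls, lambda u: head_status(u, cookies))`. Every denied result costs one more request before the page fills, which is why restrictive permissions make this mechanism slow. Note also that urllib follows redirects, so a source that redirects unauthorized users to a login page answering 200 would be read as a permit; that is the situation exclusion rules address.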

Connector 4.x

Connectors Release 4.x is a new connector framework based on an architecture completely different from previous releases, and the security features it provides also work differently. Here are some key differences for security:

  • A connector can be built to provide authentication and authorization. The communication protocol between the appliance and the connector is no longer proprietary XML; instead, SAML is used as the underlying message exchange mechanism. An example of such a connector is the Google Authentication Adaptor, which provides authentication to Google ID. It is configured the same way as a SAML provider.
  • A connector built on the 4.x framework supports Per-URL-ACL.
  • Connectors can be built to provide group resolution through SAML authentication. However, as of GSA 7.4 or 7.6, the preferred approach is onboard group resolution. Group resolution through SAML is part of SAML authentication, unlike the previous connector framework, where a connector could perform group resolution on its own while authentication was performed by another mechanism.

Selecting an authentication mechanism


There are usually several authentication mechanisms at your disposal for a deployment. As stated in chapter one, the main goal is to use as few authentication mechanisms as possible. Quite often there is also an additional requirement: silent authentication. Not all authentication mechanisms can work with all authorization mechanisms. We can categorize all authorization mechanisms into two types:

  • User ID is required
  • User ID is not required

All authorization mechanisms require User ID except Head Requests. The following table lists authentication mechanisms that would result in a User ID:

Authentication mechanisms when a User ID is required:
  • HTTP Basic/NTLM: Although the mechanism is listed as HTTP Basic/NTLM, these protocols are only used between the GSA and a back-end server to verify the user's credentials. To the end user, it is forms authentication. After the user credentials are verified against the configured "Sample URL", the User ID entered by the user is treated as the verified ID.
  • Client Certificate: The certificate's DN is passed as the verified ID.
  • Kerberos: The Windows user ID extracted from the Kerberos ticket is used as the verified user ID.
  • SAML AuthN: The "Subject" passed by the SAML IdP is the verified ID.
  • LDAP Authentication: The User ID verified by the LDAP server is used as the verified ID.
  • Forms Authn with Cookie Cracking: The User ID is passed back to the GSA by the Cookie Cracker. This involves some coding: a simple dynamic web page has to be implemented to return the User ID.
  • Connectors: The Connector Framework provides an authentication SPI that returns a trusted User ID. However, the connector must implement it, and doing so is optional, so not every available connector provides authentication. Connector implementers can even choose to require a password.
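The cookie cracker mentioned above is typically a small page that reads the login system's session cookie and reports the user it belongs to. The Python sketch below uses only the standard library. The cookie name, session store, and port are hypothetical; the `X-Username` response header is the name we recall from the GSA cookie-cracking documentation, so confirm it for your version. Returning 401 when no valid session is present is also an assumption.

```python
from http.cookies import SimpleCookie
from http.server import BaseHTTPRequestHandler, HTTPServer

SESSION_COOKIE = "APPSESSION"  # hypothetical cookie set by the login system
SESSIONS = {"abc123": "jdoe"}  # hypothetical session store: cookie value -> user ID


def crack(cookie_header):
    """Return (status, extra response headers) for the given Cookie header."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get(SESSION_COOKIE)
    user = SESSIONS.get(morsel.value) if morsel else None
    if user is None:
        return 401, {}  # assumption: any non-200 response means "not verified"
    return 200, {"X-Username": user}


class CookieCrackerHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers = crack(self.headers.get("Cookie"))
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_HEAD = do_GET


if __name__ == "__main__":
    HTTPServer(("", 8080), CookieCrackerHandler).serve_forever()
```

In a real deployment, `SESSIONS` would be replaced by a call to the login system that validates the session and returns its owner.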

All the authentication mechanisms above can be mixed with ACLs, Connector and SAML authorizations. You can pick the one that fits customer requirements and is easiest to implement (also take into consideration any silent authentication requirements).

For Head Request authorization, you cannot pick just any authentication mechanism, because the Head Requests are sent by the GSA to the content source, not by the user's browser. Depending on the authentication protocol used by the content source, the GSA must obtain different credentials during the user authentication process.

Authentication mechanisms when a User ID is not required (Head Requests):
  • Cookie: This is the most common situation; the search appliance forwards user cookies to validate access rights. Forms authentication is required. Cookie Cracking is not needed, because a User ID is not required. The rule is configured under Universal Login Authentication Mechanism > Cookie.
  • HTTP: An HTTP Basic/NTLM rule must be configured. The communication between the end user's browser and the search appliance is in fact forms authentication, not the HTTP Basic authentication protocol. The rule is configured under Universal Login Authentication Mechanism > HTTP.
  • Kerberos: An HTTP Basic/NTLM rule must be configured. The communication between the end user's browser and the search appliance is in fact forms authentication. The rule is configured under Universal Login Authentication Mechanism > Kerberos.

Mapping authentication to authorization


Most of the time, you only need to select one authentication mechanism to verify users. You can configure multiple authentication mechanisms, but what are the implications, and why would you need them? Here are some rules to remember:

  • When there are multiple Credential Groups, you need to select one mechanism for each. They can be of the same type, for example, two forms authentications or two SAML authentications. Or they can differ: for example, one forms authentication and one Kerberos.
  • You can also configure multiple authentication mechanisms for the same Credential Group, although this is less common. The main exception is when a connector is used for group resolution. In this case, one authentication mechanism (most likely silent) verifies the search user, and the connector resolves group membership for ACL authorization. We will discuss this further in later chapters.
  • All configured authentication mechanisms are fired. When two authentication mechanisms are defined for one Credential Group, the second is fired even if the first has already produced a verified user ID.

To map authentication to authorization, the search appliance uses a feature called "flexible authorization", which works like a routing table for authorization mechanisms. It allows the administrator to configure the authorization process for documents per URL pattern as the deployment requires. Flexible Authorization is managed by configuring authorization rules. A rule consists of the content to which it applies (defined by a URL pattern), an identity that maps the rule to a credential rule or authentication mechanism, and other information specific to the authorization mechanism.

Most of the time, you don't have to change the Flexible Authorization settings; the defaults work. Although it is a routing table where you can mix and match authentication and authorization, there are clear rules about which combinations can and cannot be used:

  • Per-URL ACL
    • ACLs are stored in the index and cannot be added or removed on the fly. If URLs don't have ACLs attached, Per-URL ACL can't be used as the mechanism for those URLs. The Credential Group associated with the ACLs is also determined at index time and cannot be changed in the Flexible Authorization settings.
    • When the Per-URL ACL rule is defined as the first rule, the authorization happens in the index when matching results are identified. This gives Per-URL ACL much better performance than other authorizations. That's also why it's listed even before Cache authorization by default.
    • If you define a specific URL pattern instead of "/", or move the rule below other authorization rules, the Security Manager performs the authorization instead. Performance is then worse, because the Per-URL ACLs are evaluated outside the index.
  • Connector
    • For content to be authorized using Connector authorization (3.x and earlier), the URLs must start with "googleconnector://".

After the GSA has authenticated a user through a configured authentication mechanism, authorization to documents will be applied in order of their definition in the flexible authorization table for the particular URL pattern of the document. If more than one authorization mechanism applies to the document, the GSA will cycle through all the rules that apply, in order, until one of them returns a status of PERMIT or DENY. For example, if a connector sends in documents with ACLs, the Per-URL ACL rule will be evaluated first. If PERMIT or DENY is returned, that's the final result. However, if INDETERMINATE is returned, the "Connector" rule will be used to evaluate the documents.
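The rule-ordering behavior just described can be sketched as a small routing loop. In this Python sketch, `AuthzRule` and `flexible_authorize` are hypothetical names, URL patterns are simplified to prefixes (an empty prefix plays the role of "/"), and the rule callbacks stand in for the real mechanisms.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

PERMIT, DENY, INDETERMINATE = "PERMIT", "DENY", "INDETERMINATE"


@dataclass
class AuthzRule:
    url_prefix: str                       # simplified stand-in for a URL pattern
    mechanism: str                        # e.g. "Per-URL ACL", "Connector"
    authorize: Callable[[str, str], str]  # (url, user) -> PERMIT / DENY / INDETERMINATE


def flexible_authorize(url, user, rules) -> Tuple[str, Optional[str]]:
    """Try matching rules in table order until one returns PERMIT or DENY."""
    for rule in rules:
        if not url.startswith(rule.url_prefix):
            continue  # rule doesn't apply to this URL
        decision = rule.authorize(url, user)
        if decision in (PERMIT, DENY):
            return decision, rule.mechanism
    return INDETERMINATE, None  # no rule decided: the result is not shown
```

With a Per-URL ACL rule first and a Connector rule for "googleconnector://" URLs second, a connector document that has no ACL falls through to the Connector rule, as in the example above.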

Summary


In this chapter, we have reviewed the process of designing security for your enterprise search project with the Google Search Appliance. This requires a solid understanding of security in your organization, as well as the related content sources that will be part of the project. Here is a summary of the process in designing the solution:

  • Spend time up front to analyze the content sources: How will the content sources be acquired, what authentication is used, etc.
  • Determine how many Credential Groups will be needed.
  • Determine the preferred authorization mechanism for each content source.
  • Determine the minimum set of authentication mechanisms needed.
    • When possible, support silent authentication.
    • When possible, use supported, out of the box components.
    • Will these authentication mechanisms support the relevant content source authorization?
  • Configure Universal Login Auth Mechanisms.
  • Make changes to the Flexible Authorization Rules when necessary.