Security

Using Out of Box Features

In this chapter, we will look at the details of some of the authentication and authorization mechanisms. We will also discuss common scenarios that are supported by Google Search Appliance and related products offered by Google. We will focus on scenarios that don't require writing code.

Silent authentication


IT security aims to protect applications and data while still giving users accurate information. It's also important for access control mechanisms to have a minimal impact on users. For example, if a user has already been authenticated by a trusted component, applications should rely on that process to avoid prompting the user multiple times for their credentials or to verify their identity. This is the concept behind silent authentication: verifying a user's identity on the GSA without prompting or requiring them to go through an additional login process.

Silent authentication can be implemented for a search service as for any other application in an organization. There are different authentication mechanisms that enable you to provide a silent authentication experience, including protocols, such as Kerberos or NTLM, or corporate applications such as an SSO system.

Before you implement silent authentication for your search environment, answer the following questions:

  • What are the silent authentication options within your organization? Is there a preferred option?
  • If there is more than one silent authentication option (for instance, forms-based and Kerberos), do they manage different user identities and credentials needed for authorization? You have to understand whether using just one of them is sufficient or whether you need both. Also consider whether one can assert the identity of the other.
  • Are there any multiple authentication domains? For instance, different Windows domains for Kerberos. This information is also important for modeling the authorization process.
  • Which applications or content sources that you have to integrate with the search engine are also using the silent authentication mechanism? You might be able to leverage it.

The search appliance can be integrated out-of-the-box with the following silent authentication protocols/systems:

Forms or cookie-based authentication

Forms or cookie-based authentication is the process driven by a session cookie, typically from a single sign-on system. This can be silent if the user has already been authenticated before reaching the search application. If not, the user is prompted for credentials to create the proper session cookies that provide the SSO experience. The Cookie Authentication Scenarios section in the GSA documentation provides technical details about how to integrate with an SSO system. If a user ID must also be passed to the search appliance, you have to implement a cookie-cracking process.

Kerberos

The Kerberos protocol is used by default in Windows networks. The search appliance can be configured to enable Kerberos so that the authentication is transparent to users.

SAML

Many SSO systems support the SAML protocol and provide a silent authentication process. Note that the SAML protocol is a way for an external service to securely assert the user's identity to the GSA. The actual authentication between the user and this service still uses a standard authentication protocol such as Kerberos, NTLM, or cookie-based authentication. It is rare that you have to write a SAML identity provider (called a SAML IdP) from scratch. It's far more common to integrate the GSA with a SAML IdP already deployed in the customer's network.

Client Certificates

This is not a common scenario. However, in those environments where users do have client certificates, it's also possible to configure the search appliance to authenticate users through X.509 certificates, which can also provide silent authentication to users.

SAML


The search appliance supports integration with SAML, a security standard that enables you to run custom authentication processes outside the search appliance. If you build a SAML authentication provider, you can code in whatever authentication logic you need. If the user is authenticated properly by this external process, the user identity is passed back to the search appliance.

Because SAML is a security standard, it is supported by some commercial and open source authentication solutions and some SSO systems provide a SAML interface. Check whether your organization's authentication solutions already provide such an authentication interface to facilitate the integration with the search appliance. If so, it might not be necessary to develop such a service.

Consider that it's also possible to configure a SAML authorization process as described in Authentication for Developers, but this is independent from whether SAML authentication is configured or not.

You can refer to the GSA product documentation to learn how to set up SAML in the search appliance.

Early binding with Per-URL ACL


When utilizing ACLs for authorization, note that all components that make up an ACL must match the resolved identity for the ACL check to pass: domain, user principal, group principals, namespaces for group and user principals, case sensitivity specified, and ACL type (Permit/Deny).

Group Resolution

Unlike any other authorization mechanism, there is an additional step for ACL authorization: Group Resolution for a verified user ID. The concept of group resolution is very important in the context of early binding ACL support in the GSA. Because a user can be a member of different groups in an identity management system, the same modeling of identity needs to be provided on the GSA. After authentication, the GSA stores the user ID along with the groups the user is a member of. There are five options to resolve groups:
  • Groups database (beta). Starting from release 7.2, the search appliance includes an internal database that stores ACLs. This is still a beta feature that has limited functions and scalability. Group memberships must be fed to the appliance, similar to how documents can be fed to the appliance's index.
  • Connectors. The Connector Framework provides an interface to resolve groups. It's up to the connector developer to decide whether Per-URL ACL or group resolution is implemented. Of all the connectors that Google supports, the SharePoint connector, Active Directory Groups connector, and Documentum connector provide such a feature.
  • LDAP. LDAP authentication can resolve nested LDAP groups. It is not recommended for use with Active Directory (use Active Directory Groups connector instead), but it can be used for other LDAP servers.

The three options above can be used solely to resolve groups when authentication is performed by another mechanism. The following two options resolve groups as part of the authentication process; they cannot be used for group resolution alone.

  • Cookie Cracking. Groups can be returned in a custom header, together with the user ID. It has to be part of the cookie authentication process.
  • SAML. Groups can be returned in the SAML response, but only as part of the SAML authentication process.

These two mechanisms are generally used at deployments that require custom development. The next chapter contains more in-depth documentation on this.

Namespace

The GSA supports ACL namespacing. The concept of namespacing was introduced in order to avoid name clashes of users and groups from multiple sources in the index. Here is an example:

User John Smith has two identities, and we've set up two credential groups, jsmith in CG1, and johns in CG2. In the index, all ACLs pertaining to John Smith could be associated with either of the two identities. There must be a way to distinguish them. That's why namespace is introduced.

If the principal scope is user, the namespace is equivalent to Credential Group. In ACLs, the principal must be either:

jsmith in namespace CG1 
or
johns in namespace CG2

However, if the principal scope is group, namespace doesn't have to be the same as the credential group of the user. As long as the namespace of the resolved groups matches what is defined in ACL in the index, the permission check will work. Here is an example:

John Smith's first identity, jsmith, is from the company-wide Active Directory. Of course, there are AD Groups that jsmith is a member of. Let's say one of the content sources is Plone, which is integrated with Active Directory, but has its own groups defined. How do we avoid conflicts when there are groups with the same names in both Active Directory and Plone? Groups from Active Directory will have namespace CG1. We can give groups from Plone a different namespace such as plone_space. The ACLs in the index will have the following entries:


<principal namespace="CG1" scope="user" access="permit">jsmith</principal>
…
<principal namespace="CG1" scope="group" access="permit">authors</principal>
…
<principal namespace="plone_space" scope="group" access="deny">authors</principal>

As long as the right groups can be resolved for jsmith during group resolution after authentication, the right permissions will be applied:

CG1:jsmith belongs to groups:
CG1:authors, plone_space:authors
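
The namespace matching described above can be sketched as follows. This is a minimal illustration, not the appliance's implementation: the Principal and AclEntry classes and the matching_entries function are hypothetical names, and the sketch deliberately ignores domain and case sensitivity, which the appliance also compares.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    namespace: str
    scope: str  # "user" or "group"
    name: str


@dataclass(frozen=True)
class AclEntry:
    principal: Principal
    access: str  # "permit" or "deny"


def matching_entries(acl, user, groups):
    """Return the ACL entries whose principal matches the resolved identity.

    A principal matches only if namespace, scope, and name are all equal.
    """
    identity = {user, *groups}
    return [entry for entry in acl if entry.principal in identity]


# The three ACL entries from the Plone example.
acl = [
    AclEntry(Principal("CG1", "user", "jsmith"), "permit"),
    AclEntry(Principal("CG1", "group", "authors"), "permit"),
    AclEntry(Principal("plone_space", "group", "authors"), "deny"),
]

user = Principal("CG1", "user", "jsmith")
groups = [
    Principal("CG1", "group", "authors"),
    Principal("plone_space", "group", "authors"),
]
matched = matching_entries(acl, user, groups)
```

If group resolution returned only CG1:authors, the plone_space entry would not match, even though the group name is identical. That is the name clash that namespaces are meant to prevent.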

Domain parsing

Domain names are pretty common for user credentials and groups. The search appliance has a separate field for domain when the principal is stored for the following cases:

  • After the user is authenticated, the resolved verified ID and associated groups contain both username and domain name.
  • The principal on document ACLs for both users and groups contain the principal name and domain name.

From different authentication protocols, verified users can take different formats:

  • bob@google.com
  • google\bob

The search appliance parses these formats consistently and extracts the domain name and username during authentication and ACL indexing. In both examples above, google would be extracted as the domain.
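
As a rough illustration of this parsing, the sketch below splits both formats into a domain and a username. The function name split_principal is hypothetical, and the rule of keeping only the first DNS label for the user@domain form is an assumption chosen so that both examples yield google; the appliance's exact rule is not described here.

```python
def split_principal(raw):
    """Split a verified ID into (domain, username).

    Handles the two formats shown above: DOMAIN\\user and user@dns.domain.
    """
    if "\\" in raw:
        domain, _, user = raw.partition("\\")
        return domain, user
    if "@" in raw:
        user, _, dns_domain = raw.partition("@")
        # Assumption: keep only the first DNS label ("google.com" -> "google").
        return dns_domain.split(".")[0], user
    # No domain present: treat the whole string as the username.
    return "", raw
```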

Late binding for ACLs

When using ACLs to govern access to documents in the GSA, you might want to configure a late binding fallback in case the ACLs in the index are not fully in sync with the content source due to timing issues. When the late binding fallback feature for Flexible Authorization is enabled, the GSA will only accept a DENY response for the POLICY and Per-URL ACL mechanisms. For PERMIT and INDETERMINATE, the GSA will apply subsequent rules until one of them returns a decision other than INDETERMINATE. If none do, the result will not be presented to the user.
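
The fallback logic can be sketched as a walk over an ordered rule list. This is an illustrative model only: the authorize function and the (rule name, decision) tuples are hypothetical, and the behavior with the fallback disabled (accept the first decision other than INDETERMINATE) is an assumption.

```python
PERMIT, DENY, INDETERMINATE = "PERMIT", "DENY", "INDETERMINATE"

# Rules whose PERMIT is not trusted when the fallback is enabled.
EARLY_BINDING_RULES = {"POLICY", "PER_URL_ACL"}


def authorize(rules, late_binding_fallback=True):
    """Evaluate an ordered list of (rule_name, decision) pairs."""
    for name, decision in rules:
        if late_binding_fallback and name in EARLY_BINDING_RULES:
            if decision == DENY:
                return DENY
            # PERMIT and INDETERMINATE both fall through to later rules.
            continue
        if decision != INDETERMINATE:
            return decision
    # No rule produced a usable decision: the result is not shown.
    return DENY
```

For example, authorize([("PER_URL_ACL", PERMIT), ("HEADREQUEST", DENY)]) yields DENY, because the stale PERMIT from the index is overridden by the later check against the content source.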

Connectors using Per-URL ACL


Local Namespace

The Connector Framework introduced the concept of "Local Namespace." Note that this is a connector concept. For ACL definition, there is only one namespace attribute. In connector configuration, there are two namespace fields: one is "Global namespace", which is equivalent to the Credential Group in Authentication. The other is "Local namespace", which will be the name of the connector (or the name of another configured connector, selectable in the dropdown).

Let's use the previous example of the Plone content source. If a Plone connector is built on the connector framework, with the instance name "plone_connector", here is what the ACL principals look like in feeds sent by the connector:


<principal namespace="CG1" scope="user" access="permit">jsmith</principal>
...
<principal namespace="CG2" scope="user" access="permit">johns</principal>
...
<principal namespace="CG1" scope="group" access="permit">authors</principal>
...
<principal namespace="CG1_plone_connector" scope="group" access="deny">authors</principal>
...

The search appliance concatenates the "Global namespace" and the "Local namespace" from the connector's configuration to form the "namespace" attribute in ACLs sent via feeds.
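
The concatenation can be illustrated with a one-line helper. The function name feed_namespace is hypothetical, and the underscore separator is inferred from the "CG1_plone_connector" value in the example above.

```python
def feed_namespace(global_namespace, local_namespace=None):
    """Build the namespace attribute for a principal in a feed.

    Without a local namespace, the global namespace (the Credential
    Group) is used as-is.
    """
    if local_namespace:
        # Separator inferred from the example value "CG1_plone_connector".
        return f"{global_namespace}_{local_namespace}"
    return global_namespace
```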

Avoiding domain parsing

As the previous section described, the appliance tries to interpret the principal format and extract the domain from it. There is one exception: when ACLs are sent in via feeds, if the attribute principal-type is set to "unqualified" on a principal, the domain is not parsed and the name is treated as a literal, no matter what format it takes. This attribute and behavior is designed as another option to avoid group name conflicts, mainly as a hack to keep the SharePoint connector backward compatible. SharePoint allows you to define groups at different levels of a hierarchical web site structure. If we were to use the "Local namespace" feature of the connector, there would be one namespace per site. Instead, GSA's Connector for SharePoint prefixes all SharePoint local groups with the URL of the site that the groups belong to, and sets principal-type to "unqualified." The search appliance stores these groups as they are passed in, so that the same group name from different sites does not conflict. Here is an example of SharePoint local groups being sent to GSA in feeds:


<principal principal-type="unqualified" namespace="Default_sp" case-sensitivity-type="everything-case-insensitive" scope="group" access="permit">[http://w2k8r2entsp1]Home Owners</principal>

On the other hand, if an AD group is sent, it will look like the following:


<principal namespace="Default" case-sensitivity-type="everything-case-insensitive" scope="group" access="permit">mydomain\Home Owners</principal>
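
For readers who generate such feed fragments themselves, the following sketch builds both principal elements with Python's standard xml.etree.ElementTree module. The helper name principal_element is hypothetical; the attribute names and values are copied from the two examples above.

```python
import xml.etree.ElementTree as ET


def principal_element(name, namespace, scope, access, unqualified=False):
    """Serialize one ACL principal as it would appear in a feed."""
    attrs = {}
    if unqualified:
        # Tells the appliance not to parse a domain out of the name.
        attrs["principal-type"] = "unqualified"
    attrs.update({
        "namespace": namespace,
        "case-sensitivity-type": "everything-case-insensitive",
        "scope": scope,
        "access": access,
    })
    element = ET.Element("principal", attrs)
    element.text = name
    return ET.tostring(element, encoding="unicode")


# SharePoint local group: site URL prefix kept as a literal name.
sp_group = principal_element(
    "[http://w2k8r2entsp1]Home Owners", "Default_sp", "group", "permit",
    unqualified=True,
)

# Active Directory group: the domain prefix is left for the appliance to parse.
ad_group = principal_element(
    "mydomain\\Home Owners", "Default", "group", "permit",
)
```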

Connector 4.0 (beta)


Working with Per-URL ACL

The indexing of ACLs by Connector 4.0 differs from that of previous versions:

  • ACLs are not sent in via feeds. Instead, they are indexed as HTTP headers.
  • If ACLs are hierarchical, they won't be flattened. Inheritance will always be used.
  • Namespace needs to be handled by each connector. The File system connector and SharePoint connector use the name "adaptor.namespace" as the configuration entry.
  • There is no more Local Namespace concept—you are free to specify any namespace. The ACLs from this connector will all use the same namespace. Except in the following scenario:
    • principal-type is no longer used by connector 4.0. The scope of SharePoint groups will be appended to namespace, and the principals will be sent without the prefix. For example, "My SP Group" group within http://sharepointhost/sitecollection/ will be processed by the SharePoint connector as follows (assuming the Credential Group is "Default"):

      Namespace: Default_http://sharepointhost/sitecollection/
      Principal name: My SP Group

      If the principal has a domain such as mydomain\mygroup, it will be processed as follows:

      Namespace: Default
      Principal name: mygroup

      Domain: mydomain
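
The two cases above can be summarized in a short sketch. The function name connector4_principal and its parameters are hypothetical; the output values follow the examples in this list, and the underscore between Credential Group and site URL is taken from the namespace shown above.

```python
def connector4_principal(credential_group, group, sharepoint_scope_url=None):
    """Return (namespace, principal_name, domain) for a group principal."""
    if sharepoint_scope_url is not None:
        # SharePoint local group: the scope URL goes into the namespace.
        return f"{credential_group}_{sharepoint_scope_url}", group, None
    if "\\" in group:
        # Domain-qualified group: the domain is stored separately.
        domain, _, name = group.partition("\\")
        return credential_group, name, domain
    return credential_group, group, None
```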

Authentication

As discussed in Chapter 1, connector authentication uses the SAML protocol. The connector framework 4.0 provides SAML as the foundation for security. Each connector based on the new framework must provide its own implementation of the authentication process for the targeted content source. Here is how you configure it in the Admin Console: Under Search > Secure Search > Universal Login Auth Mechanisms > SAML, you need to enter the following values:

IDP Entity ID: The server.samlEntityId configuration entry from the connector configuration file.
Login URL: https://connector-host-name:port/samlip
Public Key: <public key of the IdP>

Here are some notes about the SAML implementation by the connector:

  • You can have multiple connectors providing authentication. The Entity IDs will be different.
  • Only Post Binding is supported.
  • The endpoint of the SAML IdP, "samlip", is hardcoded.
  • Groups can be returned as part of the SAML assertion in the "member-of" attribute.

Authorization

The "Authorization" in this section refers to late binding when using connector 4.0. In order to configure this, you need to perform the following: In Admin Console, under Search > Secure Search > Flexible Authorization, the Authorization service URL needs to be set to: https://connector-host-name:port/saml-authz
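
Both Admin Console values are derived from the connector's host and port, which a small helper can make explicit. The function name connector_saml_urls and the sample host and port in the usage note are placeholders, not real defaults; only the /samlip and /saml-authz paths come from the text above.

```python
def connector_saml_urls(host, port):
    """Return the URLs to enter in the Admin Console for a 4.0 connector."""
    base = f"https://{host}:{port}"
    return {
        # Universal Login Auth Mechanisms > SAML > Login URL
        "login_url": f"{base}/samlip",
        # Flexible Authorization > Authorization service URL
        "authz_url": f"{base}/saml-authz",
    }
```

For a connector on the hypothetical host connector.example.com and port 5678, this gives https://connector.example.com:5678/samlip and https://connector.example.com:5678/saml-authz.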

Security in Windows environments


The majority of deployments of the appliance, which incorporate secure content search, occur in a Microsoft Windows environment. Google provides two accompanying products for the integration: the SAML Bridge and the Active Directory Groups Connector.

SAML Bridge

The search appliance directly supports Kerberos authentication in Windows without the need for installing any external components to the GSA. Since Kerberos is supported in all Windows environments, it is the recommended mechanism for silent authentication. However, it might not be sufficient for the following reasons:

  1. Kerberos is quite sensitive to the environment. For example, a client device might not support Kerberos, and some network scenarios might not support Kerberos. In those cases, native Windows clients fall back to NTLM authentication. However, the search appliance does not natively support NTLM, so there is nothing to fall back to.
  2. Some organizations do not allow the use of key tab files for Kerberos. GSA uses a key tab file in order to enable Kerberos.
  3. When the GSA is Kerberos enabled and used for Head Request authorization, it can only perform unconstrained delegation. This is not acceptable for some organizations.

If you want to enable silent authentication when Kerberos cannot be used (or the key tab file cannot be used), you must set up an external authentication process. Google provides an open sourced tool called the SAML Bridge that supports these scenarios. This is a SAML-based solution that runs on the Windows infrastructure, so it must be installed on a separate host, able to authenticate users using NTLM or Kerberos. For detailed information about how to set up the SAML bridge, see Enabling Windows Integrated Authentication.

Active Directory Groups Connector

In a Windows environment, many content sources are integrated with Active Directory. Groups in Active Directory are used to control access to certain resources. The Google Search Appliance Connector for Active Directory Groups is a tool that can be used to support early binding. It is the preferred approach to resolving groups needed for early binding, versus LDAP authentication, which can be configured directly on the GSA. Although LDAP authentication can also be used for Active Directory groups resolution, it is "late binding" when it resolves groups, in that during the authentication process, the appliance will try to contact domain controllers directly to get associated groups for a user. Alternatively, the Active Directory Groups connector performs the "early binding" of groups resolution: It traverses Active Directory and stores all the user group membership information in its own database. During serve time, the connector just reads from this database instead of contacting domain controllers directly. It offers much better performance—especially in a large scale, multiple domain environment.

Here are some unique behaviors and deployment best practices:

  • The connector will run for a long time—it could be days if the Active Directory has a lot of users and groups. It's recommended to:
    • Use dedicated AD Groups connector instances. This is true even for the SharePoint connector which has an embedded Active Directory Groups connector capability and can index both SharePoint content and Active Directory groups.
    • Increase the traversal timeout. There are six stages to complete the traversal, which you can follow in the logs. If you see repeated "update 1/6" and "update 2/6" entries but the traversal never goes beyond that, it's a sign that the traversal thread was interrupted before it could finish. You can increase the time by changing the variable traversal.time.limit in INSTALLROOT/INSTANCENAME/Tomcat/webapps/connector-manager/WEB-INF/applicationContext.properties
  • Make sure you are binding directly to a non-load balanced Domain Controller Host to take advantage of incremental AD traversal.
    • The connector uses checkpointing that is unique to a specific Domain Controller, so in order to take advantage of the checkpointed updates, you must continue to connect to the same unique Domain Controller upon every request.
  • Always use an offboard connector, and one connector instance per connector manager.
    • Easier to patch and troubleshoot
    • More scalable since you can control resource consumption easily.
  • Use an external database to store the group information.
    • It is more reliable for production than using the embedded database.
    • As the embedded database is tied to a Connector Manager instance, it is also the only way to correctly resolve groups when multiple combinations of AD groups connectors and SharePoint connectors are used over multiple Connector Managers. For example, when there are multiple AD domains, there must be one connector for each domain. In order to resolve groups for users from different domains or if memberships cross domain, the groups information must be put in the same database and same tables. Since the database configuration is at the connector manager level, you have to configure these connector managers to use an external database in order for the multiple instances to share the same related data.
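
The stalled-traversal symptom mentioned in the list above (repeated "update 1/6" and "update 2/6" entries that never progress) can be checked with a small log scan. This is a sketch only: the function name looks_stalled is hypothetical, and it assumes nothing about the log format beyond the "update N/6" text quoted above.

```python
import re

# Matches the stage markers quoted in the text, e.g. "update 2/6".
STAGE = re.compile(r"update (\d)/6")


def looks_stalled(log_lines):
    """Return True if stage 1 repeats but stage 6 is never reached."""
    stages = [
        int(match.group(1))
        for line in log_lines
        for match in STAGE.finditer(line)
    ]
    return stages.count(1) > 1 and 6 not in stages
```

A True result suggests that the traversal restarted before it finished, which is the case where raising traversal.time.limit is worth trying.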

Perimeter security


Documents in the search appliance index can be labeled as either "public" or "secure." How a document is labeled depends on how the content was indexed, either by crawling or feeding, as well as the configuration information in the GSA. In terms of security, an indexed document falls into one of the following two categories:

Public document:
  • Public crawled document
  • Feed document with no security
  • Content from a secure content source that has been marked as public by using the GSA Admin Console

Secure document:
  • Securely crawled document
  • Feed document declared as secure

Users can search and get to public documents without authentication. However, there is an exception. GSA 6.14 introduced the perimeter security feature to the GSA, which ensures that the search appliance doesn't serve any results without user authentication. When perimeter security is enabled, the search appliance must authenticate a user with one of the configured authentication mechanisms before serving any results. If authentication fails, the GSA will not serve any results, even if they are public. Take note that only authentication is performed for documents marked as public, without the need to do any authorization.
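
The serving decision described above can be modeled as a small truth function. The function name can_serve and its boolean parameters are hypothetical; the sketch only restates the rules in this section and is not the appliance's implementation.

```python
def can_serve(is_public, authenticated, authorized, perimeter_security):
    """Decide whether a result may be shown to the user."""
    if is_public:
        # Public documents never need authorization. With perimeter
        # security enabled, they still require an authenticated user.
        return authenticated or not perimeter_security
    # Secure documents always need both authentication and authorization.
    return authenticated and authorized
```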

To configure perimeter security, set up an authentication mechanism, which can be any mechanism described in Designing Security in the GSA. After that is done, navigate to Serving -> Universal Login and enable perimeter security. Take note that once perimeter security is enabled, it applies to the GSA globally and cannot be configured per collection or front end.

Secure Search Example


Here are requirements for four content sources to be included in search (all secure):

  1. SharePoint 2010, with Kerberos authentication. Google supported connector for SharePoint is used to index the content.
  2. Salesforce content integrated with a SAML IdP which uses Forms authentication, but the user directory is still Active Directory. A Salesforce connector is deployed to index the content with ACLs. The connector is built based on Google's connector framework and sends in documents starting with "googleconnector://".
  3. A custom IIS web site with Kerberos Authentication. No API is available for checking permissions or getting ACLs. GSA will crawl the content directly.
  4. A legacy business application. Users and permissions are stored in the database. It's not integrated with Active Directory. There is no direct mapping of user names between Active Directory and this application. Google's database connector is used to index the content. A SQL Query statement can be used to determine whether a user has access to the database records in the search results.

It is also noted that there are various devices in the organization. Some don't support Kerberos.

User Identities

SharePoint, Salesforce and the custom IIS web site are backed by the same Active Directory, while the legacy application has its own. That means we need two Credential Groups: We can use the "Default" Credential Group for Active Directory, and add a "Legacy" Credential Group for the business application.

Authorization

When we try to come up with a solution, we need to start with authorization. It's obvious that we should use Per-URL ACL for SharePoint and Salesforce content. Because GSA's connector for Database supports authorization using a query, we can use connector authorization for this content. We will have to use Head Request for the custom IIS web site. Since it uses Kerberos, we can use Head Request with Kerberos. SharePoint, Salesforce and legacy applications need a verified user identity, while the custom IIS web site doesn't.

Authentication

Now that we have decided which authorization mechanisms to use, it's time to select authentication for each Credential Group. For the "Default" Credential Group, we cannot use Kerberos on the GSA because there are client devices that don't support Kerberos. It leaves us with the SAML Bridge as an alternative. Upon closer investigation, we might be able to use the already available SAML IdP used by the Salesforce integration. It will return the same verified identity, and it doesn't require an additional server to host the SAML Bridge.

Next, we have to validate whether this authentication strategy is sufficient for authorization requirements tied to the "Default" Credential Group. SharePoint and Salesforce content should be covered since we are getting a verified identity that will be used for the ACL checks. The custom IIS web site poses a challenge if we use the Salesforce SAML IdP with cookie authentication since no Kerberos ticket would be available for the Head Request Authorization. To fulfill this requirement, we can use the SAML Bridge for authorization only because it supports Kerberos delegation. It can perform batched Head Requests using Kerberos given a username. But this means we still need to deploy the SAML Bridge. If we need to deploy the SAML Bridge anyway, we can use it for Authentication as well.

For the "Legacy" Credential Group, we need to perform authentication against user credentials stored in the database. However, the GSA connector for Databases does not provide an authentication mechanism. In this case, customization is needed to implement the AuthenticationManager interface of the Connector Manager in the database connector.

As we now know what Authentication mechanisms we'll be using, in Universal Login Auth Mechanism, we configure the following two rules:

  1. SAML. Using the "Default" Credential Group, SAML Bridge should be configured in POST Binding mode. See the Wiki documentation for detailed instructions.
  2. Connector. Using the "Legacy" Credential Group, the customized Database connector should be configured.

Flexible Authorization Rules

In general, for most deployments, we can leave the first 3 entries of Flexible Authorization alone: PER_URL_ACL, CACHE, and POLICY. This also applies to this particular deployment. The PER_URL_ACL rule will kick in for SharePoint and Salesforce content because ACLs are indexed with documents. We do have to make some changes to the CONNECTOR rule because the default configuration is only associated with the "Default" Credential Group.

  • CONNECTOR
    • Change the Authentication ID to "Legacy"—it's equivalent to the selection of Credential Group here.
    • Fill in the database connector name in Connector Name field.

We also need to define a SAML rule. Although SAML Bridge uses Head Request to authorize custom IIS web sites, we cannot rely on the "HEADREQUEST" rule because that's for GSA to perform Head Request.

  • SAML
    • It should be right after CONNECTOR rule in the Flexible Authorization order.
    • Authentication ID should be "Default" (maps to Credential Group).
    • Authorization Service URL should point to SAML Bridge's Authz.aspx.
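
The resulting design can be summarized as plain data, which makes it easy to check that every content source is covered by a rule. This is only a summary sketch of the example above: the dictionary layout and the rules_for helper are hypothetical and do not correspond to any GSA configuration format.

```python
# Universal Login authentication mechanisms, one per Credential Group.
AUTH_MECHANISMS = [
    {"mechanism": "SAML", "credential_group": "Default"},      # SAML Bridge, POST binding
    {"mechanism": "Connector", "credential_group": "Legacy"},  # customized Database connector
]

# Flexible Authorization rules in evaluation order.
FLEX_AUTHZ_RULES = [
    {"rule": "PER_URL_ACL"},
    {"rule": "CACHE"},
    {"rule": "POLICY"},
    {"rule": "CONNECTOR", "authentication_id": "Legacy"},
    {"rule": "SAML", "authentication_id": "Default"},
]

# Which rule is expected to decide access for each content source.
CONTENT_SOURCES = {
    "SharePoint 2010": "PER_URL_ACL",
    "Salesforce": "PER_URL_ACL",
    "Custom IIS web site": "SAML",
    "Legacy business application": "CONNECTOR",
}


def rules_for(authentication_id):
    """Return the names of the rules bound to one Credential Group."""
    return [
        entry["rule"]
        for entry in FLEX_AUTHZ_RULES
        if entry.get("authentication_id") == authentication_id
    ]
```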