- Per-URL ACLs
- SAML authorization
- Connector Framework for Authorization
- Connectors 4.x Authorization
- Web proxy
An enterprise search engine must return relevant results to the user, but only those that the user has access to. This is managed through the authorization process that applies to every secure document in the index. In this chapter we focus on custom solutions when designing the authorization process in your enterprise search project with Google.
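The section Selecting an authorization mechanism introduced the following main options for building a custom authorization process: per-URL ACLs, SAML authorization, connector-based authorization (the Connector Framework and Connectors 4.x), and a web proxy.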
The following sections provide more details on using these options in a custom solution.
The biggest challenge of using early binding with a custom connector or feeds is simulating the authorization model of the target system, because every system's security model can be different.
There are a couple of ways to associate ACLs with documents, such as in HTML headers as metadata, or through custom HTTP headers. However, only feeds allow you to specify all the possible ACL attributes. Since the Google Connector Framework is based on feeds, this discussion covers the case when the ACLs are sent by a connector. See Specifying Per-URL ACLs for information on how to fully define the ACL. Among the features that GSA offers to simulate different security models, ACL inheritance is a very important one.
ACL inheritance makes it more efficient to deal with ACL changes. Because ACLs no longer have to be expanded and attached to each level in a hierarchy, you only have to re-index the level at which the permission changed.
The attribute "inheritance-type" makes it possible to model the different security mechanisms of various content systems. In an inheritance chain, the permission check always traces back to the top and permissions are evaluated according to the inheritance type that was set:
- PARENT_OVERRIDES: The permission of the parent ACL dominates the child ACL, except when the parent permission is INDETERMINATE. In this case, the child permission dominates. If both parent and child are INDETERMINATE, then the permission is INDETERMINATE.
- CHILD_OVERRIDES: The permission of the child ACL dominates the parent ACL, except when the child permission is INDETERMINATE. In this case, the parent permission dominates. If both parent and child are INDETERMINATE, then the permission is INDETERMINATE.
- AND_BOTH_PERMIT: The permission is PERMIT only if both the parent ACL and child ACL permissions are PERMIT. Otherwise, the permission is DENY.
Inheritance chain example
- "FileUrl" (USER:joe access:PERMIT type:LEAF) inherits
- "FolderUrl" (GROUP:eng access:PERMIT type:CHILD_OVERRIDES) inherits
- "ShareUrl" (GROUP:interns access:DENY type:PARENT_OVERRIDES)
- PERMITs identity (USER:joe, GROUP:eng)
- PERMIT by FileUrl ACL, not overridden = PERMIT
- PERMITs identity (USER:moe, GROUP:eng)
- INDETERMINATE + PERMIT + not overridden = PERMIT
- DENYs identity (USER:adam, GROUP:eng, GROUP:interns)
- INDETERMINATE + PERMIT + DENY (override) = DENY
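The evaluation of the chain above can be sketched in a few lines of Java. This is a minimal illustration, not the search appliance's implementation: the class and method names are hypothetical, AND_BOTH_PERMIT is the name assumed here for the "both must permit" rule, and the sketch assumes that a matching deny wins over a matching permit within a single ACL and that each parent's inheritance type decides how its result is combined with the result coming up from its child.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; these types are hypothetical, not GSA or connector classes.
class AclChainSketch {
    enum Decision { PERMIT, DENY, INDETERMINATE }
    enum InheritanceType { PARENT_OVERRIDES, CHILD_OVERRIDES, AND_BOTH_PERMIT, LEAF }

    /** One ACL in a chain; principals are written as "USER:joe" or "GROUP:eng". */
    static final class Acl {
        final Set<String> permits;
        final Set<String> denies;
        final InheritanceType type;

        Acl(Set<String> permits, Set<String> denies, InheritanceType type) {
            this.permits = permits;
            this.denies = denies;
            this.type = type;
        }

        /** Assumption: within one ACL, a matching deny wins over a matching permit. */
        Decision check(Set<String> identity) {
            for (String p : denies) {
                if (identity.contains(p)) return Decision.DENY;
            }
            for (String p : permits) {
                if (identity.contains(p)) return Decision.PERMIT;
            }
            return Decision.INDETERMINATE;
        }
    }

    /** Combines a parent's decision with its child's, using the parent's inheritance type. */
    static Decision combine(InheritanceType parentType, Decision parent, Decision child) {
        switch (parentType) {
            case PARENT_OVERRIDES:
                return parent != Decision.INDETERMINATE ? parent : child;
            case CHILD_OVERRIDES:
                return child != Decision.INDETERMINATE ? child : parent;
            case AND_BOTH_PERMIT:
                return (parent == Decision.PERMIT && child == Decision.PERMIT)
                        ? Decision.PERMIT : Decision.DENY;
            default:
                return child; // LEAF has no children, so there is nothing to combine
        }
    }

    /** Walks the chain from the document's ACL (index 0) up to the top-level ACL. */
    static Decision evaluate(List<Acl> chainChildFirst, Set<String> identity) {
        Decision result = chainChildFirst.get(0).check(identity);
        for (int i = 1; i < chainChildFirst.size(); i++) {
            Acl parent = chainChildFirst.get(i);
            result = combine(parent.type, parent.check(identity), result);
        }
        return result;
    }

    static Set<String> ids(String... principals) {
        return new HashSet<>(Arrays.asList(principals));
    }

    /** FileUrl -> FolderUrl -> ShareUrl, as in the example above. */
    static List<Acl> exampleChain() {
        return Arrays.asList(
            new Acl(ids("USER:joe"), ids(), InheritanceType.LEAF),
            new Acl(ids("GROUP:eng"), ids(), InheritanceType.CHILD_OVERRIDES),
            new Acl(ids(), ids("GROUP:interns"), InheritanceType.PARENT_OVERRIDES));
    }

    public static void main(String[] args) {
        List<Acl> chain = exampleChain();
        System.out.println(evaluate(chain, ids("USER:joe", "GROUP:eng")));   // PERMIT
        System.out.println(evaluate(chain, ids("USER:moe", "GROUP:eng")));   // PERMIT
        System.out.println(evaluate(chain, ids("USER:adam", "GROUP:eng", "GROUP:interns"))); // DENY
    }
}
```

The three calls in main reproduce the three outcomes listed in the example: joe and moe are permitted, and adam is denied by the ShareUrl ACL.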
ACLs can be "Free" or "Bound." ACLs that are attached to indexed documents are "Bound." "Free" ACLs can represent non-document elements. For example, some content systems define permission objects that can be shared by different documents, and ACLs are maintained on these objects instead of on the documents. Content systems such as file systems have hierarchies, and ACLs can be defined on folders, which are not documents. "Free" ACLs can be used in both of these scenarios. They are not counted as indexed documents, so they don't count against a GSA's license.
"Free" ACL example
<group>
  <acl url='http://dummyhost.corp.google.com/' inheritance-type="child-overrides" inherit-from='http://corp.google.com/'>
    <principal scope="user" access="permit">edward</principal>
    <principal scope="user" access="deny"> william</principal>
    <principal scope="user" access="deny"> ben </principal>
    <principal scope="group" access="permit">nobles</principal>
    <principal scope="group" access="deny">playwrights</principal>
  </acl>
  ... ...
</group>
In this example, http://dummyhost.corp.google.com/ is a free ACL that inherits from http://corp.google.com/ and defines further principals. Because its inheritance type is child-overrides, the permissions of any ACL that inherits from it override its own.
You can fully customize the authorization process through an external SAML provider that resolves authorization. You can build such a SAML authorization service in the programming language that you are most familiar with. The SAML authorization request is an XML-formatted request that the search appliance sends to the service URL that you have configured in the Admin Console. The request contains information about the user and the URLs to be authorized. SAML authorization also supports batching, so that multiple URLs can be sent in one request. Implementing batching is highly desirable with this approach, because it avoids a chatty per-URL authorization exchange and improves performance.
The Authentication/Authorization for Enterprise SPI Guide contains more information about the SAML XML format, which you can use to build a custom SAML authorization process. You have to implement a service that runs on an external application server, parses the request, determines whether the user has rights to access each document, and returns an XML-formatted response to the search appliance. An example is the SAML Bridge, which can perform batch authorization of Kerberized content using HEAD requests.
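The decision step of such a service might look like the following sketch, which answers all URLs of one batched request in a single pass. It is only an illustration under stated assumptions: parsing the SAML request and rendering the XML response are omitted, and the class name, the `authorizeBatch` method, and the in-memory `readersByUrl` map (a stand-in for a lookup in the content system) are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: SAML/XML parsing and response rendering are omitted.
class BatchAuthzSketch {
    enum Decision { PERMIT, DENY, INDETERMINATE }

    /**
     * Decides every URL of one batched request in a single pass.
     * readersByUrl is a hypothetical stand-in for the content system's permission lookup.
     */
    static Map<String, Decision> authorizeBatch(
            String user, List<String> urls, Map<String, Set<String>> readersByUrl) {
        Map<String, Decision> decisions = new LinkedHashMap<>(); // keeps request order
        for (String url : urls) {
            Set<String> readers = readersByUrl.get(url);
            if (readers == null) {
                decisions.put(url, Decision.INDETERMINATE); // URL unknown to this service
            } else {
                decisions.put(url, readers.contains(user) ? Decision.PERMIT : Decision.DENY);
            }
        }
        return decisions;
    }

    /** Small demonstration with made-up URLs and users. */
    static Map<String, Decision> demo() {
        Map<String, Set<String>> readersByUrl = new HashMap<>();
        readersByUrl.put("http://content.example.com/a", new HashSet<>(Arrays.asList("joe", "moe")));
        readersByUrl.put("http://content.example.com/b", new HashSet<>(Arrays.asList("moe")));
        return authorizeBatch("joe", Arrays.asList(
            "http://content.example.com/a",
            "http://content.example.com/b",
            "http://content.example.com/c"), readersByUrl);
    }

    public static void main(String[] args) {
        demo().forEach((url, decision) -> System.out.println(decision + " " + url));
    }
}
```

Returning one decision per requested URL, in request order, keeps it straightforward to map the results back onto the response that is sent to the search appliance.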
Considerations for using SAML authorization:
- The main advantage of implementing this authorization model is that you can fully control the security process at search time.
- The main drawback of this approach is that it is inherently a late binding method. That is, authorization is resolved at search time and can add latency to each query, although batch processing can mitigate this.
Another option for modeling security is implementing a custom connector. As explained in this paper and in the GSA documentation, a connector can be created to "traverse" or feed public or secure content into the search appliance, as well as to support serve-time authentication and authorization. Earlier sections discussed connectors that use per-URL ACLs. This section discusses using connectors to perform authorization as a late binding mechanism.
The Connector Framework defines the following interface to be implemented by a connector developer:
public interface AuthorizationManager
It has the following method for authorization:
public List authorizeDocids(Collection docids, AuthenticationIdentity identity) throws RepositoryException;
"docids" is a collection of unique document IDs for matched search results. Multiple docids are passed from the appliance to a connector in each call. When enough documents have been authorized for the search user's identity, the appliance stops calling the connector. Otherwise, the appliance keeps calling this API, each time with more docids than in the previous call, until the allocated time runs out, the docids run out, or enough documents with "PERMIT" have been returned to the search user.
AuthenticationIdentity holds the verified identity of the user. Depending on the authentication protocol used, it can contain a username, a domain, or even a password (if the deployed authentication protocol collects one). A connector implementation should decide what minimum information it requires from AuthenticationIdentity.
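The calling pattern described above, with growing batches of docids until enough results are permitted, can be simulated with the following sketch. It does not use the real connector SPI: `DocAuthorizer` is a simplified stand-in for the connector's authorization call, the doubling of the batch size is an assumption (the actual growth policy is not stated here), and the time budget is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; DocAuthorizer is a simplified stand-in, not the connector SPI.
class AuthzBatchLoopSketch {
    /** Stand-in for a connector's authorization call: returns the permitted subset. */
    interface DocAuthorizer {
        Set<String> permitted(List<String> docids, String username);
    }

    /** Calls the authorizer with growing batches until enough docids are permitted. */
    static List<String> collectPermitted(List<String> candidates, String username,
            DocAuthorizer authorizer, int wanted, int firstBatchSize) {
        List<String> permitted = new ArrayList<>();
        int offset = 0;
        int batchSize = Math.max(1, firstBatchSize);
        while (offset < candidates.size() && permitted.size() < wanted) {
            int end = Math.min(offset + batchSize, candidates.size());
            List<String> batch = candidates.subList(offset, end);
            Set<String> ok = authorizer.permitted(batch, username);
            for (String docid : batch) {
                if (ok.contains(docid) && permitted.size() < wanted) {
                    permitted.add(docid);
                }
            }
            offset = end;
            batchSize *= 2; // assumption: the real growth policy is not specified here
        }
        return permitted;
    }

    static List<String> demo() {
        List<String> candidates = Arrays.asList(
            "doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8");
        // Hypothetical rule: this user may read only even-numbered documents.
        DocAuthorizer evenOnly = (docids, username) -> {
            Set<String> ok = new HashSet<>();
            for (String id : docids) {
                if (Integer.parseInt(id.substring(3)) % 2 == 0) ok.add(id);
            }
            return ok;
        };
        return collectPermitted(candidates, "joe", evenOnly, 3, 2);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [doc2, doc4, doc6]
    }
}
```

In the demonstration the loop stops after two calls (a batch of two docids, then a batch of four), because three permitted documents have been collected by then; doc7 and doc8 are never sent to the authorizer.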
With Connectors 4.x, a connector only needs to provide an implementation of the following interface:
public interface AuthzAuthority
and register it with:
The options described above are the most common platforms used to implement the security side of the interconnection with a content source. There are others, such as using a web proxy to manage the authorization.
In this case, the authorization is centralized in a web proxy, which requires all URLs to be rewritten to go through it. The search appliance then sends HTTP HEAD requests through the proxy to validate security before serving results.
Using a web proxy is similar to using a SAML authorization provider, but with the following disadvantages:
- Authorization requests are not batched.
- URLs have to be rewritten to go through the web proxy for authorization. For example:
- Rewritten URLs are stored in the index in their rewritten form and may have to be translated back to the original URL in the search front end.
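The rewriting and the reverse translation in the search front end can be sketched as follows. The rewriting scheme shown is an assumption chosen for illustration: the proxy host `authz-proxy.example.com`, the `check` path, and the `url` query parameter are hypothetical, and a real deployment would use whatever scheme its proxy expects.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; the proxy host and "url" parameter are hypothetical.
class ProxyUrlSketch {
    static final String PROXY_PREFIX = "https://authz-proxy.example.com/check?url=";

    /** Rewrites a content URL so that requests for it go through the proxy. */
    static String toProxyUrl(String originalUrl) {
        // The Charset overloads of encode/decode require Java 10 or later.
        return PROXY_PREFIX + URLEncoder.encode(originalUrl, StandardCharsets.UTF_8);
    }

    /** Translates an indexed (rewritten) URL back for display in the search front end. */
    static String toOriginalUrl(String indexedUrl) {
        if (!indexedUrl.startsWith(PROXY_PREFIX)) {
            return indexedUrl; // not rewritten, so leave it untouched
        }
        String encoded = indexedUrl.substring(PROXY_PREFIX.length());
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "http://intranet.example.com/docs/report.pdf?id=7";
        String rewritten = toProxyUrl(original);
        System.out.println(rewritten);
        System.out.println(toOriginalUrl(rewritten));
    }
}
```

Encoding the original URL as a single query parameter keeps the translation reversible, so the front end can show users the original location while the index holds the proxied one.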