Deployment Scenario Handbook

Internal Search over intranet, File System, and SharePoint

Scenario overview


Acme Inc. stores several content corpora on different servers across its corporate network. These data silos are accessed through different data management applications, such as SharePoint, as well as secure file shares.

Having to switch between applications to find information has become tedious and time consuming for employees. The productivity lost to repetitive searching across disjointed systems with ineffective search tools has also started to affect the company's bottom line.

Requirements


  • Index the following content, while keeping it secure:
    • Secure file shares
    • SharePoint portal data used to host internal sites
  • Present search results for secure content only to users authorized to see the content.
  • Create a standard UI for data access.
  • Create custom interfaces for internal and external users.
  • Deployment must result in a measurable business benefit.

Assumptions


  • There are more than 500K documents in SharePoint.
  • An automated analytics solution is desirable.

Key considerations


  • Decide whether to use the onboard or offboard Google Search Appliance Connector for SharePoint.
  • Decide whether to crawl web-enabled SMB file shares or use the file system connector.
  • Decide whether to present results directly from the GSA or by means of a web application presentation layer.
  • Decide whether to manage security by using the search appliance or by means of a fronting application.
  • Decide whether silent authentication is needed, where users are not re-prompted by the GSA for credentials.

Recommended approach


Google’s recommended approach for implementing internal search over intranet, file system, and SharePoint covers the following areas:

Benefit analysis

To gauge the business benefit of the resulting search solution, Acme Inc. will conduct a short study to capture time spent on existing platforms. Automated tools should be used to gather this information whenever possible. If there are any analytics tools in place, they should be used to gather information about the usage of search or the time it takes to find information on the current systems. If no analytics are in place, Acme Inc. should consider implementing an analytics solution for automated evaluation of effectiveness in the future.

After the deployment has concluded, Acme Inc. will evaluate the new solution to gauge its effectiveness. To keep the metrics comparable, the evaluation will be measured against similar use cases that were evaluated before the deployment began.
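
The before-and-after comparison could be sketched as follows. This is a minimal illustration that assumes, hypothetically, that each study records the seconds an employee spent locating a document for a named use case. The function name, use case names, and sample figures are placeholders, not measurements.

```python
from statistics import mean


def time_saved_per_use_case(before, after):
    """Compare mean lookup times for use cases measured in both studies.

    Returns {use_case: (mean_before, mean_after, percent_change)}.
    Use cases that appear in only one study are skipped, because they
    cannot be compared.
    """
    report = {}
    for use_case in sorted(before.keys() & after.keys()):
        mean_before = mean(before[use_case])
        mean_after = mean(after[use_case])
        percent_change = (mean_after - mean_before) / mean_before * 100.0
        report[use_case] = (mean_before, mean_after, percent_change)
    return report


# Illustrative placeholder samples (seconds per lookup), not real data.
baseline = {"find HR policy": [120, 180, 150], "find project spec": [300, 240]}
post_deployment = {"find HR policy": [60, 90, 75], "find onboarding guide": [45]}

for name, (b, a, pct) in time_saved_per_use_case(baseline, post_deployment).items():
    print(f"{name}: {b:.0f}s -> {a:.0f}s ({pct:+.0f}%)")
```

Only "find HR policy" appears in both sample sets, so it is the only use case reported. An analytics tool, where one is in place, would supply the timing samples instead of the hand-entered lists.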

Deployment architecture

Acme Inc. will deploy the offboard SharePoint connector, as the total SharePoint document count is over 500K. If file shares can be web-enabled, then they can be directly crawled by the GSA.

Results will be presented directly from the GSA by using customized front ends for different data stores. In the case of searching SharePoint, the Search Box for SharePoint will be deployed and used.
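
As a rough illustration of serving results directly from the GSA through a customized front end, the sketch below builds a search request URL. It assumes the commonly documented search protocol parameters (`q`, `site`, `client`, `proxystylesheet`, `output`); verify them against the Search Protocol Reference for your software version. The host name and front-end name are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(gsa_host, query, front_end, collection="default_collection"):
    """Build a GSA search URL that renders results with a given front end.

    gsa_host and front_end are placeholders for values defined in your
    own deployment; they are not taken from the handbook.
    """
    params = {
        "q": query,                    # the user's search terms
        "site": collection,            # collection to search
        "client": front_end,           # front end whose settings apply
        "proxystylesheet": front_end,  # front end whose stylesheet renders results
        "output": "xml_no_dtd",        # XML results, transformed by the stylesheet
    }
    return f"https://{gsa_host}/search?{urlencode(params)}"


# Hypothetical front end created for SharePoint users.
print(build_search_url("gsa.example.com", "vacation policy", "sharepoint_fe"))
```

Creating one front end per data store and passing its name in `client` and `proxystylesheet` is one way to give each audience its own presentation without a separate web application layer.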

Consider utilizing the Google Search Appliance Connector for File Systems to index file shares. Some scenarios where the connector should be used include:

  • Authorization by early binding (ACLs)
  • Need to maintain last access dates on files and directories that are being traversed
  • The share is a non-HTTP exposed Windows DFS domain root share
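
The choice between web crawl and the file system connector can be written down as a simple decision rule. The sketch below encodes the three scenarios listed above. It also assumes, as an interpretation of this handbook rather than a stated rule, that a share that cannot be web-enabled falls back to the connector. All class, field, and share names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileShare:
    """Hypothetical description of one file share to be indexed."""
    name: str
    web_enabled: bool
    needs_early_binding_acls: bool = False
    must_preserve_last_access: bool = False
    is_non_http_dfs_root: bool = False


def indexing_method(share):
    """Return "file_system_connector" or "web_crawl" for a share."""
    connector_required = (
        share.needs_early_binding_acls
        or share.must_preserve_last_access
        or share.is_non_http_dfs_root
        or not share.web_enabled  # assumption: no HTTP access means no direct crawl
    )
    return "file_system_connector" if connector_required else "web_crawl"


shares = [
    FileShare("team-wiki-files", web_enabled=True),
    FileShare("finance", web_enabled=True, needs_early_binding_acls=True),
    FileShare("dfs-root", web_enabled=False, is_non_http_dfs_root=True),
]
for share in shares:
    print(share.name, "->", indexing_method(share))
```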

Crawl and index configuration

Acme Inc. will configure crawl and index for the following types of content sources:

  • SharePoint: To index content on SharePoint, Acme will install and configure the SharePoint adaptor on a separate server. ACLs, as well as user and group resolution, will be managed by the adaptor.
  • File Shares: To index file shares, Acme will configure the web-enabled file shares on the Content Sources > Web Crawl > Start and Block URLs page in the GSA Admin Console.

Serve-time authentication and authorization configuration

Acme Inc. will use Kerberos as the preferred authentication mechanism between the GSA and the content server. They will make this work by performing the following tasks:

Since content feeds containing ACLs will be submitted from SharePoint to the GSA, content will be authorized in the GSA’s index utilizing the ACLs that were fed at crawl time.

Alternative approaches

  • Use the Google Search Appliance Connector for File Systems to index the file share content.
    • The advantage of this approach is that ACLs would be fed in along with the content, enabling an early binding authorization decision, which performs better than late binding.
    • For this approach to work, the correct groups for the user would need to be resolved at authentication time, because the early binding ACL authorization trim depends on them. The ADGroups connector, which is also needed for SharePoint, could be used for this purpose.
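
To show why group resolution matters for early binding, the sketch below trims a result list using ACLs stored with each indexed document. It is a simplified model, not the appliance's actual implementation: it assumes allow-only ACLs with no deny entries or inheritance, and all class, function, user, and group names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IndexedDoc:
    """A document in the index, with the ACL that was fed alongside it."""
    url: str
    allowed_users: frozenset = field(default_factory=frozenset)
    allowed_groups: frozenset = field(default_factory=frozenset)


def trim_results(results, user, groups):
    """Keep only documents the user may see, using index-time ACLs.

    groups must already be resolved at authentication time; without
    them, documents granted only through a group would be dropped.
    """
    resolved = set(groups)
    return [
        doc for doc in results
        if user in doc.allowed_users or resolved & doc.allowed_groups
    ]


results = [
    IndexedDoc("file://share/budget.xlsx", allowed_groups=frozenset({"finance"})),
    IndexedDoc("file://share/handbook.pdf", allowed_users=frozenset({"alice"})),
]

# With groups resolved, alice sees both documents.
print([d.url for d in trim_results(results, "alice", ["finance"])])
# Without group resolution, the group-protected document is trimmed.
print([d.url for d in trim_results(results, "alice", [])])
```

Because the check runs against data already in the index, no call to the content server is needed at serve time, which is the source of the performance advantage.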

Project task overview


The following table lists the project tasks and activities for implementing internal search over intranet, file system, and SharePoint.

Task / Activities
Plan deployment architecture
Configure crawl and index
  • Configure ADGroups connector
  • Configure File Share locations in Crawl and Index in the GSA Admin Console
Configure front end
Configure serve-time authentication/authorization
  • Configure connector-based authorization
  • Configure group resolution mechanism
    • If the SharePoint connector is being used, this will most likely be SharePoint connector-based authentication configured for group resolution only.

Long term enhancement


Deploy the Google Search Box for SharePoint in order to serve search results from within SharePoint.
