Introducing the Connectors
Connector software version 3.0
Connector Manager version 3.0
Installer version 3.0.2
This document is for Google Search Appliance administrators who want to set up and manage connectors. Use this book as the starting point for the complete set of connector documentation, which includes these related documents:
- Google Search Appliance help topics. These pages describe configuration controls available in the Connector Administration pages of the Admin Console.
- Connector configuration documentation. These documents contain important product-specific information for each connector type.
The rest of this book describes how connectors work and how to deploy them in your environment.
- How Connectors Work: Crawling, Traversal, and Feeds
- Connector Components
Connectors enable the Google Search Appliance to search and serve documents stored in non-Web repositories such as enterprise content management (ECM) systems. Connectors are either available pre-installed on the Google Search Appliance, or installed on a host running Apache Tomcat. A Google Search Appliance with configured connectors can perform fast, unified, secure search across multiple systems and document repositories.
Users can find the information they need across your enterprise without needing to know how the information was created or where it is stored. Suddenly there is a way to connect information silos. It is no longer necessary to migrate all of your data to a single system or repository so that it is findable by everyone. This brings your enterprise together and breaks down technical barriers to sharing knowledge and resources across your organization.
This section provides some basic information on connector support and an overview of how connectors work with the Google Search Appliance.
Google provides the connector manager and connectors in two ways:
- You can obtain an installer package on the Google Cloud Support Portal that deploys Apache Tomcat, a connector manager, and a particular connector type. This is the recommended way to install connectors. Google supports the installer and the software packaged with the installer. For details on how to log in to this site, see How do I access the Google Cloud Support Portal?
- You can obtain the source code for the connector manager and connectors at the Google Search Appliance Connector Framework project on Google Search Appliance.
The open-source software is for the development of third-party connectors. Developers using the resources provided in this project can create connectors for virtually any type of document-based repository. Google does not support the open-source software or changes you make to the open-source software.
Connectors enable indexing and query-time connections between a Google Search Appliance and a specialized type of repository, for example, SharePoint, Lotus Notes, or Documentum.
Connectors have two major functions:
- Collection: A connector instance traverses its associated document repository and feeds document data to the Google Search Appliance for indexing. The connector issues its queries through the native API (application programming interface) of the content management system.
- Presentation: At query time, connectors forward authentication credentials and authorization requests to the repository.
To locate documents on a web site or file system to add to the search index, the Google Search Appliance uses a process called crawl or crawling. The crawl process issues HTTP requests or follows links to locate content on a web site or file system.
When connecting to a document repository through a connector, the Google Search Appliance uses a process called traversal. Traversal is a process in which the connector issues queries to the repository to retrieve document data to feed to the Google Search Appliance for indexing.
Connectors use either content feeds or metadata-and-URL feeds to send information from the repository to the search appliance. The SharePoint connector version 3.0 can be configured to use either a content feed or a metadata-and-URL feed.
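The difference between the two feed types can be illustrated with a minimal Python sketch. The `FeedRecord` class, its field names, and the `build_record` helper are hypothetical and do not represent the search appliance's actual feed format. The sketch assumes that a content feed carries the document body with each record, while a metadata-and-URL feed carries only the URL and metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FeedRecord:
    """Hypothetical record describing one repository document in a feed."""
    url: str
    metadata: Dict[str, str] = field(default_factory=dict)
    content: Optional[bytes] = None  # set only for content feeds


def build_record(url: str, metadata: Dict[str, str], content: bytes,
                 feed_type: str) -> FeedRecord:
    """Build a record for a "content" or "metadata-and-url" feed."""
    if feed_type == "content":
        # Content feed: the document body travels with the record.
        return FeedRecord(url=url, metadata=dict(metadata), content=content)
    if feed_type == "metadata-and-url":
        # Metadata-and-URL feed: only the URL and metadata are sent.
        return FeedRecord(url=url, metadata=dict(metadata))
    raise ValueError(f"unknown feed type: {feed_type!r}")
```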
To begin generating the initial index of repository content, the connector manager starts a connector instance, which traverses the repository on a defined schedule. The connector manager formats the content and any associated metadata for a feed to the Google Search Appliance, which then creates an index of the documents. The following diagram shows these events in sequence:
- The administrator uses the Admin Console to add a connector, define the traversal schedule, and set other parameters.
- The connector manager starts the connector instance on the schedule defined for the instance.
- The connector instance traverses the repository.
- The connector manager formats the documents and data for a feed to the Google Search Appliance.
- The feeds application programming interface (API) processes the document data.
- The Google Search Appliance indexes the documents and metadata.
Depending on how you schedule connectors, the process described above can be separated into several traversal operations taking place at non-peak hours.
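The indexing sequence above can be sketched in Python under simplifying assumptions. The class names, the in-memory dictionary standing in for a repository, and the dictionary standing in for the search appliance index are all hypothetical; a real connector queries the repository through its native API and sends data through the feeds API.

```python
from typing import Dict, Iterator, List


class ConnectorInstance:
    """Hypothetical connector instance over an in-memory repository."""

    def __init__(self, repository: Dict[str, str], batch_size: int = 2) -> None:
        self.repository = repository
        self.batch_size = batch_size

    def traverse(self) -> Iterator[List[str]]:
        """Yield document IDs in batches, standing in for repository queries."""
        doc_ids = sorted(self.repository)
        for start in range(0, len(doc_ids), self.batch_size):
            yield doc_ids[start:start + self.batch_size]


class ConnectorManager:
    """Hypothetical manager that formats documents and feeds them."""

    def __init__(self, instance: ConnectorInstance) -> None:
        self.instance = instance

    def format_record(self, doc_id: str) -> Dict[str, str]:
        """Package one document's ID and content for the feed."""
        return {"id": doc_id, "content": self.instance.repository[doc_id]}

    def run_traversal(self, index: Dict[str, Dict[str, str]]) -> int:
        """Traverse, format, and feed every document; return the count fed."""
        fed = 0
        for batch in self.instance.traverse():
            for doc_id in batch:
                record = self.format_record(doc_id)
                index[record["id"]] = record  # stand-in for feed and indexing
                fed += 1
        return fed
```

Because `traverse` yields batches, the same loop could be interrupted and resumed, which loosely mirrors splitting the work into several traversal operations.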
For public content in a repository, searches work the same way as they do with web and file system content. The Google Search Appliance searches its index and returns relevant results to the user without any involvement by the connector.
To authorize access to private or protected content from a repository, the Google Search Appliance creates a connector instance at query time. The connector instance forwards authentication credentials to the repository for authorization checking. This diagram shows the event flow at a high level:
- The end user submits a query to the Google Search Appliance.
- The Google Search Appliance prompts the end user for authentication credentials.
- The end user enters credentials, which are forwarded through the Google Search Appliance and connector manager to the repository.
- The repository checks the user’s credentials.
- The Google Search Appliance searches the index for relevant results. If the search results include protected documents, the connector instance contacts the repository to perform an authorization check.
- The repository performs an authorization check and restricts the result set to the documents to which the end user has access.
- The end user views a page of the restricted results. The URL displayed depends on the connector instance configuration. Typically, the URL opens a repository summary page for the document.
Query-time behavior varies depending on the connector type. For more information, see the documentation for each connector type.
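As a rough illustration of the authorization step in the flow above, the following Python sketch filters a result set. The `authorize_results` function, the `protected` flag, and the access control dictionary are hypothetical stand-ins; in a real deployment the connector instance asks the repository to make the authorization decision.

```python
from typing import Dict, List, Set


def authorize_results(user: str,
                      results: List[Dict[str, object]],
                      repository_acl: Dict[str, Set[str]]) -> List[str]:
    """Return the IDs of results the user may see, preserving result order."""
    visible = []
    for result in results:
        doc_id = str(result["id"])
        if not result.get("protected", False):
            # Public content is served from the index without a check.
            visible.append(doc_id)
        elif user in repository_acl.get(doc_id, set()):
            # Protected content is kept only if the repository allows it.
            visible.append(doc_id)
    return visible
```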
The connector manager recognizes identities passed from basic authentication, SAML authentication, and client certificates. If a SAML authentication provider is set up to support single sign-on (SSO), the connector manager also recognizes identities passed from the SSO provider. You can also use the security manager for user authentication.
In this release, authorization and authentication have been enhanced for many Google Search Appliance connectors. For details, refer to the documentation for each connector.
The connector framework consists of these components:
- Connector manager, a module that runs and monitors connectors. The connector manager runs in an external servlet container.
- Connector or connector type, a Java archive (JAR) file installed in a particular connector manager. The connector type contains resources and information for creating connections to a particular type of repository for indexing and query-time operations.
- Connector instance, a connection between a Google Search Appliance and a repository, instantiated by the connector manager using the connector type.
Connector types contain information and resources for configuring connections to the particular content management system. This information determines which repository-specific configuration options are displayed in the Admin Console page for adding a connector.
This diagram depicts a connector manager creating connector instances for two separate repositories:
Note that at present only one connector type per connector manager is supported.
Connector managers create connector instances using the resources in the connector definition and the configuration values that you define on the Admin Console. Connector managers provide the runtime environment for running and monitoring connectors for different repositories. A particular connector manager can communicate with only one Google Search Appliance. A Google Search Appliance can receive feeds from multiple connector managers.
Connectors run on connector managers residing on servlet containers installed on computers on your network. All Google-supported connectors are certified on Apache Tomcat 6.0.18. Because the connector manager conforms to widely accepted standards for web applications, you may be able to successfully deploy the connector manager on other servlet container products.
A connector manager can run multiple connector instances. For best results, run no more than ten connector instances per connector manager. If you are using the security manager connector authentication mechanism, only one connector instance per connector manager is supported. For all other authentication mechanisms, including Kerberos, LDAP, and connector authentication, multiple connector instances per connector manager are supported.
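When planning a deployment, the instance guidelines above could be checked with a small helper like the following Python sketch. The function name and the `"security-manager"` label are hypothetical; the limit of ten is the "for best results" recommendation from the text, not an enforced maximum.

```python
from typing import List

RECOMMENDED_MAX_INSTANCES = 10  # guideline from the text, not a hard limit


def check_instance_count(instance_count: int, auth_mechanism: str) -> List[str]:
    """Return warnings for a planned connector manager configuration."""
    warnings = []
    if auth_mechanism == "security-manager" and instance_count > 1:
        warnings.append(
            "security manager authentication supports only one "
            "connector instance per connector manager")
    if instance_count > RECOMMENDED_MAX_INSTANCES:
        warnings.append(
            "more than ten connector instances per connector manager "
            "is not recommended")
    return warnings
```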
After the connector manager is deployed, use the Google Search Appliance Admin Console to register the connector manager by defining its name and location. You can then install connectors and create connector instances.
Multiple connector managers can be registered on a single Google Search Appliance.
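The relationship between connector managers and search appliances can be modeled with a short Python sketch. The classes, attribute names, and example URLs are hypothetical; the sketch only encodes the two rules stated in this section: a connector manager communicates with a single search appliance, and a search appliance can register several connector managers.

```python
from typing import Dict, Optional


class ManagerRegistration:
    """Hypothetical record of a deployed connector manager."""

    def __init__(self, name: str, url: str) -> None:
        self.name = name
        self.url = url
        self.appliance: Optional[str] = None  # bound to at most one appliance


class SearchAppliance:
    """Hypothetical appliance that tracks its registered managers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.managers: Dict[str, ManagerRegistration] = {}

    def register(self, manager: ManagerRegistration) -> None:
        """Register a manager unless it already serves another appliance."""
        if manager.appliance not in (None, self.name):
            raise ValueError(
                f"{manager.name} already communicates with {manager.appliance}")
        manager.appliance = self.name
        self.managers[manager.name] = manager
```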
Connector types are JAR files that contain the resources required to create an instance of a particular kind of connector. The resources are used by the connector manager to create the connector instances that you define in the Admin Console. Connectors must be installed on a connector manager deployed on a servlet container.
A connector instance is the running data connection between a Google Search Appliance and a repository. To traverse a repository for information to index, create a connector instance that runs until all documents in the repository are indexed. You can configure connector instances to start and stop running at predefined intervals.
A particular connector instance can access only one repository and is known to only one connector manager. Do not reuse connector names across different connector manager instances that communicate with the same Google Search Appliance. Connector names must also be unique on a particular search appliance.
If you are using the security manager connector authentication mechanism, only one connector instance per connector manager is supported. For all other authentication mechanisms, including Kerberos, LDAP, and connector authentication, multiple connector instances per connector manager are supported.
The connector manager does not attempt to run a connector instance until the first time that the connector is scheduled to run. Subsequently, the connector instance runs according to the schedule you define in the Admin Console.
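The scheduling behavior described above can be illustrated with a simplified Python sketch. The representation of a schedule as a list of hour windows and both function names are assumptions made for illustration; they are not the format used by the Admin Console.

```python
from typing import List, Optional, Tuple

Window = Tuple[int, int]  # (start_hour, end_hour), end exclusive, hours 0-24


def is_scheduled(hour: int, windows: List[Window]) -> bool:
    """Return True if the hour (0-23) falls inside any traversal window."""
    return any(start <= hour < end for start, end in windows)


def first_run_hour(now_hour: int, windows: List[Window]) -> Optional[int]:
    """Return the next hour at which the instance would run, if any."""
    for offset in range(24):
        hour = (now_hour + offset) % 24
        if is_scheduled(hour, windows):
            return hour
    return None  # no windows defined, so the instance never runs
```

For example, with windows `[(1, 5), (22, 24)]` an instance added at noon would not run until 22:00, which matches the idea of restricting traversal to non-peak hours.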
Each connector instance added to a particular connector manager or Google Search Appliance must have a unique name, and you cannot reuse connector names across different connector manager instances that communicate with the same Google Search Appliance. The connector name must consist of no more than 64 alphanumeric characters. All alphabetical characters must be lowercase. Connector names may include underscores (_) and hyphens (-), but they cannot begin with a hyphen.
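The naming rules above could be checked before creating a connector instance, as in the following Python sketch. The function names are hypothetical, and the sketch assumes that the 64-character limit applies to the whole name (including underscores and hyphens) and that an empty name is invalid.

```python
import re
from typing import Iterable

# Assumption: lowercase ASCII letters, digits, underscores, and hyphens,
# at most 64 characters in total, and no leading hyphen.
_NAME_PATTERN = re.compile(r"[a-z0-9_][a-z0-9_-]{0,63}")


def is_valid_connector_name(name: str) -> bool:
    """Return True if the name satisfies the format rules."""
    return _NAME_PATTERN.fullmatch(name) is not None


def can_add_connector(name: str, existing_names: Iterable[str]) -> bool:
    """Return True if the name is valid and unused on the search appliance."""
    return is_valid_connector_name(name) and name not in set(existing_names)
```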
You can install only one connector manager on a particular Apache Tomcat instance. If you need to run multiple connectors on a host, Google recommends that you run the installer multiple times to create each required connector. If you are installing manually, install a Tomcat instance for each connector manager and install each Tomcat instance to run as a different user. Ensure that each connector instance has a unique name. See Connector Instances for more information on connector names.