Configuring the Connector for Databases (Deprecated)

Connector software version 3.0
Connector Manager version 3.0
Installer version 3.0


This document contains the information you need to install the Google Search Appliance Connector for Databases and configure the Google Search Appliance and the connector to traverse, index, and search content in a relational database.

This document is for database administrators and administrators who install and configure the Google Search Appliance. If you are not familiar with the system that the connector will traverse and index, work closely with your system administrators to determine the correct values for installing and configuring the connector.

Use this document in conjunction with the following related documents:

The rest of this book describes how to configure the Google Search Appliance Connector for Databases.



Introducing the Google Search Appliance Connector for Databases

The Google Search Appliance Connector for Databases enables the Google Search Appliance to traverse, index, and search content and metadata stored in a relational database management system. The search appliance traverses the database tables you designate based on a SQL query that retrieves the appropriate content and metadata. You can optionally provide XSLT to customize how the results are displayed. The connector automatically detects changes to the content and metadata since the previous traversals and modifies the search index accordingly.

You can also use the database connector for external metadata indexing, which is useful when the metadata for your documents resides in a relational database rather than in the primary document.


Supported Databases

The Google Search Appliance Connector for Databases is supported on the following relational database management systems:

  • MySQL 5.2.3
  • Oracle Database 10g Enterprise Edition Release 10.2.0.4.0
  • Microsoft SQL Server 2005
  • IBM DB2/NT SQL09050

Supported Operating Systems

  • Windows 2003 Server Enterprise Edition SP2
  • Windows XP Professional version 2002 SP2
  • Red Hat Linux 5

The connector manager and the Google Search Appliance Connector for Databases are supported in virtualization environments. Google does not provide support for specific virtualization environments or for issues that are specific to virtualization.


Supported Java Version

The Google Search Appliance Connector for Databases requires a minimum of Java Runtime Environment 5 and may support JRE 6.


Apache Tomcat Version

The installer installs a connector manager, a connector type, and Apache Tomcat 6.0.18. Tomcat 5.5.23 is supported for this release of the connector and connector manager.


Before You Install the Connector

Before you install the Google Search Appliance Connector for Databases:

  1. Install the JDBC driver for your database and note the location of the JDBC jar file.
  2. Determine the following values, which you need during the configuration process.
Name Description Values and Usage
User Name Required field. User name of the database user whose account is used to connect to the database. The user must have sufficient privileges to access all tables and fields, including remote access to the database. Windows authentication used in Microsoft SQL Server is not supported by the connector.
Password Required field. Database user's password. If the database user's password is blank, the field can be blank.
JDBC connection URL Required field. JDBC connection URL used for establishing a connection to the database. The ports in the example URLs below are the default TCP/IP ports for each database installation; they may differ if your installation is customized. The database server must be accessible from the connector host when the connector is being configured. The connection URL must use this format:

protocol://db_server_IP_address:port/database

MySQL

jdbc:mysql://10.88.45.40:3306/MySQL

jdbc:mysql://myserver.com/DATABASENAME

Microsoft SQL Server 2005

jdbc:sqlserver://DB_HOST_NAME_OR_IP:1433;databaseName=Database Name

Oracle Database

jdbc:oracle:thin:@DB_HOST_NAME_OR_IP:1521:Database Name

IBM DB2

jdbc:db2://DB_HOST_NAME_OR_IP:50000/Database Name

If you are configuring IBM DB2 and you are using the BLOB/CLOB feature for external metadata indexing, the connection URL must have the following string appended:

:driverType=4;fullyMaterializeLobData=true;
fullyMaterializeInputStreams=true;progressiveStreaming=2;
progressiveLocators=2;

The full connection string will look like this:

jdbc:db2://db2-server:port/testDB:driverType=4;
fullyMaterializeLobData=true;fullyMaterializeInputStreams=true;
progressiveStreaming=2;progressiveLocators=2;

Database Name Required field. Database name. This is the name of the database schema in which the table used in select query resides. The value is used in XSLT, the display URL, and the XML representation of a row.
Connector Host Name Required field. Fully qualified host name. For example, hostname.domain. The value is used in the display URL in search results in the form DB_HOST_NAME_OR_IP.

Examples:

DB_HOST.example.com

172.16.254.1

JDBC Driver Class Name Required field. Fully qualified driver class name without the .class extension.

The value is used to load the JDBC driver class.

For example:

MySQL

com.mysql.jdbc.Driver

Microsoft SQL Server 2005

com.microsoft.sqlserver.jdbc.SQLServerDriver

Oracle Database

oracle.jdbc.OracleDriver

IBM DB2

com.ibm.db2.jcc.DB2Driver

SQL Query Required field. A valid SQL SELECT statement. The SQL query is executed to fetch records from the database during the traversal process. In the following example, the query retrieves data from a single table called employee with columns empid, first_name, last_name, manager, and deptno:

SELECT empid,first_name,last_name,manager,deptno FROM employee ORDER BY empid

The connector can traverse data from multiple tables using JOIN or UNION. For example, the following SQL query traverses data in the suppliers and orders tables:

SELECT suppliers.supplier_id, suppliers.supplier_name, orders.order_date FROM suppliers, orders WHERE suppliers.supplier_id = orders.supplier_id ORDER BY suppliers.supplier_id

Because of how the connector detects differences during traversal, the SQL query should end with an ORDER BY clause on the configured primary key, in ascending order. The columns in the ORDER BY clause must match the Primary Keys value, and the clause must not use DESC (ascending order is the default).

Primary Keys Required field. The primary key column name for the SQL query. If a table uses composite primary key, the columns making up the primary key must be provided as a comma-separated string. If the SQL query contains aliases for column names, use the aliases as the primary key values. The primary key field is not case sensitive. The value is used to build the document ID of a document.
Stylesheet for Serving Results Optional field. XSLT for customizing the appearance of search results. If a value is not provided, the default XSLT is used for serving results.
Last Modified Date Optional. Last date the record was modified. Use if the database table contains a column for the last date the record was modified. Enter the column name in this field.
AuthZ Query Query used by the connector to authorize user access to particular documents. Applies only to documents sent to the search appliance as a content feed. Provide a valid SQL query containing placeholders for user names and document IDs. The query returns results as a comma-separated list of primary keys. For example:

select concatenate(primary_key1, ",", primary_key2, ",", primary_key3) from Table_Name where username=#username# and concatenate(primary_key1, ",", primary_key2, ",", primary_key3) IN ($docid$);
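
Before you enter these values in the Admin Console, it can help to check them against the rules in the table above. The following Python sketch is illustrative only and is not part of the connector. The function names jdbc_url and orders_by_primary_key are hypothetical, the URL templates and default ports are copied from the examples above, and the ORDER BY check is a simple text match that does not understand subqueries.

```python
import re

# URL templates and default ports copied from the examples in the table above.
JDBC_URL_TEMPLATES = {
    "mysql": "jdbc:mysql://{host}:{port}/{database}",
    "sqlserver": "jdbc:sqlserver://{host}:{port};databaseName={database}",
    "oracle": "jdbc:oracle:thin:@{host}:{port}:{database}",
    "db2": "jdbc:db2://{host}:{port}/{database}",
}
DEFAULT_PORTS = {"mysql": 3306, "sqlserver": 1433, "oracle": 1521, "db2": 50000}


def jdbc_url(db_type, host, database, port=None):
    """Build a JDBC connection URL in the format shown for db_type."""
    if port is None:
        port = DEFAULT_PORTS[db_type]
    return JDBC_URL_TEMPLATES[db_type].format(host=host, port=port, database=database)


def orders_by_primary_key(sql, primary_keys):
    """Return True if sql ends with an ascending ORDER BY on primary_keys.

    primary_keys is the comma-separated string for the Primary Keys field.
    The comparison ignores case, like the Primary Keys field itself.
    """
    match = re.search(r"\border\s+by\s+(.+?)\s*;?\s*$", sql, re.IGNORECASE | re.DOTALL)
    if match is None:
        return False
    ordered = []
    for term in match.group(1).split(","):
        words = term.split()
        # Accept "column" or "column ASC"; reject DESC and anything else.
        if not words or len(words) > 2:
            return False
        if len(words) == 2 and words[1].lower() != "asc":
            return False
        # Drop a table qualifier such as "suppliers." before comparing.
        ordered.append(words[0].split(".")[-1].lower())
    wanted = [key.strip().lower() for key in primary_keys.split(",")]
    return ordered == wanted
```

For example, jdbc_url("mysql", "10.88.45.40", "MySQL") reproduces the first MySQL URL shown above, and orders_by_primary_key rejects the employee query if DESC is appended to its ORDER BY clause.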


Stylesheet Example

You can use a custom XSLT stylesheet to define which fields are displayed as content and which as metadata in the search results. The following is an example of valid XSLT to use, where firstName, lastName, email, and id are column names, and DB_Name is the database name:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<html>
<head>
<xsl:for-each select="DB_Name">
<meta>
<xsl:attribute name="name">
<xsl:text>id</xsl:text>
</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="id"/>
</xsl:attribute>
</meta>
</xsl:for-each>
<xsl:for-each select="DB_Name">
<title><xsl:value-of select="title"/></title>
</xsl:for-each>
</head>
<body>
<table border="1">
<tr bgcolor="#9acd32">
<th>First Name</th>
<th>Last Name</th>
<th>Email</th>
<th>Id</th> </tr>
<xsl:for-each select="DB_Name">
<tr> <td>
<xsl:value-of select="firstName"/></td>
<td><xsl:value-of select="lastName"/></td>
<td><xsl:value-of select="email"/></td>
<td><xsl:value-of select="id"/></td> </tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>


External Metadata Indexing

Use the optional external metadata indexing feature when document metadata are stored in a database rather than in the primary document. After you index external metadata, it is searchable in the same way as other metadata.

Depending on how your primary document is stored and referenced, you complete the configuration fields differently. In each of the scenarios discussed below, each column name becomes a metadata key and the value in that column becomes the metadata value. The primary key columns are not considered for indexing.

Scenario 1

In this scenario, the primary document's URL is stored in a column in the database table. Other columns contain the metadata associated with the primary document. The database connector queries the database, then submits a metadata-and-URL feed to the search appliance. The connector traverses the database records defined by the SQL query. The URL to the document is extracted from each record. The column names are metadata keys and the values in the columns are metadata values.

For this scenario, select the radio button for the Document URL Field and fill in the field with the field name containing the document URL.
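
The mapping from a database row to a feed record in this scenario can be sketched in Python as follows. This is an illustration of the idea only, not the connector's code. The record, metadata, and meta element names follow the search appliance feeds format, but the function name build_record, the text/html MIME type placeholder, and the choice to leave the URL column out of the metadata are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_record(row, url_column, primary_keys):
    """Turn one database row (a dict of column -> value) into a <record> element."""
    # The document URL comes straight from the configured Document URL Field.
    # "text/html" is a placeholder MIME type chosen for this sketch.
    record = ET.Element("record", url=row[url_column], mimetype="text/html")
    metadata = ET.SubElement(record, "metadata")
    # Primary key columns are not indexed; skipping the URL column is an assumption.
    skipped = {url_column.lower()} | {key.lower() for key in primary_keys}
    for column, value in row.items():
        if column.lower() in skipped or value is None:
            continue
        # Column name becomes the metadata key, column value the metadata value.
        ET.SubElement(metadata, "meta", name=column, content=str(value))
    return record
```

Calling ET.tostring(build_record(row, "doc_url", ["doc_id"]), encoding="unicode") on a row with hypothetical doc_id, doc_url, and author columns produces one record whose only meta element is named author.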

Scenario 2

This scenario is similar to Scenario 1, but instead of retrieving the full primary document URL from the database, the document URL is constructed from a base URL that you provide plus the document ID that is stored in the table. For example, the following URL identifies a unique document:

http://my-host:6502/getdoc?action=get&docid=4662118437

The document ID 4662118437 resides in a database column. You select the Document ID Field radio button, provide the name of the column in the corresponding field, and type the base URL of http://my-host:6502/getdoc?action=get&docid= in the Base URL field.
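
The URL construction in this scenario amounts to appending the document ID to the base URL. The Python sketch below shows the idea; the function name build_document_url is hypothetical, and percent-encoding the ID is a precaution added for the example rather than documented connector behavior.

```python
from urllib.parse import quote


def build_document_url(base_url, doc_id):
    """Append the document ID from the database column to the base URL."""
    # Percent-encode the ID in case it contains characters that are not URL-safe.
    return base_url + quote(str(doc_id), safe="")
```

With the values from the example above, build_document_url("http://my-host:6502/getdoc?action=get&docid=", 4662118437) returns the full document URL.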

Scenario 3

In this scenario, the external metadata is stored in a database and the primary document is also in the database, stored as a BLOB or CLOB datatype. To configure this, select the BLOB/CLOB Field radio button and enter the name of the BLOB or CLOB column in the corresponding text field. If the database contains a column containing the URL for fetching the BLOB or CLOB data, enter the column name in the Fetch URL Field. The MIME type of the BLOB or CLOB data is determined programmatically. When BLOB or CLOB content is used, a pseudo-display URL is used. Customize the search appliance stylesheet to change the display URL. If the database has a column for the fetch URL, that URL is used as the display URL.

External Metadata Indexing Configuration Fields

If you plan to use external metadata indexing, regardless of the scenario, determine the following values before you configure the connector. You must still determine the values in the table above, which are required for all database connectors.

Name Description Values and Usage
Document URL Optional field. Use when the database contains a column to record the absolute URL of the primary document. For a metadata and URL feed. If you select this option, you must also enter a valid URL pattern in the Include URLs field for Google Search Appliance crawling.
Document ID Optional field. Use when the database contains a column to record the document ID. Enter the name of the column that contains the document ID.
Base URL Required if you select the Document ID field. The absolute URL of the document is constructed by combining the base URL value with the document ID field.
BLOB/CLOB Optional field. Database column name for a column containing a document stored in BLOB or CLOB data. Use when the database table has a column that contains the primary document stored as BLOB or CLOB data. The MIME type of the BLOB or CLOB data is determined programmatically.
Fetch URL Optional field. If you select BLOB/CLOB, enter the column that contains the URL for retrieving the content of a primary BLOB or CLOB document. The value of the column is used as the display URL for the corresponding primary document.

Handling Authentication and Authorization in the Database Connector

The Database Connector does not handle user authentication. Instead, enable Kerberos/SAML or LDAP on the search appliance. The Database Connector does not support use of Security Manager for user authentication.

The Database Connector handles authorization only for documents sent to the search appliance in a content feed. If the Database Connector is configured for metadata-and-URL feed, using either a base URL and docId or a URL field, authorization is assumed to be handled by the security mechanism protecting the source.

For more information on authorization in the Database Connector, including examples of SQL query design, see the Database Connector wiki.
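
The idea behind an authorization query can be illustrated with a small Python sketch: given a user name and a batch of document IDs, the query returns only the IDs that the user is allowed to see. Everything in the sketch is an assumption made for illustration: it uses an in-memory SQLite database in place of your real database, a hypothetical doc_acl table with a single-column primary key, and ordinary parameter binding in place of the connector's own placeholder substitution.

```python
import sqlite3


def authorized_doc_ids(conn, username, doc_ids):
    """Return the subset of doc_ids that username may view."""
    if not doc_ids:
        return []
    # One bound parameter per document ID in the IN (...) list.
    placeholders = ",".join("?" for _ in doc_ids)
    sql = (
        "SELECT doc_id FROM doc_acl "
        f"WHERE username = ? AND doc_id IN ({placeholders}) ORDER BY doc_id"
    )
    rows = conn.execute(sql, [username, *doc_ids]).fetchall()
    return [row[0] for row in rows]


# Hypothetical access-control table used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_acl (username TEXT, doc_id TEXT)")
conn.executemany(
    "INSERT INTO doc_acl VALUES (?, ?)",
    [("alice", "1"), ("alice", "2"), ("bob", "2")],
)
```

In this example, a request to authorize alice for documents 1, 2, and 3 returns only 1 and 2, because no row grants her access to document 3.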


Configuring Crawl and Feeds for the Connector

Before you install the Google Search Appliance Connector for Databases, you must make an addition to the Follow and Crawl URLs defined in the Admin Console. The Google Search Appliance rejects content in the repository without the addition.

To configure crawl and feeds for the connector:

  1. On the Admin Console, navigate to the Crawl and Index > Crawl URLs page.
  2. In the Follow and Only Crawl URLs with the Following Patterns box, add the following statement:

    ^googleconnector://

    For metadata-and-URL feeds, the following format is also supported:

    http://hostname:port/foo/bar.html

  3. Save the configuration.
  4. Click Crawl and Index > Feeds.
  5. In the List of Trusted IP Addresses section, select Trust feeds from all IP addresses or Only trust feeds from these IP addresses.
  6. If you selected Only trust feeds from these IP addresses in step 5, type in the trusted IP addresses.
  7. Click Save Settings.
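
The effect of the two settings above can be sketched in Python. This is a simplified model, not search appliance code: it assumes that a pattern beginning with ^ matches any URL that starts with the text after the ^, and that a trusted-IP list is an exact-match list. The function names and the sample URL used in the note below are hypothetical.

```python
import ipaddress

CONNECTOR_PATTERN = "^googleconnector://"


def matches_prefix_pattern(url, pattern):
    """Return True if url starts with the text after the leading ^ in pattern."""
    if not pattern.startswith("^"):
        raise ValueError("this sketch only handles patterns that start with ^")
    return url.startswith(pattern[1:])


def feed_is_trusted(source_ip, trusted_ips=None):
    """Model the two Feeds page options.

    trusted_ips=None stands for "Trust feeds from all IP addresses";
    a list stands for "Only trust feeds from these IP addresses".
    """
    if trusted_ips is None:
        return True
    source = ipaddress.ip_address(source_ip)
    return any(source == ipaddress.ip_address(ip) for ip in trusted_ips)
```

Under these assumptions, a URL such as googleconnector://mydb.localhost/doc?docid=42 matches the pattern, and a feed from 10.0.0.6 is rejected when the trusted list contains only 10.0.0.5.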

Installing the Google Search Appliance Connector for Databases

This section describes the installation process for the Google Search Appliance Connector for Databases. You install the connector using an installer that installs Apache Tomcat, a connector manager, and the connector on a host computer.

The instructions that follow are in two parts. In the first part, you download and uncompress the installer package. In the second, you install the software on the connector host.

To download and uncompress the installation package:

  1. Log in to the host using an account with sufficient privileges to install the software.
  2. Start a web browser.
  3. Navigate to the connector download site.
  4. Download the correct software distribution package to the host where you are installing the software.
  5. Uncompress the package.
  6. If you are on Windows, skip step 7 and go to the instructions immediately below for installing Tomcat, a connector manager, and the connector.
  7. If you are on Linux, follow these instructions.
    1. Open a terminal window and go to the base directory of the GCI.bin file in the extracted folder.
    2. To run the installer in graphical mode, execute the following command:

      ./GCI.bin LAX_VM path_to_java

      For example: ./GCI.bin LAX_VM /usr/java/j2sdk1.5.2_x/bin/java

    3. To run the installer in console mode, execute the command in Step 2 above with the -i console argument appended.
    4. Go to the following instructions and proceed from Step 2.

To install Apache Tomcat, a connector manager, and the Google Search Appliance Connector for Databases:

  1. Double-click the distribution file to start the installer.

    You see an introductory panel.

  2. Click Next.

    The License Agreement panel appears.

  3. Indicate whether you accept or decline the terms of the license and click Next:
    • To accept the license, click I accept the terms of the License Agreement.
    • To decline the terms, click I do NOT accept the terms of the License Agreement.
  4. On the Select Connector panel, select the correct connector and click Next.
  5. On the Install Connector panel, choose Install new Google Connector and click Next.
  6. On the Connector Configuration panel, enter the name you want to assign the connector and a port number that is not already used by another application.

    If you are creating multiple installations of the connector, ensure that you do not use consecutive port numbers. Each connector installation requires two consecutive port numbers for use by Tomcat. For example, if ConnectorInstall1 is installed on port 8080, do not use port 8081 for ConnectorInstall2. In addition, do not use the AJP Connector port (port 8009) listed in the Tomcat server.xml file. In installations where SSL is supported, do not use the SSL port.

  7. Enter the Google Search Appliance IP Address, which is the IP address to which the connector sends feeds.

    Entering the address ensures that only the search appliance can communicate with the connector manager.

  8. If you do not want the connector service to start automatically, uncheck the Start Database connector Service after Installation check box.
  9. If you do not want to register the connector manager on the search appliance during this installation process, uncheck the Register Connector Manager with GSA check box.
  10. Click Next.
  11. On the Choose Java Runtime Environment panel, choose the correct JRE for the connector to use, or click Search for Others if the correct JRE is not in the list.
  12. Click Next.
  13. On the Choose Install Folder panel, click Next to accept the default location or click Choose to navigate to a different folder, then click Next.

    The default location is the installation folder chosen in the previous step.

  14. On the Choose Shortcut Folder panel, indicate where you want icons created for the connector and click Next.
  15. Read the information on the Pre-Install\Update Summary panel and click Install.

    An informational panel indicates that the connector installation is in progress. The Register Connector Manager on the GSA panel is displayed.

  16. Type the search appliance administrator user name in the GSA UserID field.
  17. Type the password for the administrator in the GSA Password field.
  18. Type the search appliance port number in the GSA Port field.
  19. Type in the Connector Manager Name and Description.
  20. Click Next.

    The installer indicates whether the installation process succeeded or failed and displays information about connector manager connectivity status, the connector manager URL, search appliance status, and the search appliance display URL.

  21. Click Done.
  22. To start the connector service, click Yes.

    Apache Tomcat starts and deploys the connector manager and connector.

  23. If the Start Database connector Service after Installation check box was left unchecked, start the connector service:
    • On Windows, click Start > Programs > Googleconnectors > connector_name > Start Database connector Service.
    • On Linux, to start the connector as a console, open a terminal window and navigate to the installation location. Use the following command:

      ./Start_FileNet_Connector_Console

  24. If you did not register the connector manager from the connector installer, continue with the instructions in this document for Registering a Connector Manager. If you registered the connector manager from the connector installer, continue with the instructions in this document for Configuring a Connector on the Admin Console.
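
The port rule in step 6 can be sketched in Python as a quick check before you run the installer a second time. The sketch assumes that each installation occupies the port you enter plus the next higher port, which is how the example in step 6 reads; the function names are hypothetical, and the SSL port mentioned in step 6 would have to be added to the reserved list for your environment.

```python
AJP_PORT = 8009  # AJP Connector port listed in the Tomcat server.xml file


def ports_for(install_port):
    """Assumption: an installation uses its configured port and the next one."""
    return {install_port, install_port + 1}


def port_is_available(candidate, existing_install_ports, reserved=(AJP_PORT,)):
    """Return True if candidate and candidate + 1 collide with nothing in use."""
    in_use = set(reserved)
    for port in existing_install_ports:
        in_use |= ports_for(port)
    return not (ports_for(candidate) & in_use)
```

With one existing installation on port 8080, port_is_available(8081, [8080]) is False and port_is_available(8082, [8080]) is True.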

Registering a Connector Manager on the Admin Console

This section describes how to register a connector manager on the Admin Console.

If you registered the connector manager from the connector installer during the installation process, skip this section.

To register a connector manager on the Admin Console:

  1. Use a browser to log in as an administrator to the Admin Console on the target Google Search Appliance.
  2. Click Connector Administration > Connector Managers.

    If any connector managers are configured, a list of existing connector managers is displayed.

  3. In the Manager Name field, type a name to identify the new connector manager on the Admin Console.
  4. In the Description field, type a description of the new connector manager.
  5. In the Service URL field, type the URL to the Tomcat instance where the connector manager is running.

    This is the root access URL for the connector manager. Ensure that the location you enter is a fully-qualified host name or an IP address. For example, use http://example.com:8080/connector-manager, not http://example:8080/connector-manager.

    If the Service URL contains a host name ending in .local or .domain, you see the error Invalid connector manager URL. Use the IP address of the host instead.

    For example, if the connector manager is located in the $CATALINA_HOME/webapps/connector-manager/ directory of a Tomcat server running on the example.com host machine, its location is:

    http://example.com:8080/connector-manager

    The following values are used in this example:

    • http://example.com

      The host name of the computer on which Tomcat runs. This must be a fully-qualified domain name.

    • 8080

      The default HTTP port on which Tomcat serves web applications. The value is configurable. See the Apache Tomcat documentation for further information on changing the value.

    • /connector-manager

      The name or context of the web application.

    If access from the Google Search Appliance to Apache Tomcat is through a proxy server, the URL in the Service URL field must include the proxy redirect. For example:

    http://proxy.myexample.com:81/tomcat/connector-manager

  6. Click Save.

    The Admin Console displays a message saying New Connector Manager successfully added. The new connector manager appears in the list of connector managers. If the connector manager is running and Google Search Appliance can connect to it, a green dot appears in the Status column next to its name.
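
The Service URL rules in step 5 can be checked ahead of time with a short Python sketch. It is a rough approximation of the rules as described in this section, not the Admin Console's own validation, and the function name check_service_url is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def check_service_url(url):
    """Return a list of problems with a connector manager Service URL."""
    problems = []
    host = urlsplit(url).hostname
    if not host:
        return ["URL has no host name"]
    try:
        ipaddress.ip_address(host)
        return problems  # an IP address is acceptable as-is
    except ValueError:
        pass  # not an IP address, so treat it as a host name
    if "." not in host:
        problems.append("host name is not fully qualified")
    if host.endswith((".local", ".domain")):
        problems.append("host name ends in .local or .domain; use the IP address")
    return problems
```

For the two URLs in step 5, check_service_url returns an empty list for http://example.com:8080/connector-manager and reports an unqualified host name for http://example:8080/connector-manager.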


Configuring a Connector on the Admin Console

Use these instructions to configure a database connector on the Admin Console.

To configure a database connector:

  1. Ensure that Apache Tomcat is running and ensure that the database server is running and accessible from the connector manager host.
  2. On the Google Search Appliance Admin Console, click Connector Administration > Connectors.

    The list of existing connectors is displayed.

  3. In the Add Connector section, choose the connector manager you registered in Registering a Connector Manager.
  4. Click Add New Connector.

    Additional fields are displayed, including the name of the connector manager you selected.

  5. In the Connector Name field, type the name of the connector instance.

    Each connector instance added to a particular connector manager or Google Search Appliance must have a unique name. The connector name must consist of no more than 64 alphanumeric characters. All alphabetical characters must be lower-case. Connector names may include underscores (_) and hyphens (-), but they cannot begin with a hyphen.

  6. On the Type drop-down list, select Database.
  7. Click Get Configuration Form.

    The connector manager name, connector name, and connector type are displayed. These fields cannot be edited.

  8. In the Username field, type the user name of a database user.
  9. In the Password field, type the password for the database user.
  10. In the JDBC Connection URL field, type the JDBC connection string used for connecting to the database.
  11. In the Database Name field, type in the name of the database the connector will traverse.
  12. In the Connector Hostname field, type in the fully-qualified name of the host on which the connector manager is running.
  13. In the JDBC Driver Classname field, type in the fully-qualified driver class name without the .class extension. For example, for MySQL, type in com.mysql.jdbc.Driver.
  14. In the SQL Query field, type in a valid SELECT query that the connector will use to fetch records from the database during traversal.
  15. In the Primary Keys field, enter a primary key column name for the SQL SELECT query. If the database table uses a composite primary key, enter the columns forming the primary key as a list of comma-separated values.
  16. In the Stylesheet for serving results field, optionally provide valid XSLT for customizing the appearance of results. If you do not provide a custom stylesheet, the default stylesheet is used.
  17. In the Last Modified Date field, optionally provide the name of the column in the table containing the last modified date.
  18. In the Authz Query field, type in a valid SQL query for the connector to use for authorization.
  19. To optionally configure External Metadata Indexing, select one of the radio buttons and type in the required information in the corresponding text fields:
    • Select Document URL Field and type in the field name containing the document URL.
    • Select Document ID Field and type in the field name containing the Document ID and the base URL to combine with the document ID to construct a URL for the document.
    • Select BLOB/CLOB Field and type in the name of the field containing the BLOB or CLOB data and, optionally, the name of the field containing the fetch URL for the content.
  20. In the Traversal Rate section, type the number of documents per minute that you want traversed.

    The default is 200.

  21. In the Retry Delay field, type the number of minutes the connector waits between when a traversal is completed and when the next traversal starts.
  22. To suspend the traversal process without changing the existing connector schedule, check Disable Traversal.
  23. In the Connector Schedule section, indicate the hours between which you want the repository traversed.

    Note that a connector scheduled to run from 12 a.m. to 12 a.m. always runs. Any other schedule with the same beginning and ending time never runs, either for a connector or for the Google Search Appliance's standard crawl function.

  24. Click Save Configuration.

    You are returned to the Connectors page.

  25. Click the Edit link and then click Add Line to Schedule for each additional traversal period you want to schedule.
  26. Click Save Configuration.

    If the connector is configured correctly, the new connector appears in the Connectors list. On the Tomcat host, a subdirectory called connector_instance_name is created in the WEB-INF/connectors/Database directory, and a connector_instance_name.properties file is created in that subdirectory.
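
The connector naming rules in step 5 and the resulting file location can be expressed as a short Python sketch. The function names are hypothetical, and treating an empty name as invalid is an assumption.

```python
import re

# Up to 64 characters: lower-case letters, digits, underscores, and hyphens,
# where the first character is not a hyphen.
CONNECTOR_NAME = re.compile(r"[a-z0-9_][a-z0-9_-]{0,63}")


def is_valid_connector_name(name):
    """Return True if name satisfies the naming rules described in step 5."""
    return CONNECTOR_NAME.fullmatch(name) is not None


def properties_path(name):
    """Path of the properties file, relative to the connector manager web application."""
    return f"WEB-INF/connectors/Database/{name}/{name}.properties"
```

For example, hr_database-1 is a valid name, while -hr and HR_Database are not; properties_path("hr_db") gives WEB-INF/connectors/Database/hr_db/hr_db.properties.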


Verifying That the Connector is Working

After you configure the connector, wait a few minutes and then verify on the Admin Console Feeds page that the Google Search Appliance is receiving feeds. Ensure that the following entry exists on the Crawl Diagnostics page:

connector_instance_name.localhost

Click the entry and navigate through successive links to verify that documents have been sent to the search appliance by the connector named connector_instance_name as content feeds.

After you verify that the search appliance is correctly receiving feeds, perform a search. Unless all content indexed by the connector is public content, perform a secure search.

To view the documents crawled by the connector and the data fed to the search appliance, enable feed logging, a feature that is disabled by default. This is available only for connectors installed on stand-alone hosts.

To enable feed logging:

  1. On the connector manager host, navigate to the directory where the connector is installed.
  2. Navigate to the Tomcat\webapps\connector-manager\WEB-INF directory or folder.
  3. Start a text editor and open the file applicationContext.properties.
  4. Locate the property feedLoggingLevel and change the value to ALL.
  5. Save the file.
  6. Restart the connector. The feed logs are available for all new documents sent by the connector to the search appliance.

Traversal

The following sections describe how the connector traversal process works:


About the Traversal Process

The Google Search Appliance locates web and file system content for indexing through a process called crawl or crawling.

The Google Search Appliance locates content in a content repository using a process called traversal. Traversal is a process in which the connector issues queries to the repository to retrieve content files and the metadata associated with each content file. The content files and metadata are then fed to the Google Search Appliance as a content feed or a metadata-and-URL feed. For more information about content feeds, see the Feeds Protocol Developer's Guide in GSA Product documentation.

In the initial traversal of a repository, the files are retrieved by last-modified date, starting with the oldest documents in the repository. After the initial traversal, files are retrieved when they are added to a repository or modified.

If you change the set of metadata that you select for indexing, you must retraverse the content, using the instructions in Resetting Traversal.


How the Traversal Rate Affects Connector Behavior

When you configure a connector instance on the Google Search Appliance Admin Console, you set a traversal rate. The value indicates how many documents per minute the connector traverses in the repository. The default value is 200 documents per minute.

You can set the traversal rate to values higher or lower than 200 documents per minute. The connectors and connector manager are capable of faster traversal rates.

  • To reduce resource consumption in the repository, lower the traversal rate.
  • To increase indexing speed, raise the traversal rate.

If the traversal rate is set to 100 and the connector traverses 100 documents in less than one minute, the traversal process pauses. When the full minute elapses, the traversal process resumes.
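
The pause behavior described above can be modeled with a few lines of Python. This is a simplified model for reasoning about traversal times, not the connector manager's implementation; the function names are hypothetical.

```python
import math


def pause_seconds(docs_this_minute, rate_per_minute, elapsed_seconds):
    """Seconds to wait before traversing more documents in the current minute."""
    # Below the rate, or once the minute is over, traversal continues immediately.
    if docs_this_minute < rate_per_minute or elapsed_seconds >= 60:
        return 0.0
    # The rate was reached early: pause until the full minute has elapsed.
    return 60.0 - elapsed_seconds


def minutes_to_traverse(total_docs, rate_per_minute=200):
    """Lower bound on traversal time in minutes at the given rate (default 200)."""
    return math.ceil(total_docs / rate_per_minute)
```

For instance, if the rate is 100 and the connector traverses 100 documents in 40 seconds, pause_seconds(100, 100, 40) returns 20.0. At the default rate, a table of 10,000 rows needs at least minutes_to_traverse(10000) = 50 minutes.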


Creating and Tuning Connector Schedules

When you schedule connector instances, the performance of the repository is a significant consideration. Depending on the number of traversals and the size of the documents retrieved for indexing, the use of connectors may degrade repository performance. Monitoring and performance-tuning the repository server is especially important when you deploy a new connector or document repository.

Note that a connector scheduled to run from 12 a.m. to 12 a.m. always runs. Any other schedule with the same beginning and ending time never runs, either for a connector or for the Google Search Appliance's standard crawl function.
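
The rule about identical start and end times can be captured in a small Python sketch. It models hours on a 24-hour clock (12 a.m. is 0) and assumes that the end hour is exclusive and that an end time of 12 a.m. means the end of the day; both are assumptions, and schedules that wrap past midnight are deliberately left out because this document does not describe them. The function name runs_at is hypothetical.

```python
def runs_at(start_hour, end_hour, hour):
    """Return True if a schedule line covers the given hour (0-23)."""
    if start_hour == end_hour:
        # Only 12 a.m. to 12 a.m. means "always"; any other equal pair never runs.
        return start_hour == 0
    if end_hour == 0:
        end_hour = 24  # assumption: an end time of 12 a.m. is the end of the day
    if start_hour > end_hour:
        raise ValueError("schedules that wrap past midnight are not covered here")
    # Assumption: the end hour itself is not part of the traversal window.
    return start_hour <= hour < end_hour
```

With this model, runs_at(0, 0, 13) is True, runs_at(5, 5, 5) is False, and runs_at(8, 20, 8) is True.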

When you determine the connector schedule, take the following factors into account:

  • When to run the traversal process

    You might add a connector instance to run in off-peak hours to spread out the initial index creation during times of low demand on the repository.

  • How long to run the traversal process

    You might add a connector instance with a very brief schedule to perform predeployment testing, and experiment to see the effects of lengthening the schedule.

A connector instance cannot self-modify its traversal schedule. Therefore, you must monitor the performance of both the Google Search Appliance and the content management system regularly, and make manual adjustments to the traversal schedules of connectors to optimize performance. You can tune scheduling for optimal performance in these ways:

  • Create a schedule that minimizes the number of concurrent traversal processes that are running.
  • Restrict the times at which those processes run. For example, if the content management system is executing a resource-intensive job, the connector might run slowly. Schedule the connector to run at times when demand on the content management system is light.

Additionally, the connector manager interrupts a connector that takes too long to process a batch of documents. The default duration after which the connector manager interrupts the connector is 1800 seconds, or 30 minutes. The duration is set by the value of the traversal.time.limit property in the applicationContext.properties file. If you want a shorter duration, you can change the value of traversal.time.limit.

To change the default value of the traversal.time.limit property:

  1. Stop Apache Tomcat.
  2. Open the applicationContext.properties file in a text editor. The top of the file contains comments with explanatory text. Do not uncomment any of the explanatory text, including the example for traversal.time.limit.
  3. Examine the file to see whether there is a traversal.time.limit entry.
    • If there is an entry, modify the duration.
    • If there is no entry, add one to the end of the file:

      traversal.time.limit=duration_in_seconds

  4. Save the file.
  5. Restart Tomcat.
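
The edit in step 3 can also be scripted. The following is a minimal sketch, assuming that entries in applicationContext.properties use the simple key=value form; the helper name set_property is hypothetical. It skips comment lines so that the explanatory examples at the top of the file stay commented out, and it appends the entry when none exists.

```python
def set_property(properties_text: str, key: str, value: str) -> str:
    """Return properties text with key set, leaving comment lines untouched."""
    lines = properties_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments so the explanatory examples stay commented out.
        if stripped.startswith("#") or stripped.startswith("!"):
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"  # existing entry: modify the duration
            break
    else:
        lines.append(f"{key}={value}")  # no entry found: add one at the end
    return "\n".join(lines) + "\n"


original = "# traversal.time.limit=1800\n"
print(set_property(original, "traversal.time.limit", "900"), end="")
```

As with a manual edit, stop Tomcat before you change the file and restart it afterward.
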

Changing the Connector Retry Delay and Schedule

In connector manager 3.0 and search appliance software version 6.0 and later, the search appliance Admin Console enables you to modify the connector retry delay, which is the time period that elapses between when one traversal is completed and the next starts. For example, you might want the connector to traverse the repository every hour between 8 a.m. and 8 p.m. or every two hours from midnight to 9 a.m.

The default retry delay is 5 minutes.

To change the traversal schedule, set the start and end times for traversal on the Connector Schedule drop-down menus.


Resetting Traversal

If traversal has stopped or no new documents are being fed to the search appliance, you can reset the connector traversal process. When you reset the traversal, the content is traversed in full from the beginning point and the index is recreated.

In search appliance software version 6.0 and later, use the Reset link for the connector instance on the Admin Console > Connectors page. On search appliances running software versions earlier than 6.0, use the following instructions from a browser.

To reset the traversal, open a browser and enter a URL in the following format, where connector_manager_host_address is the location of the connector manager and connector_name is the name of the connector whose traversal you are restarting:

http://connector_manager_host_address:8080/connector-manager/restartConnectorTraversal?ConnectorName=connector_name

For example, if the host address is www.example.com and the connector is named our_connector:

http://www.example.com:8080/connector-manager/restartConnectorTraversal?ConnectorName=our_connector

The URLs are case-sensitive. After you submit the command, you see a response in the browser window. Some browsers display only a zero (0). Other browsers display a full XML document. A 0 response indicates success. A nonzero response indicates a failure.

<CmResponse>
  <StatusId>0</StatusId>
</CmResponse>
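
As a rough illustration of the URL format and the response check, the sketch below builds the reset URL and interprets a response body. The function names are hypothetical, and the code does not send the request itself; it only shows how the pieces fit together.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def restart_traversal_url(host: str, connector_name: str, port: int = 8080) -> str:
    """Build the reset URL in the format shown above."""
    query = urlencode({"ConnectorName": connector_name})
    return f"http://{host}:{port}/connector-manager/restartConnectorTraversal?{query}"


def reset_succeeded(response_xml: str) -> bool:
    """Return True when the StatusId element of the response is 0."""
    root = ET.fromstring(response_xml)
    status = root.findtext("StatusId")
    return status is not None and status.strip() == "0"


print(restart_traversal_url("www.example.com", "our_connector"))
print(reset_succeeded("<CmResponse><StatusId>0</StatusId></CmResponse>"))  # True
```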

Note that with the default Connector Manager v2.x configuration, connector_manager_host_address must be localhost (or more specifically, 127.0.0.1), and the request must originate from the machine on which the Connector Manager is running. If direct access to the Connector Manager machine is inconvenient, Connector Administrators may wish to add administration machines to the list of IP addresses allowed by the RemoteAddrValve. For more details see this page.


When to Delete Feeds

Under the following circumstances, Google recommends that you delete connector feeds. This recommendation applies only to content-feed-based connectors.

  • When you reindex content and the expected new document set leaves out documents or metadata that were previously indexed.
  • When you delete a connector instance.

When you are reindexing the content, follow this general procedure:

  1. On the Admin Console > Connector Administration > Add Connector page, check Disable Traversal.

    Traversal is enabled by default.

  2. Make any required updates to the connector configuration.
  3. Delete the feed.
  4. Monitor the Crawl Diagnostics page in the Admin Console.
  5. When the indexed documents are removed from the index, navigate to the Connector Administration > Connectors page and click the Reset link for the connector.
  6. On the Admin Console > Connector Administration > Add Connector page, enable traversal by unchecking Disable Traversal.

If you are deleting a connector instance, we recommend that you separately delete the feed. Otherwise, content indexed by the connector is not removed from the index and public content indexed by the connector continues to appear in search results. Secure content does not appear in search results because the authorization check fails.


When to Restart the Connector Service

Restarting the connector service means restarting Apache Tomcat. Restart the connector service only under the following circumstances:

  • When you manually edit the connector's properties file or one of the configuration files (applicationContext.xml, applicationContext.properties, logging.properties, or connectorInstance.xml). Alternatively, for edits to the connectorInstance.xml file only, you can apply the changes on the Admin Console, without restarting the connector service. Click the Edit link for the connector instance, then click Save Configuration.
  • When you install a connector or connector manager JAR file.

Serving

The following sections describe how the connector serving process works and how serve-time security is maintained:


About Serving

Using the Google Search Appliance and Google Search Appliance Connector for Databases to search a relational database is similar to using Google.com to search the web.

To locate particular information or documents in the repository, a user opens a browser window and navigates to a search page. The search page can be the default search page available on the Google Search Appliance or it can be a customized search page. The user types a search term in the search box and clicks Search.

The Google Search Appliance searches its index for documents and metadata containing the user's search term.

When the Google Search Appliance finds all the documents that match the search request, it presents the user with a pop-up window and asks for the user's user name and password. The connector manager passes the search results and the user credentials to the repository server. The repository server authenticates the user, evaluates the permissions for each document returned by the user's search, determines which documents the user is authorized to view, and returns that information to the connector manager.

The Google Search Appliance displays a results page listing the documents the user is authorized to view. When the user clicks a link on the results page, a web client window opens in which the user can view the document or its metadata, depending on how the connector is configured. If the user does not have an open session to the repository, the web client asks for the user's login credentials before displaying the document.

The Database Connector does not support connector authentication using the search appliance Security Manager. You must use LDAP or Kerberos/SAML authentication configured on the search appliance.

You must also provide an authorization query in the connector configuration form, or there is no authorization at serve time and all content from the Database Connector is served as public content.

Secure search further requires that you have a table in the database system that contains mappings between the username and records in the database system. The username coming from the search appliance authentication method must match the name in the user mapping table in the database for the connector to successfully perform authorization. For more information on authorization in the Database Connector, including examples of SQL query design, see the Database Connector.


Uninstalling Connectors and Connector Managers


Deleting a Connector Instance from the Admin Console

You delete a connector instance only on the Admin Console of the Google Search Appliance. When you delete the instance, you delete the configuration information for the instance. The connector manager no longer creates and runs the instance.

Each connector instance is listed on the Admin Console in the Connector Administration > Connectors section. The indicator light is either green or red. Green indicates the existence of the connector instance.

To delete a connector instance:

  1. Log in to the Admin Console as an administrator.
  2. Click Connector Administration > Connectors.
  3. Click the Edit link for the correct connector.
  4. Check the Disable Traversal checkbox for the connector you are deleting.
  5. Click Save Configuration.
  6. On the Connector Administration > Connectors page, locate the connector instance you want to delete.
  7. Click the Delete link on the line for the correct connector instance.
  8. Click OK.

Deleting a Connector Manager

To delete a connector manager, you must first unregister the connector manager from the Admin Console, then uninstall the connector manager on the Tomcat host.

Before you unregister a connector manager, you must delete all connector instances associated with that connector manager. If you have a large number of connector instances, you can first stop the Tomcat instance where the connector manager is running, then unregister the connector manager.

It is also possible to uninstall the connector manager on the Tomcat host, then unregister the connector manager on the Admin Console.

Unregistering a Connector Manager from the Admin Console

To unregister a connector manager from the Admin Console:

  1. Log in to the Admin Console as an administrator.
  2. Click Connector Administration > Connector Managers.
  3. Locate the connector manager you want to delete.
  4. Click the Unregister link on the line for the correct connector manager.
  5. Click OK.

Uninstalling a Connector Manager

To uninstall a connector manager from the Tomcat host, do one of the following:

  • On Windows, click Start > All Programs > Google Search Appliance Connector version_number > Uninstall.
  • On Linux, click the appropriate shortcut.

To manually delete a connector manager on the Apache Tomcat host:

  1. Log in to the Apache Tomcat host as the installation owner (the user who installed Tomcat).
  2. Shut down Tomcat.
  3. Navigate to the $CATALINA_HOME/webapps directory.
  4. Delete the connector-manager.war file.
  5. Delete the $CATALINA_HOME/webapps/connector-manager directory.
  6. Restart Tomcat.
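
Steps 3 through 5 of the manual procedure can be sketched as a small script. This is an illustration under the assumption that the connector manager is deployed under the standard webapps directory; the function name is hypothetical. Shutting down and restarting Tomcat (steps 2 and 6) remain separate manual actions.

```python
import shutil
from pathlib import Path


def remove_connector_manager(catalina_home: str) -> None:
    """Delete the connector-manager WAR file and its expanded directory."""
    webapps = Path(catalina_home) / "webapps"
    war_file = webapps / "connector-manager.war"
    app_dir = webapps / "connector-manager"
    if war_file.exists():
        war_file.unlink()       # step 4: delete the WAR file
    if app_dir.is_dir():
        shutil.rmtree(app_dir)  # step 5: delete the expanded web application
```

Run a script like this only while Tomcat is shut down, because the deletion is not reversible.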

Troubleshooting the Google Search Appliance Connector for Databases

If you have a problem that requires you to file a ticket with Google Cloud Support, be prepared to provide Support with the following information:

  • Verbose connector logs. See Logging for information on changing the default logging level. If you are reporting a problem to Support, it is ideal if you can reproduce the problem with the logging level set to ALL. However, log files with entries made when the problem occurred are also helpful.
  • Connector configuration files.
  • Feed record and metadata log file. See Logging Feed Record and Metadata Information to a Text File for information on generating this log file.

Diagnosing Connector Problems

If you create a connector instance and no search results are returned, use the following checklist to help diagnose the problem.

  • Problem: The connector has not traversed any documents.
    How to diagnose: View the Admin Console Feeds page or Crawl Diagnostics page to confirm. View the connector logs to help determine the specific reason.

  • Problem: The search appliance has not accepted the feed.
    How to diagnose: View the Admin Console Feeds page to determine whether the search appliance is accepting feeds.

  • Problem: The connector has not traversed the designated test documents.
    How to diagnose: View the Admin Console Crawl Diagnostics page. Examine the connector logs and look for the end of a traversal or for errors associated with specific documents. Lastly, enable the teedFeedFile and reset the traversal.

  • Problem: The search appliance has not indexed the documents.
    How to diagnose: This can be difficult to determine, but the Crawl Diagnostics page tells you which content files have not been indexed. Usually, you must wait until the content is indexed. This failure is more common with metadata-and-URL feed connectors.

    With content feed connectors, a document can appear on the Crawl Diagnostics pages almost immediately, sometimes before the feed appears on the Feeds page. However, the document does not appear in search results for another 5 to 15 minutes. If a document does not appear on the Crawl Diagnostics page, it has not been indexed and probably has not been traversed.

  • Problem: Secure documents were not included in test searches.
    How to diagnose: Ensure that a secure search was performed.

  • Problem: There were authentication failures.
    How to diagnose: Depending on the search appliance version, examine the Security Manager log or the connector logs.

  • Problem: There were authorization failures.
    How to diagnose: Examine the authorization log on the search appliance Access Control page or the connector logs. For metadata-and-URL feeds or policy ACLs, this is where you will find the information you need. For connector authorization, the connector log has more details about failures than the search appliance authorization log.

When you examine the connector logs, error messages labeled SEVERE or Exception are good starting points. For authorization issues, search the logs for the user name of the users who experienced authorization failures.
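
A first pass over a connector log can be automated along these lines. The sketch below is a hypothetical helper, not a tool shipped with the connector. It keeps lines that contain SEVERE or Exception and, when a user name is supplied, lines that mention that user.

```python
def interesting_log_lines(log_lines, user_name=None):
    """Yield lines containing SEVERE or Exception, or the given user name."""
    for line in log_lines:
        if "SEVERE" in line or "Exception" in line:
            yield line
        elif user_name and user_name in line:
            yield line


sample = [
    "INFO: Traversal batch finished.",  # hypothetical log line
    "SEVERE: Feed Exception during traversal.",
    "com.google.enterprise.connector.pusher.FeedException: Connection refused: connect",
]
for line in interesting_log_lines(sample):
    print(line)
```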


Logging

Logging is a useful technique for recording information about how your installation is operating. You can use the information logged for troubleshooting the operations of the connector, the Google Search Appliance, and the database.

The connector manager and connectors use the java.util.logging package for logging. The installer installs a logging mechanism for the connector and starts the logging process automatically. The default logging configuration is defined in the logging.properties file.

To customize the configuration, navigate to
connectors_root_dir/connector_name/Tomcat/webapps/connector-manager/WEB-INF/classes and edit the logging.properties file there.

The following line in the file sets the default logging level for the Databases connector:

.level=INFO

The default logging level for most packages and output destinations (handlers) is INFO. To enable debugging at a finer level of granularity, you can change the default connector manager logging level to ALL or FINER. For example, you might change the logging level as follows:

.level = ALL

The possible values of the level property are OFF, SEVERE, WARNING, INFO, CONFIG, FINE, FINER, FINEST, and ALL. The default level is INFO.

Starting with search appliance software version 6.14 and connector manager 2.8.x, you can adjust the logging level on the Admin Console. However, this change affects only the currently running process and reverts to the default when you restart the connector manager.

The output from the FileHandler appears in the connectors_root_dir/connector_name/Tomcat/logs directory. The output appears in the google-connectors.sequence.log file, where sequence is a series of numbers starting with 0 and incremented by 1 on each occurrence (0, 1, 2, 3...n). The first three log file names would be google-connectors.0.log, google-connectors.1.log, and google-connectors.2.log.

After editing the logging.properties file, restart Tomcat.

In addition, enable logging for the content management system's native API on the Apache Tomcat host and, if relevant, on the repository server host.


Error Messages

This section describes some commonly encountered error messages and their likely solutions.

Search Appliance Unable to Connect to the Connector Manager

If the Apache Tomcat instance where the connector manager is installed is not started or if the location you type in is incorrect or invalid, a message is displayed on the Connector Manager Administration page of the Admin Console saying "The appliance could not connect to the connector manager as specified in the location. Make sure that the URL is correct, or try again later."

HTTP 404 Error When Registering a Connector Manager

When you are registering a new connector manager, you might see the following error message:

The HTTP response failed with the following code: 404. No external connector managers registered.

This means that the CATALINA_HOME environment variable is not set correctly on the Tomcat host. Examine the Tomcat startup script or .bashrc and ensure that CATALINA_HOME points to the correct Tomcat installation.

HTTP 401 Error When Configuring a Connector

When you create a connector, you might see the following error:

Cannot connect to the given SharePoint Site URL with the supplied Domain/Username/Password. Reason:(401) Unauthorized

  1. Check that the username and password are correct. Configure the crawler access under Crawl and Index > Crawler Access, then perform a manual fetch under Status and Reports > Real-time Diagnostics in the Admin Console to verify connectivity and validate the credentials. If you get a 401 response, confirm the username and password again. If you get an HTTP status of 200, check the connector log as described in the next step.
  2. Check the connector log. If you see the following error, check that the user has contribute access.

    Aug 23, 2011 11:18:56 AM com.google.enterprise.connector.sharepoint.wsclient.WebsWS checkConnectivity
    WARNING: Unable to connect.
    AxisFault
    faultCode: {http://xml.apache.org/axis/}HTTP
    faultSubcode:
    faultString: (401)Unauthorized
    faultActor:
    faultNode:
    faultDetail:
    {}:return code: 401
    401 UNAUTHORIZED
    {http://xml.apache.org/axis/}HttpErrorCode:401

Feed Exception During Traversal

You might see the following error message if you installed a connector manually or you are using a connector manager earlier than version 2.0:

SEVERE: Feed Exception during traversal.
com.google.enterprise.connector.pusher.FeedException: Connection refused: connect

This happens when the connector service is reinstalled to a new location (whether or not it is the same version) but is not reregistered on the Admin Console. The connector service points at localhost by default, rather than pointing to the search appliance. In this situation, the connectors are unable to feed documents to the search appliance.

To fix this issue:

  1. Log in to the Admin Console and navigate to the Connector Managers page.
  2. Click the Edit link for your connector manager.
  3. Click the Save button.

Alternatively, you can manually edit the applicationContext.properties file in the Tomcat/webapps/connector-manager/WEB-INF directory by changing localhost to the IP address of the GSA in the following line:

gsa.feed.host=localhost

If you manually edit the file, you must restart Tomcat after you save your changes.

Error Message When Trying to Add a Connector to an Unavailable Connector Manager

When a connector manager is unavailable, the Admin Console displays a circular red indicator next to the connector manager name. If you try to add a connector to an unavailable connector manager, you see the following error message:

The appliance encountered an error while trying to make the following servlet call: getConnectorList

The connector manager might be unavailable for one of the following reasons:

  • Tomcat is not running on the registered host and port.
  • The connector manager host is unreachable.
  • The Tomcat Remote Address Filter is rejecting access.

Check each condition and correct any problems.
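
The first two conditions can be checked from another machine with a basic TCP connection test. The following is a minimal sketch with a hypothetical function name. A successful connection does not rule out the third condition, because an address filter rejects requests only after the connection is established.

```python
import socket


def can_reach(host: str, port: int, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

For example, can_reach("connector_manager_host_address", 8080) returns False when Tomcat is not listening on that port or the host cannot be reached.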


Logging Feed Record and Metadata Information to a Text File

You can log all URLs and metadata fed to a Google Search Appliance without recording all content. There are two ways to implement this logging technique.


Using the feedLoggingLevel Property

To use the feedLoggingLevel property to log URLs and metadata:

  1. Log on to the Apache Tomcat host with the user account under which Tomcat runs.
  2. Shut down the Tomcat instance that hosts the connector manager.
  3. Navigate to the webapps/connector-manager/WEB-INF/ directory.
  4. Open the applicationContext.properties file in a text editor.
  5. Set the feedLoggingLevel property to the value ALL:

    feedLoggingLevel=ALL

  6. Save the applicationContext.properties file.
  7. Restart Tomcat.

    The logging information is recorded in the $CATALINA_BASE/logs/google-connectors.feed%g.log files, where %g is a generation number used to distinguish among rotated logs.


Using a logging.properties Configuration File

To use a logging.properties configuration file to log URLs and metadata:

  1. Log on to the Apache Tomcat host with the user account under which Tomcat runs.
  2. Shut down the Tomcat instance that hosts the connector manager.
  3. Navigate to the logging.properties file.
    • If you installed the connector using the installer, the file is in the connector_directory/Tomcat/webapps/connector-manager/WEB-INF/classes/ directory.
    • If you installed the connector manually, navigate to the location where you created a logging.properties file. The logging.properties file is probably in the $CATALINA_HOME/webapps/connector-manager/WEB-INF/classes directory. If not, copy the logging.properties file from the $JAVA_HOME/lib/ directory to the $CATALINA_HOME/webapps/connector-manager/WEB-INF/classes directory. You might have to create the /classes directory manually.
  4. Open the logging.properties file in a text editor.
  5. Add the following line to the file:

    com.google.enterprise.connector.pusher.DocPusher.FEED_WRAPPER.FEED.level=FINER

  6. Save the logging.properties file.
  7. Restart Tomcat.

    The logging information is recorded in connector_directory/Tomcat/logs/google-connectors.feed%g.log, where %g is a generation number used to distinguish among rotated logs.


Related Documentation

For more information on the connector manager, see Introducing Connectors. For release notes, see the connector open-source project site.
