Configuring the Connector for Documentum (Deprecated)

Connector software version 3.0
Connector Manager version 3.0
Installer version 3.0


This document contains the information you need to install the Google Search Appliance Connector for Documentum and configure the Google Search Appliance and the connector to traverse, index, and search content in an EMC Documentum content repository.

This document is for Documentum administrators and administrators who install and configure the Google Search Appliance. If you are not familiar with the system that the connector will traverse and index, work closely with your system administrators to determine the correct values for installing and configuring the connector.

Use this document in conjunction with the following related documents:

The rest of this book describes how to configure the Google Search Appliance Connector for Documentum.



Introducing the Google Search Appliance Connector for Documentum

The Google Search Appliance Connector for Documentum is software that enables the Google Search Appliance to index and search content files and metadata that are stored in an EMC Documentum content repository. The connector formats content and metadata from the repository and feeds it to the Google Search Appliance as a content feed. This section discusses how the Google Search Appliance Connector for Documentum works and the different software components in an installation.

For a general overview of how the connector manager and connectors work, see Introducing Connectors.

The connector provides query-time authentication and authorization using the available native security mechanisms.

 

After you install and configure a Documentum repository, the connector traverses the Documentum repository and feeds Documentum documents and their metadata to the search appliance for indexing. Traversal begins with the earliest document modification date and works forward. After the initial traversal, the connector works in an incremental mode to index documents that are added or modified.


Components in the Google Search Appliance Connector Installation

A typical connector installation consists of these components:

  • A content repository, which consists of the content management system server, the content files, and the supporting database in which metadata is stored, if any.

    A Google Search Appliance can index multiple repositories. You must configure one connector for each repository you index.

  • The content management system web client, installed on any platform supported by the content management system
  • The content management system's native API, which is typically installed on the connector manager host
  • Any other supporting software components of the content management system
  • An LDAP server or other external mechanism used for user authentication
  • Java Development Kit (JDK) or Java Runtime Environment version 1.5
  • Google Search Appliance Connector installation, which consists of Apache Tomcat, the connector manager, and the connector for your content management system. These components are installed using a Google-provided installer
  • A Google Search Appliance

Supported Documentum Versions

The connector manager and Google Search Appliance Connector for Documentum are supported on the releases described in the following table.

Connector Version | Documentum Content Server Version | Documentum Foundation Classes (DFC) Version | Required Java Version
3.0 | 5.3 and 5.3 Service Pack releases | Version 5.3 SP4 or later; must be compatible with the Content Server version | 1.5
3.0 | 6.0 and 6.0 Service Pack releases | Version 6.0 or later; must be compatible with the Content Server version | 1.5
3.0 | 6.5 and 6.5 Service Pack releases | Version 6.5 or later; must be compatible with the Content Server version | 1.5 or 1.6
3.0 | 6.6 and 6.6 Service Pack releases | Version 6.6 or later; must be compatible with the Content Server version | 1.5 or 1.6

Supported Operating Systems

The connector manager and Google Search Appliance Connector for Documentum are supported on these operating system platforms:

  • Windows Server 2003, R2 (32- and 64-bit versions)
  • Windows Server 2008 and Windows Server 2008 R2 (32- and 64-bit versions)
  • Red Hat Enterprise Linux 3.0 Update 8 (32-bit only)
  • Red Hat Enterprise Linux 4.6
  • Red Hat Enterprise Linux 5.1
  • SUSE Linux Enterprise Server 9 (32-bit only)
  • SUSE Linux Enterprise Server 10
  • SUSE Linux Enterprise Server 11

The connector manager and the Google Search Appliance Connector for Documentum are supported in virtualization environments. Google does not provide support for specific virtualization environments or for issues that are specific to virtualization.


Supported Java Version

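The Google Search Appliance Connector for Documentum requires Java Runtime Environment (JRE) 5 (version 1.5) at a minimum. JRE 6 (version 1.6) is also supported with Content Server 6.5 and 6.6, as listed in Supported Documentum Versions.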


Apache Tomcat Version

The installer installs a connector manager, a connector type, and Apache Tomcat 6.0.18. Tomcat 5.5.23 is also supported for this release of the connector and connector manager.


Before You Install the Connector

You install the Google Search Appliance Connector for Documentum using an installer that automatically installs and configures Apache Tomcat, the connector manager, and the connector.

Before you install, ensure that the following software is installed and functioning properly:

  • The Documentum repositories that you want to index
  • The connection brokers to which the Content Servers project
  • Documentum Webtop
  • Documentum Foundation Classes, as required by the Content Server, installed on the host where Apache Tomcat and the connector manager will be installed
  • JRE on the Apache Tomcat host
  • Depending on the Documentum version, dmcl.ini or dfc.properties on the Apache Tomcat host
  • If you are installing with Documentum 5.x, ensure that the value of the connect_pooling_enabled parameter in the dmcl.ini file is set to F before you install the Google Search Appliance Connector for Documentum. Documentum 6 and 6.5 installations do not include the dmcl.ini file.

For complete information on supported Content Server, DFC, and Java versions, see Supported Documentum Versions.


Information You Need for Installing the Google Search Appliance Connector for Documentum

Before you install the Google Search Appliance Connector for Documentum, you need the information described in the following table. Work with your Documentum system administrator to determine the correct values. The Documentum system administrator can also assist you with installing Documentum Foundation Classes (DFC).

Value | Description | Your Values
Documentum Superuser user name and password | The user name and password used by the Google Search Appliance to connect to the repository. |
Name of the host on which the Documentum connection broker is installed | Required for installing the Documentum Foundation Classes (DFC) on the host where you want to run the connector manager, if that host does not already have DFC installed. See the DFC documentation provided by EMC. |
Port used for communicating with the connection broker | Required for installing the Documentum Foundation Classes (DFC) on the host where you want to run the connector manager, if that host does not already have DFC installed. See the DFC documentation provided by EMC. |
IP address of Apache Tomcat | The IP address of the host where Apache Tomcat is installed. The connector manager URL must be in the format http://Tomcat_IP_Address:Tomcat_port/connector-manager/ |
Repository names | The names of the Documentum repositories that the Google Search Appliance will index. |
Additional WHERE clause | A DQL WHERE clause restricting which documents are indexed. You can designate an additional WHERE clause on the connector configuration page. For more information on WHERE clauses, see the DQL documentation. |
Object types to include in or exclude from indexing | You can designate which object types to include or exclude on the connector configuration page. |
Properties to include in or exclude from indexing | You can designate which properties to include on the connector configuration page. Use the connectorInstance.xml file on the Apache Tomcat host to exclude specific properties from indexing. |
Webtop URL | The URL to the Webtop instance that end users will access to view documents that appear in Google Search Appliance search results. The URL can point to either the document itself or to the properties of the document. See Deciding on the Webtop URL Format for more information. |
Traversal rate | The rate at which the Google Search Appliance traverses the repository. |
Connector schedule | The times at which the Google Search Appliance traverses the repository. Note that a connector scheduled to run from 12 a.m. to 12 a.m. always runs. Any other schedule with the same beginning and ending time never runs, either for a connector or for the Google Search Appliance's standard crawl function. |
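
The schedule rule in the last row can be easy to misread, so here is a minimal sketch of it in Python. The function name classify_schedule and the use of hours 0 through 23 (with 0 standing for 12 a.m.) are assumptions made for illustration; they are not part of the connector or the Admin Console.

```python
def classify_schedule(start_hour, end_hour):
    """Classify a schedule interval given as hours 0-23 (0 means 12 a.m.).

    Illustrative only: models the documented rule for intervals whose
    beginning and ending times are the same.
    """
    for hour in (start_hour, end_hour):
        if not 0 <= hour <= 23:
            raise ValueError(f"hour out of range: {hour}")
    if start_hour == end_hour:
        # 12 a.m. to 12 a.m. is the only same-time interval that runs.
        return "always" if start_hour == 0 else "never"
    # Any other interval runs only inside its configured window.
    return "windowed"


print(classify_schedule(0, 0))   # always
print(classify_schedule(5, 5))   # never
print(classify_schedule(1, 5))   # windowed
```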

Deciding on the Webtop URL Format

You can use the format of the Webtop URL to control what an end user sees after clicking a result in the browser window:

  • To open a document with a choice of whether to view or edit the document, use the following format, which ends in a trailing slash:

    http://webtop_server_name:webtop_port/webtop_application_name/drl/objectId/

    The default port is 8080 and the default webtop_application_name is webtop. For example, http://mywebtopserver:8080/webtop/drl/objectId/

  • To view a document, use the following format, which ends in a trailing equal-to sign (=):

    http://webtop_server_name:webtop_port/webtop_application_name/action/drlview?objectId=

    The drlview action bypasses the question about whether you want to view or edit the document.

  • To download a document, use the following format, which ends in a trailing equal-to sign (=):

    http://webtop_server_name:webtop_port/webtop_application_name/component/getcontent?objectId=

    The getcontent component bypasses the UCF applet, and returns the content directly as a download. Users see the open or save dialog.

  • To access the properties of a document, use the following format, which ends in a trailing equal-to sign (=):

    http://webtop_server_name:webtop_port/webtop_application_name/component/properties?component=attributes&objectId=

    The default port is 8080 and the default webtop_application_name is webtop. For example, http://mywebtopserver:8080/webtop/component/properties?component=attributes&objectId=

The connector appends the object ID of each document to the Webtop URL to generate the display URL for each document.
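
As a rough illustration of how a display URL results from the formats above, the following Python sketch appends an object ID to a configured Webtop URL. The function names and the object ID value are hypothetical; the host name mywebtopserver comes from the examples in this section.

```python
def has_expected_ending(webtop_url):
    """Check that the configured URL ends in '/' or '=', as the formats above do."""
    return webtop_url.endswith(("/", "="))


def display_url(webtop_url, object_id):
    """Append the object ID to the configured Webtop URL."""
    return webtop_url + object_id


# Hypothetical object ID, used only for illustration.
object_id = "0900000180001234"

webtop_urls = [
    "http://mywebtopserver:8080/webtop/drl/objectId/",
    "http://mywebtopserver:8080/webtop/action/drlview?objectId=",
    "http://mywebtopserver:8080/webtop/component/getcontent?objectId=",
    "http://mywebtopserver:8080/webtop/component/properties?component=attributes&objectId=",
]

for url in webtop_urls:
    assert has_expected_ending(url)
    print(display_url(url, object_id))
```

Because the object ID is simply appended, a Webtop URL that is missing its trailing slash or equal-to sign produces a display URL that does not resolve.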

The Webtop URL must include a fully-qualified host name for the Cached and Text Version links in the search results to work.

As an alternative, you can build a WDK customization or a small DFC application that serves content without context. The only requirement is that the URL you provide works when the connector appends the object ID to it; the URL need not have anything to do with Webtop. You must also authenticate the user and check their permissions before returning documents, unless you want to provide completely open access to the documents.


Connector Quick Start

This section provides instructions that enable you to quickly deploy and run a Google Search Appliance Connector for Documentum instance. After you deploy the connector, it crawls and indexes the repository content and the documents become searchable using the Google Search Appliance.

To deploy a connector and create a connector instance with the default settings:

  1. Prepare the Documentum server for the connector.
  2. On the Google Search Appliance Admin Console, Configure Crawl Patterns and Feeds for the connector.
  3. Install the connector.
  4. Register the Connector Manager on the search appliance.
  5. Create a Connector Instance.
  6. Verify that the connector is discovering content from the repository and verify that the search appliance is indexing the content.

Preparing the Documentum Server for the Connector and Ensuring that Documents Are Deleted from the Search Index

You can use the audittrail feature of EMC Documentum to ensure that documents are deleted from the Google Search Appliance index when they are deleted from the repository.

To enable document deletion:

  1. Log in to the repository as a Superuser.
  2. Register the dm_destroy and dm_prune events for auditing.

    You can use Documentum Administrator or DQL to register events. In Documentum 6.x, dm_destroy and dm_prune are both part of the dm_default_set, which is audited by default.

  3. Ensure that the Superuser account that is used for traversal and indexing has at least View Audit extended user privileges.

    You can use Documentum Administrator or DQL to modify user privileges.

For more information on using Documentum Administrator or DQL, refer to the documentation from EMC.

Improving Connector Performance with Database Indexes

You can substantially improve the performance of the Documentum connector by adding the right database indexes to the Documentum repository. For complete information, see the database index topic on the open-source project wiki.

Configuring Crawl and Feeds for the Connector

Before you install the Google Search Appliance Connector for Documentum, you must make an addition to the Follow and Crawl URLs defined in the Admin Console. The Google Search Appliance rejects content in the repository without the addition.

To configure crawl and feeds for the connector:

  1. On the Admin Console, navigate to the Crawl and Index > Crawl URLs page.
  2. In the Follow and Only Crawl URLs with the Following Patterns box, add the following statement:

    ^googleconnector://

    For metadata-and-URL feeds, the following format is also supported:

    http://hostname:port/foo/bar.html

  3. Save the configuration.
  4. Click Crawl and Index > Feeds.
  5. In the List of Trusted IP Addresses section, select Trust feeds from all IP addresses or Only trust feeds from these IP addresses.
  6. If you selected Only trust feeds from these IP addresses in step 5, type in the trusted IP addresses.
  7. Click Save Settings.
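
To see why the ^googleconnector:// pattern is needed, consider this simplified Python sketch. It treats a pattern that starts with ^ as a plain prefix match, which is an assumption made for illustration and does not cover the search appliance's full pattern syntax. The connector name and document ID in the sample URL are hypothetical.

```python
def matches_prefix_pattern(url, pattern):
    """Return True if url starts with the text that follows a leading '^'.

    Simplified reading of a '^'-anchored crawl pattern; not the
    appliance's actual pattern matcher.
    """
    if not pattern.startswith("^"):
        raise ValueError("this sketch only handles patterns anchored with '^'")
    return url.startswith(pattern[1:])


pattern = "^googleconnector://"

# Hypothetical URL of a document fed by a connector instance.
fed_url = "googleconnector://my_connector.localhost/doc?docid=0900000180001234"

print(matches_prefix_pattern(fed_url, pattern))
print(matches_prefix_pattern("http://hostname:8080/foo/bar.html", pattern))
```

Under this reading, a fed document whose URL matches none of the configured patterns is rejected, which is the behavior described at the start of this section.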

Installing the Google Search Appliance Connector for Documentum

This section describes the installation process for the Google Search Appliance Connector for Documentum. You install the connector using an installer that installs Apache Tomcat, a connector manager, and the connector on a host computer.

The instructions that follow are in two parts. In the first part, you download and uncompress the installer package. In the second, you install the software on the connector host.

To download and uncompress the installation package:

  1. Log in to the host using an account with sufficient privileges to install the software.
  2. Start a web browser.
  3. Navigate to the connector download site.
  4. Download the correct software distribution package to the host where you are installing the software.
  5. Uncompress the package.
  6. If you are on Windows, skip step 7 and go to the instructions immediately below for installing Tomcat, a connector manager, and the connector.
  7. If you are on Linux, follow these instructions.
    1. Open a terminal window and go to the base directory of the GCI.bin file in the extracted folder.
    2. To run the installer in graphical mode, execute the following command:

      ./GCI.bin LAX_VM path_to_java

      For example: ./GCI.bin LAX_VM /usr/java/j2sdk1.5.2_x/bin/java

    3. To run the installer in console mode, execute the command in Step 2 above with the -i console argument appended.
    4. Go to the following instructions and proceed from Step 2.

To install Apache Tomcat, a connector manager, and the Google Search Appliance Connector for Documentum:

  1. Double-click the distribution file to start the installer.

    You see an introductory panel.

  2. Click Next.

    The License Agreement panel appears.

  3. Indicate whether you accept or decline the terms of the license and click Next:
    • To accept the license, click I accept the terms of the License Agreement.
    • To decline the terms, click I do NOT accept the terms of the License Agreement.
  4. On the Select Connector panel, select the correct connector and click Next.
  5. On the Install Connector panel, choose Install new Google Connector and click Next.
  6. On the Documentum Connector Dependencies panel, navigate to the location of each of the required files.
    • Under Documentum 6, only dfc.jar and the config folder are required.
    • Under Documentum 5.x, the config directory and the following files are required:
      • dfcbase.jar
      • dfc.jar
      • dmcl.ini
  7. If you choose the wrong location and want to use the default location, click Restore Default for the particular location.
  8. Click Next.
  9. On the Connector Configuration panel, enter the name you want to assign the connector and a port number that is not already used by another application.

    If you are creating multiple installations of the connector, ensure that you do not use consecutive port numbers. Each connector installation requires two consecutive port numbers for use by Tomcat. For example, if ConnectorInstall1 is installed on port 8080, do not use port 8081 for ConnectorInstall2. In addition, do not use the AJP Connector port (port 8009) listed in the Tomcat server.xml file. In installations where SSL is supported, do not use the SSL port.

  10. Enter the Google Search Appliance IP Address, which is the IP address to which the connector sends feeds.

    Entering the address ensures that only the search appliance can communicate with the connector manager.

  11. If you do not want the connector service to start automatically, uncheck the Start Documentum connector Service after Installation check box.
  12. If you do not want to register the connector manager on the search appliance during this installation process, uncheck the Register Connector Manager with GSA checkbox.
  13. Click Next.
  14. On the Choose Java Runtime Environment panel, choose the correct JRE for the connector to use, or click Search for Others if the correct JRE is not in the list.
  15. Click Next.
  16. On the Choose Install Folder panel, click Next to accept the default location or click Choose to navigate to a different folder, then click Next.

    The default location is the installation folder chosen in the previous step.

  17. On the Choose Shortcut Folder panel, indicate where you want icons created for the connector and click Next.
  18. Read the information on the Pre-Install\Update Summary panel and click Install.

    An informational panel indicates that the connector installation is in progress. The Register Connector Manager on the GSA panel is displayed.

  19. Type the search appliance administrator user name in the GSA UserID field.
  20. Type the password for the administrator in the GSA Password field.
  21. Type the search appliance port number in the GSA Port field.
  22. Type in the Connector Manager Name and Description.
  23. Click Next.

    The installer indicates whether the installation process succeeded or failed and displays information about connector manager connectivity status, the connector manager URL, search appliance status, and the search appliance display URL.

  24. Click Done.
  25. To start the connector service, click Yes.

    Apache Tomcat starts and deploys the connector manager and connector.

  26. If the Start Documentum connector Service after Installation check box was left unchecked, start the connector service:
    • On Windows, click Start > Programs > Googleconnectors > connector_name > Start Documentum connector Service.
    • On Linux, to start the connector as a console, open a terminal window and navigate to the installation location. Use the following command:

      ./Start_dctm_Connector_Console

  27. If you did not register the connector manager from the connector installer, continue with the instructions in this document for Registering a Connector Manager. If you registered the connector manager from the connector installer, continue with the instructions in this document for Configuring a Connector on the Admin Console.
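
The port guidance in step 9 can be checked before you run the installer. The following Python sketch assumes that each installation uses the port you enter plus the next higher port, which matches the example in step 9 but is otherwise an assumption. The function names are hypothetical, and only the AJP port 8009 is treated as reserved; add an SSL port to the reserved set if your installation uses one.

```python
def ports_used(base_port):
    """Ports assumed to be taken by one installation: the chosen port and the next one."""
    return {base_port, base_port + 1}


def find_port_conflicts(base_ports, reserved=frozenset({8009})):
    """Return (base_port, port, reason) tuples for every conflicting port."""
    conflicts = []
    claimed = {}  # port -> base port of the installation that claimed it
    for base in base_ports:
        for port in sorted(ports_used(base)):
            if port in reserved:
                conflicts.append((base, port, "reserved"))
            elif port in claimed:
                conflicts.append((base, port, f"already used by installation on {claimed[port]}"))
            else:
                claimed[port] = base
    return conflicts


print(find_port_conflicts([8080, 8081]))  # one conflict on port 8081
print(find_port_conflicts([8080, 8082]))  # no conflicts
```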

Registering a Connector Manager on the Admin Console

This section describes how to register a connector manager on the Admin Console.

If you registered the connector manager from the connector installer during the installation process, skip this section.

To register a connector manager on the Admin Console:

  1. Use a browser to log in as an administrator to the Admin Console on the target Google Search Appliance.
  2. Click Connector Administration > Connector Managers.

    If any connector managers are configured, a list of existing connector managers is displayed.

  3. In the Manager Name field, type a name to identify the new connector manager on the Admin Console.
  4. In the Description field, type a description of the new connector manager.
  5. In the Service URL field, type the URL to the Tomcat instance where the connector manager is running.

    This is the root access URL for the connector manager. Ensure that the location you enter is a fully-qualified host name or an IP address. For example, use http://example.com:8080/connector-manager, not http://example:8080/connector-manager.

    If the host name in the Service URL ends in .local or .domain, you see the error Invalid connector manager URL. Use the IP address of the host instead.

    For example, if the connector manager is located in the $CATALINA_HOME/webapps/connector-manager/ directory of a Tomcat server running on the example.com host machine, its location is

    http://example.com:8080/connector-manager

    The following values are used in this example:

    • http://example.com

      The host name of the computer on which Tomcat runs. This must be a fully-qualified domain name.

    • 8080

      The default HTTP port on which Tomcat serves web applications. The value is configurable. See the Apache Tomcat documentation for further information on changing the value.

    • /connector-manager

      The name or context of the web application.

    If access from the Google Search Appliance to Apache Tomcat is through a proxy server, the URL in the Service URL field must include the proxy redirect. For example:

    http://proxy.myexample.com:81/tomcat/connector-manager

  6. Click Save.

    The Admin Console displays a message saying New Connector Manager successfully added. The new connector manager appears in the list of connector managers. If the connector manager is running and Google Search Appliance can connect to it, a green dot appears in the Status column next to its name.
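
The Service URL rules in step 5 can be pre-checked with a short script. This Python sketch is illustrative only: the function name check_service_url is hypothetical, and treating any host name without a dot as not fully qualified is a simplifying assumption.

```python
import ipaddress
from urllib.parse import urlparse


def check_service_url(url):
    """Return a list of problems; an empty list means the URL passes these checks."""
    problems = []
    host = urlparse(url).hostname or ""
    if not host:
        return ["missing host"]
    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False
    if not is_ip:
        # Assumption: a host name with no dot is not fully qualified.
        if "." not in host:
            problems.append("host name is not fully qualified")
        if host.endswith((".local", ".domain")):
            problems.append("host name ends in .local or .domain; use the IP address")
    return problems


for candidate in (
    "http://example.com:8080/connector-manager",
    "http://example:8080/connector-manager",
    "http://myhost.local:8080/connector-manager",
    "http://192.0.2.10:8080/connector-manager",
):
    print(candidate, check_service_url(candidate))
```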


Configuring a Connector on the Admin Console

Use the Add Connector page in the Google Search Appliance Admin Console to create and configure a Documentum connector instance. The Add Connector page prompts you to enter values for all required configuration parameters.

You can configure additional parameters that are not displayed on the Admin Console by modifying the connectorInstance.xml file on the Apache Tomcat host. For complete information on using those parameters, see the Advanced Configuration page on the Connector for Documentum page.

To add a Documentum connector:

  1. Ensure that Apache Tomcat is running.
  2. On the Google Search Appliance Admin Console, click Connector Administration > Connectors.

    The list of existing connectors is displayed.

  3. In the Add Connector section, choose the connector manager you registered in Registering a Connector Manager.
  4. Click Add New Connector.

    Additional fields are displayed, including the name of the connector manager you selected.

  5. In the Connector Name field, type the name of the connector instance.

    Each connector instance added to a particular connector manager or Google Search Appliance must have a unique name. The connector name must consist of no more than 64 alphanumeric characters. All alphabetical characters must be lower-case. Connector names may include underscores (_) and hyphens (-), but they cannot begin with a hyphen.

  6. On the Type drop-down list, select EMC_Documentum_Content_Server.
  7. Click Get Configuration Form.

    The connector manager name, connector name, and connector type are displayed. These fields cannot be edited.

  8. In the Username field, type the user name of a Documentum Superuser.
  9. In the Password field, type the password for the Superuser.
  10. On the drop-down list, select the Documentum repository to traverse and index.

    The list includes all repositories that project to the connection brokers listed in the dmcl.ini file on the Tomcat host (under Documentum 5.X) or listed in the dfc.properties file.

  11. In the Webtop URL field, type the URL for a Webtop instance serving the repository.

    See Deciding on the Webtop URL Format for more information.

  12. In the Advanced Configuration section, select a Root object type.

    The root object and any included object types that are subtypes of the root object are indexed.

  13. Optionally, type in an Additional WHERE clause.

    The additional WHERE clause is appended to the default DQL statement for traversing the repository.

  14. Choose object types you want indexed from the Available object types list, then click the right arrow to add them to the Included object types list.
  15. To remove object types, choose them from the Included object types list and click the left arrow.
  16. Choose properties you want indexed from the Available properties list, then click the right arrow to add them to the Included properties list.
  17. To remove properties, choose them from the Included properties list and click the left arrow.
  18. In the Traversal Rate section, type the number of documents per minute that you want traversed.

    The default is 200.

  19. In the Retry Delay field, type the number of minutes the connector waits between when a traversal is completed and when the next traversal starts.
  20. To suspend the traversal process without changing the existing connector schedule, check Disable Traversal.
  21. In the Connector Schedule section, indicate the hours between which you want the repository traversed.

    Note that a connector scheduled to run from 12 a.m. to 12 a.m. always runs. Any other schedule with the same beginning and ending time never runs, either for a connector or for the Google Search Appliance's standard crawl function.

  22. Click Save Configuration.

    You are returned to the Connectors page.

  23. Click the Edit link and then click Add Line to Schedule for each additional traversal period you want to schedule.
  24. Click Save Configuration.

    If the connector is configured correctly, the new connector appears in the Connectors list. On the Tomcat host, a subdirectory called connector_instance_name is created in the WEB-INF/connectors/EMC_Documentum_Content_Server directory, and a connector_instance_name.properties file is created in that subdirectory.
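
The connector naming rules in step 5 can be expressed as a regular expression. This Python sketch assumes that the 64-character limit applies to the whole name, including underscores and hyphens; the function name is hypothetical.

```python
import re

# First character: lowercase letter, digit, or underscore (not a hyphen).
# Remaining characters: up to 63 lowercase letters, digits, underscores, or hyphens.
CONNECTOR_NAME = re.compile(r"[a-z0-9_][a-z0-9_-]{0,63}")


def is_valid_connector_name(name):
    """Return True if name satisfies the rules described in step 5."""
    return CONNECTOR_NAME.fullmatch(name) is not None


for name in ("dctm_repo-1", "-repo", "Repo", "a" * 65):
    print(repr(name[:20]), is_valid_connector_name(name))
```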


Verifying That the Connector is Working

After you configure the connector, wait a few minutes and then verify on the Admin Console Feeds page that the Google Search Appliance is receiving feeds. Ensure that the following entry exists on the Crawl Diagnostics page:

connector_instance_name.localhost

Click the entry and navigate through successive links to verify that documents have been sent to the search appliance by the connector named connector_instance_name as content feeds.

After you verify that the search appliance is correctly receiving feeds, perform a search. Unless all content indexed by the connector is public content, perform a secure search.

To view the documents crawled by the connector and the data fed to the search appliance, enable feed logging, a feature that is disabled by default. This is available only for connectors installed on stand-alone hosts.

To enable feed logging:

  1. On the connector manager host, navigate to the directory where the connector is installed.
  2. Navigate to the Tomcat\webapps\connector-manager\WEB-INF directory or folder.
  3. Start a text editor and open the file applicationContext.properties.
  4. Locate the property feedLoggingLevel and change the value to ALL.
  5. Save the file.
  6. Restart the connector. The feed logs are available for all new documents sent by the connector to the search appliance.
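
If you prefer to script step 4 rather than edit the file by hand, the change might look like the following Python sketch. The sample file content is hypothetical, including the assumed current value OFF and the otherProperty line; only the property name feedLoggingLevel and the value ALL come from the steps above.

```python
def set_property(properties_text, key, value):
    """Return properties_text with the existing key set to value.

    Raises KeyError if the key is not present, because the steps above
    expect the property to exist already.
    """
    lines = properties_text.splitlines()
    found = False
    for index, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments and lines that are not key=value pairs.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[index] = f"{key}={value}"
            found = True
    if not found:
        raise KeyError(key)
    return "\n".join(lines) + "\n"


# Hypothetical excerpt of applicationContext.properties.
sample = "# hypothetical excerpt\nfeedLoggingLevel=OFF\notherProperty=value\n"
updated = set_property(sample, "feedLoggingLevel", "ALL")
print(updated)
```

Back up applicationContext.properties before changing it, and restart the connector afterward as described in step 6.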

How Search is Supported

This section describes the search features that are supported when a Google Search Appliance and Google Search Appliance Connector for Documentum are used for indexing and searching a Documentum repository.


Supported Search Functionality

By default, all searches are performed against both content files and metadata. To restrict a search to metadata, use the inmeta operator in queries.

For documents, only metadata and the original file of the CURRENT version are indexed. Renditions and previous versions are not indexed. If the original file is larger than 30 MB, the file is skipped and only metadata are indexed.

The Google Search Appliance and Google Search Appliance Connector for Documentum do not support Document Query Language (DQL) or Full-Text DQL (FTDQL) queries. See documentation for the Google Search Appliance for information about how to customize querying.


Searchable Formats

The Google Search Appliance Connector for Documentum can traverse, index, and search content files in all formats supported by the Google Search Appliance. In some formats, only metadata are indexed, such as zip files and some graphics formats.


Searchable Object Types

Content files of the object type dm_document are indexed by default. Other subtypes of dm_sysobject, subtypes of dm_document, and custom subtypes can be indexed and searched, but you must manually add the types on the Advanced Configuration page when you configure a connector. Any number of object types can be indexed and searched.

Some properties are indexed by default and other properties are not indexed by default. You can control whether specific properties are indexed using the included metadata and excluded metadata lists on the Advanced Configuration page.


Metadata That is Indexed or Not Indexed

This section contains lists of the metadata that is indexed or not indexed by default. In the 2.0 release, a property is indexed if it is in the included metadata and it is not in the excluded metadata. The included metadata are configured on the Advanced Configuration page on the Admin Console. The excluded metadata are configured by manually editing the connectorInstance.xml file. For instructions, see the advanced configuration page on the documentation page for the Documentum connector.

Metadata That is Indexed

The following properties are included by default in indexing:

  • object_name
  • r_object_type
  • title
  • subject
  • keywords
  • authors
  • r_creation_date
  • r_modify_date


Restricting Which Documents are Indexed

You can customize which documents are indexed on the Admin Console by adding a DQL WHERE clause. For example, you can exclude ZIP files by adding the following clause:

a_content_type not in (select name from dm_format where dos_extension = 'zip')

To exclude ZIP files and files with a DOC extension:

a_content_type not in (select name from dm_format where dos_extension in ('zip','doc'))

The WHERE clause can only contain references to the properties of the root object type, not to properties of other included object types. You must set the root object type to be a type that contains the properties you want to use in the DQL WHERE clause. If you plan to index unrelated object types, you must add a separate connector instance.
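
If you maintain several exclusion lists, you might generate the clause instead of typing it. The following Python sketch builds the two clauses shown above from a list of file extensions; the function name is hypothetical, and the output is only text to paste into the Additional WHERE clause field.

```python
def exclude_extensions_clause(extensions):
    """Build a DQL WHERE clause that excludes formats with the given DOS extensions."""
    cleaned = [ext.lower().lstrip(".") for ext in extensions]
    if not cleaned:
        raise ValueError("at least one extension is required")
    for ext in cleaned:
        # Keep the generated clause safe to paste: letters and digits only.
        if not ext.isalnum():
            raise ValueError(f"unexpected characters in extension: {ext!r}")
    if len(cleaned) == 1:
        condition = f"dos_extension = '{cleaned[0]}'"
    else:
        quoted = ",".join(f"'{ext}'" for ext in cleaned)
        condition = f"dos_extension in ({quoted})"
    return f"a_content_type not in (select name from dm_format where {condition})"


print(exclude_extensions_clause(["zip"]))
print(exclude_extensions_clause(["zip", "doc"]))
```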

For more information on DQL, see Documentum's documentation.


Traversal

The following sections describe how the connector traversal process works:


About the Traversal Process

The Google Search Appliance locates web and file system content for indexing through a process called crawl or crawling.

The Google Search Appliance locates content in a content repository using a process called traversal. Traversal is a process in which the connector issues queries to the repository to retrieve content files and the metadata associated with each content file. The content files and metadata are then fed to the Google Search Appliance as a content feed or a metadata-and-URL feed. For more information about content feeds, see the Feeds Protocol Developer's Guide in Product documentation.

In the initial traversal of a repository, the files are retrieved by last-modified date, starting with the oldest documents in the repository. After the initial traversal, files are retrieved when they are added to a repository or modified.

You can configure the Documentum installation so that documents deleted from the Documentum repository are also deleted from the index on the Google Search Appliance. For instructions, see Deleting Documents from the Google Search Appliance Index.

If you change the set of metadata that is selected for indexing, you must retraverse the content, using the instructions in Resetting Traversal.


How the Traversal Rate Affects Connector Behavior

When you configure a connector instance on the Google Search Appliance Admin Console, you set a traversal rate. The value indicates how many documents per minute the connector traverses in the repository. The default value is 200 documents per minute.

You can set the traversal rate to values higher or lower than 200 documents per minute. The connectors and connector manager are capable of faster traversal rates.

  • To reduce resource consumption in the repository, lower the traversal rate.
  • To increase indexing speed, raise the traversal rate.

If the traversal rate is set to 100 and the connector traverses 100 documents in less than one minute, the traversal process pauses. When the full minute elapses, the traversal process resumes.
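
The pause described above can be modeled with simple arithmetic. The following Python sketch is a model of the documented behavior only, under the assumption that the pause lasts for the remainder of the current minute; it is not the connector manager's implementation, and the function name pause_after_batch is hypothetical.

```python
def pause_after_batch(docs_traversed, elapsed_seconds, rate_per_minute=200):
    """Return how many seconds traversal waits before it resumes.

    If the per-minute quota is used up before the minute is over,
    traversal pauses for the remainder of the minute; otherwise it continues.
    """
    if docs_traversed < rate_per_minute or elapsed_seconds >= 60:
        return 0.0
    return 60.0 - elapsed_seconds


# Rate of 100, and 100 documents traversed in 20 seconds: wait out the minute.
print(pause_after_batch(100, 20, rate_per_minute=100))
# Only 50 documents traversed so far: no pause.
print(pause_after_batch(50, 20, rate_per_minute=100))
```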


Creating and Tuning Connector Schedules

When you schedule connector instances, the performance of the repository is a significant consideration. Depending on the number of traversals and the size of the documents retrieved for indexing, the use of connectors may degrade repository performance. Monitoring and performance-tuning the repository server is especially important when you deploy a new connector or document repository.

Note that a connector scheduled to run from 12 a.m. to 12 a.m. always runs. Any other schedule with the same beginning and ending time never runs, either for a connector or for the Google Search Appliance's standard crawl function.

When you determine the connector schedule, take the following factors into account:

  • When to run the traversal process

    You might add a connector instance to run in off-peak hours to spread out the initial index creation during times of low demand on the repository.

  • How long to run the traversal process

    You might add a connector instance with a very brief schedule to perform predeployment testing, and experiment to see the effects of lengthening the schedule.

A connector instance cannot self-modify its traversal schedule. Therefore, you must monitor the performance of both the Google Search Appliance and the content management system regularly, and make manual adjustments to the traversal schedules of connectors to optimize performance. You can tune scheduling for optimal performance in these ways:

  • Create a schedule that minimizes the number of concurrent traversal processes that are running.
  • Restrict the times at which those processes run. For example, if the content management system is executing a resource-intensive job, the connector might run slowly. Schedule the connector to run at times when demand on the content management system is light.

Additionally, the connector manager interrupts a connector that takes too long to process a batch of documents. The default duration after which the connector manager interrupts the connector is 1800 seconds, or 30 minutes. The duration is set by the value of the traversal.time.limit property in the applicationContext.properties file. If you want a shorter duration, you can change the value of traversal.time.limit.

To change the default value of the traversal.time.limit property:

  1. Stop Apache Tomcat.
  2. Open the applicationContext.properties file in a text editor. The top of the file contains comments with explanatory text. Do not uncomment any of the explanatory text, including the example for traversal.time.limit.
  3. Examine the file to see whether there is a traversal.time.limit entry.
    • If there is an entry, modify the duration.
    • If there is no entry, add one to the end of the file:

      traversal.time.limit=duration_in_seconds

  4. Save the file.
  5. Restart Tomcat.
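
Step 3 of this procedure (modify the entry if it exists, otherwise add one at the end) can be sketched as a small Python function that operates on the text of the file. This is a minimal sketch with a hypothetical function name, set_property. It assumes entries use the simple key=value form, and it leaves commented lines untouched, as the procedure requires. Stop Tomcat before you apply the result to the real file, and restart it afterward.

```python
def set_property(text, key, value):
    """Return properties-file text with key set to value.

    Comment lines (starting with # or !) are never modified.
    If no uncommented entry for key exists, one is appended at the end.
    """
    lines = text.splitlines()
    found = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(("#", "!")):
            continue  # explanatory comments stay commented out
        if "=" in stripped and stripped.split("=", 1)[0].strip() == key:
            lines[i] = "%s=%s" % (key, value)
            found = True
    if not found:
        lines.append("%s=%s" % (key, value))
    return "\n".join(lines) + "\n"


# Sample content for illustration only; a real file contains more entries.
sample = "# traversal.time.limit=1800\ngsa.feed.host=localhost\n"
print(set_property(sample, "traversal.time.limit", 900))
```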

Changing the Connector Retry Delay and Schedule

In connector manager 2.0 and search appliance software version 6.2 and later, the search appliance Admin Console enables you to modify the connector retry delay, which is the time period that elapses between when one traversal is completed and the next starts. For example, you might want the connector to traverse the repository every hour between 8 a.m. and 8 p.m. or every two hours from midnight to 9 a.m.

The default retry delay is 5 minutes.

To change the traversal schedule, set the start and end times for traversal on the Connector Schedule drop-down menus.


Resetting Traversal

If traversal has stopped or no new documents are being fed to the search appliance, you can reset the connector traversal process. When you reset the traversal, the content is traversed in full from the beginning point and the index is recreated.

In search appliance software version 6.0 and later, use the Reset link for the connector instance on the Admin Console > Connectors page. On search appliances running software versions earlier than 6.0, use the following instructions from a browser.

To reset the traversal, open a browser and enter a URL in the following format, where connector_manager_host_address is the location of the connector manager and connector_name is the name of the connector whose traversal you are restarting:

http://connector_manager_host_address:8080/connector-manager/restartConnectorTraversal?ConnectorName=connector_name

For example, if the host address is http://www.example.com/ and the connector is named our_connector:

http://www.example.com:8080/connector-manager/restartConnectorTraversal?ConnectorName=our_connector

The URLs are case-sensitive. After you submit the command, you see a response in the browser window. Some browsers display only a zero (0). Other browsers display a full XML document. A 0 response indicates success. A nonzero response indicates a failure.

<CmResponse>
  <StatusId>0</StatusId>
</CmResponse>

Note that with the default Connector Manager v2.x configuration, connector_manager_host_address must be localhost (or more specifically, 127.0.0.1), and the request must originate from the machine on which the Connector Manager is running. If direct access to the Connector Manager machine is inconvenient, Connector Administrators may wish to add administration machines to the list of IP addresses allowed by the RemoteAddrValve. For more details see this page.
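
If you script the reset instead of using a browser, the URL format and the response check described above can be sketched as follows. The function names restart_traversal_url and status_id are hypothetical, and the sketch only builds the URL and interprets a response; it does not send the request.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def restart_traversal_url(host, connector_name, port=8080):
    """Build the restartConnectorTraversal URL in the format shown above."""
    query = urlencode({"ConnectorName": connector_name})
    return ("http://%s:%d/connector-manager/restartConnectorTraversal?%s"
            % (host, port, query))


def status_id(response_xml):
    """Extract the StatusId from a CmResponse document; 0 indicates success."""
    root = ET.fromstring(response_xml)
    return int(root.findtext("StatusId"))


print(restart_traversal_url("localhost", "our_connector"))
print(status_id("<CmResponse>\n  <StatusId>0</StatusId>\n</CmResponse>"))
```

With the default configuration, run any such script on the connector manager host itself and use localhost as the host address.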


When to Delete Feeds

Under the following circumstances, Google recommends that you delete connector feeds. This recommendation applies only to content-feed-based connectors.

  • When you reindex content and the expected new document set leaves out documents or metadata that were previously indexed.
  • When you delete a connector instance.

When you are reindexing the content, follow this general procedure:

  1. On the Admin Console > Connector Administration > Add Connector page, check Disable Traversal.

    Traversal is enabled by default.

  2. Make any required updates to the connector configuration.
  3. Delete the feed.
  4. Monitor the Crawl Diagnostics page in the Admin Console.
  5. When the indexed documents are removed from the index, navigate to the Connector Administration > Connectors page and click the Reset link for the connector.
  6. On the Admin Console > Connector Administration > Add Connector page, enable traversal by unchecking Disable Traversal.

If you are deleting a connector instance, we recommend that you separately delete the feed. Otherwise, content indexed by the connector is not removed from the index and public content indexed by the connector continues to appear in search results. Secure content does not appear in search results because the authorization check fails.


When to Restart the Connector Service

Restarting the connector service means restarting Apache Tomcat. Restart the connector service only under the following circumstances:

  • When you manually edit the connector's properties file or one of the configuration files (applicationContext.xml, applicationContext.properties, logging.properties, or connectorInstance.xml). Alternatively, for edits to the connectorInstance.xml file only, you can apply the changes on the Admin Console, without restarting the connector service. Click the Edit link for the connector instance, then click Save Configuration.
  • When you install a connector or connector manager JAR file.

Serving

The following sections describe how the connector serving process works and how serve-time security is maintained.


About Serving

Using the Google Search Appliance and Google Search Appliance Connector for Documentum to search an EMC Documentum content repository is similar to using Google.com to search the web.

To locate particular information or documents in the repository, a user opens a browser window and navigates to a search page. The search page can be the default search page available on the Google Search Appliance or it can be a customized search page. The user types a search term in the search box and clicks Search.

The Google Search Appliance searches its index for documents and metadata containing the user's search term.

When the Google Search Appliance finds all the documents that match the search request, it presents the user with a pop-up window and asks for the user's user name and password. The connector manager passes the search results and the user credentials to the repository server. The repository server authenticates the user, evaluates the permissions for each document returned by the user's search, determines which documents the user is authorized to view, and returns that information to the connector manager.

The Google Search Appliance displays a results page listing the documents the user is authorized to view. When the user clicks a link on the results page, a web client window opens in which the user can view the document or its metadata, depending on how the connector is configured. If the user does not have an open session to the repository, the web client asks for the user's login credentials before displaying the document.


How Security is Supported

The repository indexed and served by the Google Search Appliance can use any Documentum user authentication mechanism.

The Google Search Appliance and Google Search Appliance Connector for Documentum require a Superuser user name and password for access to the repository. You supply the user name and password in the Admin Console when you configure an instance of the Google Search Appliance Connector for Documentum. The connector supplies the Superuser name and password to Documentum at traversal time.

At serve time, the connector requests the user credentials of the user submitting a search request. Those user credentials are passed to the Content Server, which authenticates the user and determines which results the user is authorized to view.

The Google Search Appliance does not require special configuration to support Documentum's user authentication and authorization mechanisms.

About Public Content

Content marked public can be viewed by all users, regardless of how permissions are set in the repository. You can designate content as public content before or after it is indexed.

This means that users can view public content from the search appliance results page that they cannot view using Documentum Webtop.

  • When a user performs a search request on the search appliance, the results are not filtered according to the user's repository permissions on the content files. The search appliance results page displays the content on the Cached or Text Version pages.
  • When a user clicks a result and views the result in Webtop, any required authentication and authorization checks are performed. The content file is served to the user only if the user has sufficient permissions.

To make content public:

  1. On the Admin Console, click Crawl and Index > Crawler Access.
  2. Type the following statement in the For URLs Matching Pattern field for each of your connectors, where connector_name is the name of the particular connector:

    ^googleconnector://connector_name.localhost/

  3. Type in the User Name and Password required for accessing the URLs.
  4. Confirm the Password.
  5. To make the content for a particular URL pattern public, check Make Public.
  6. To add an additional URL pattern, click Add More Rows and repeat steps 2 through 5.
  7. Click Save Crawler Access Configuration.

For More Security Information

For more information on authentication and authorization with connectors, see the chapters on "Crawl, Index, and Serve," "Use Cases with Public and Secure Serve for Multiple Authentication Mechanisms," and "Cookie-Based Authentication Scenarios" in Managing Search for Controlled-Access Content.


Upgrading the Connector

Use the instructions below to upgrade the Documentum connector.


Deciding How to Upgrade the Connector

If you are running Documentum 5.2.5, 5.3, 6.0, or 6.5, including service packs, and you plan to remain on the current Documentum version, use the instructions in Upgrading the Connector without a Documentum Upgrade.

If you are running Documentum 5.x and you are upgrading to Documentum 6 or 6.5, use the instructions in Upgrading the Connector with a Documentum Upgrade.


Upgrading the Connector without a Documentum Upgrade

To upgrade to this version of the Google Search Appliance Connector for Documentum:

  1. Log in to the Apache Tomcat host as the user who installed the connector.
  2. Stop the connector service.
  3. Navigate to the $CATALINA_HOME/webapps/connector-manager/WEB-INF/connectors directory.
  4. Rename the EMC_Documentum_Content_Server_5.2.5_5.3 or EMC_Documentum_Content_Server_6.0 subdirectory to EMC_Documentum_Content_Server.
  5. Follow the instructions in Installing the Connector Using the Installer, but during the installation process, choose the connector that you are upgrading.

Upgrading the Connector with a Documentum Upgrade

To upgrade to this version of the Google Search Appliance Connector for Documentum with an upgrade from Documentum 5 to Documentum 6:

  1. Log in to the Google Search Appliance Admin Console.
  2. On the Connectors page, delete the connector instance. If more than one connector instance is associated with the connector manager you are upgrading, delete them all. If you are upgrading more than one connector manager, delete all connector instances for each connector manager.
  3. Delete the connector manager definition. If you are upgrading more than one connector manager, delete them all.
  4. Log in to the Apache Tomcat host.
  5. Stop the existing connector service.
  6. Uninstall the connector from the Apache Tomcat host.
  7. Upgrade Documentum.
  8. Install this version of the connector using the instructions in Installing the Google Search Appliance Connector for Documentum.
  9. Log in to the Google Search Appliance Admin Console.
  10. Create a new connector manager. If you deleted more than one connector manager in step 3, create corresponding new connector managers.
  11. Create a new connector instance. If you deleted more than one connector instance in step 2, create corresponding new connector instances.

Additional Upgrade Steps for Installations with a Customized connectorInstance.xml File

If you have a customized connectorInstance.xml file, you must manually update the advanced configuration, which, in the 2.0 and later connector releases, is moved to the configuration form on the Admin Console.

To complete the upgrade process:

  1. Upgrade the connector installation using the connector installer and perform the following steps for each connector instance that has a custom connectorInstance.xml file.
  2. Open the connectorInstance.xml file in a text editor.
  3. Open the Admin Console in a browser.
  4. On the Connector Administration > Connectors page, click the Edit link next to the connector instance whose connectorInstance.xml file you have open.
  5. On the Root Object Type drop-down list, select the value of the root_object_type property that is in the connectorInstance.xml file.
  6. If the value of the where_clause property in the connectorInstance.xml file is a DQL WHERE condition (rather than ${where_clause}), copy the DQL WHERE condition to the Additional Where Clause field on the Admin Console.
  7. If the included_object_type property includes entries other than dm_document, select the appropriate object types in the Available Object Types select list and click the right arrow button.
  8. If dm_document is not one of the included entries in the XML file, select it in the Included object types select list and click the left-arrow button.
  9. Perform the same steps for the included_meta property and the Included Properties select list.
  10. If you have an object type other than dm_document in the list of included object types, and you expect additional properties from that object type to be indexed, select those properties from the Available properties select list and click the right-arrow button.

    In previous versions of the connector, all properties of other object types that were not explicitly excluded were included. In version 2.0, any properties that you want to be indexed must appear in the Included properties select list.

  11. Delete the connectorInstance.xml file from the connector instance directory.
  12. Click Save Configuration on the Admin Console.

Uninstalling Connectors and Connector Managers


Deleting a Connector Instance from the Admin Console

You delete a connector instance only on the Admin Console of the Google Search Appliance. When you delete the instance, you delete the configuration information for the instance. The connector manager no longer creates and runs the instance.

Each connector instance is listed on the Admin Console in the Connector Administration > Connectors section. The indicator light is either green or red. Green indicates the existence of the connector instance.

To delete a connector instance:

  1. Log in to the Admin Console as an administrator.
  2. Click Connector Administration > Connectors.
  3. Click the Edit link for the correct connector.
  4. Check the Disable Traversal checkbox for the connector you are deleting.
  5. Click Save Configuration.
  6. On the Connector Administration > Connectors page, locate the connector instance you want to delete.
  7. Click the Delete link on the line for the correct connector instance.
  8. Click OK.

Deleting a Connector Manager

To delete a connector manager, you must first unregister the connector manager from the Admin Console, then uninstall the connector manager on the Tomcat host.

Before you unregister a connector manager, you must delete all connector instances associated with that connector manager. If you have a large number of connector instances, you can first stop the Tomcat instance where the connector manager is running, then unregister the connector manager.

It is also possible to uninstall the connector manager on the Tomcat host, then unregister the connector manager on the Admin Console.

Unregistering a Connector Manager from the Admin Console

To unregister a connector manager from the Admin Console:

  1. Log in to the Admin Console as an administrator.
  2. Click Connector Administration > Connector Managers.
  3. Locate the connector manager you want to delete.
  4. Click the Unregister link on the line for the correct connector manager.
  5. Click OK.

Uninstalling a Connector Manager

To uninstall a connector manager from the Tomcat host, do one of the following:

  • On Windows, click Start > All Programs > Google Search Appliance Connector version_number > Uninstall.
  • On Linux, click the appropriate shortcut.

To manually delete a connector manager on the Apache Tomcat host:

  1. Log in to the Apache Tomcat host as the installation owner (the user who installed Tomcat).
  2. Shut down Tomcat.
  3. Navigate to the $CATALINA_HOME/webapps directory.
  4. Delete the connector-manager.war file.
  5. Delete the $CATALINA_HOME/webapps/connector-manager directory.
  6. Restart Tomcat.

Troubleshooting the Google Search Appliance Connector for Documentum

If you have a problem that requires you to file a ticket with Google Cloud Support, be prepared to provide Support with the following information:

  • Verbose connector logs. See Logging for information on changing the default logging level. If you are reporting a problem to Support, it is ideal if you can reproduce the problem with the logging level set to ALL. However, log files with entries made when the problem occurred are also helpful.
  • Connector configuration files.
  • Feed record and metadata log file. See Logging Feed Record and Metadata Information to a Text File for information on generating this log file.

Diagnosing Connector Problems

If you create a connector instance and no search results are returned, use the following checklist to help diagnose the problem.

Problem: The connector has not traversed any documents.
How to diagnose: View the Admin Console Feeds page or Crawl Diagnostics page to confirm. View the connector logs to help determine the specific reason.

Problem: The search appliance has not accepted the feed.
How to diagnose: View the Admin Console Feeds page to determine whether the search appliance is accepting feeds.

Problem: The connector has not traversed the designated test documents.
How to diagnose: View the Admin Console Crawl Diagnostics page. Examine the connector logs and look for the end of a traversal or for errors associated with specific documents. Lastly, enable the teedFeedFile and reset the traversal.

Problem: The search appliance has not indexed the documents.
How to diagnose: This can be difficult to determine, but the Crawl Diagnostics page tells you which content files have not been indexed. Usually, you must wait until the content is indexed. This failure is more common with metadata-and-URL feed connectors.

With content feed connectors, a document can appear on the Crawl Diagnostics pages almost immediately, sometimes before the feed appears on the Feeds page. However, the document does not appear in search results for another 5 to 15 minutes. If a document does not appear on the Crawl Diagnostics page, it has not been indexed and probably has not been traversed.

Problem: The Documentum connector is slow to index content, but is sending feeds.
How to diagnose: The connector performs the following three main actions:

  1. A query to find documents to add (including updates)
  2. A query to find documents to delete
  3. The retrieval and feeding of the documents

Usually, when the batches are slow, the problem is the query performance. Turn the logging level up to FINE to verify that the query execution is slow.

Recommended database indexes are available to improve query performance.

Problem: Secure documents were not included in test searches.
How to diagnose: Ensure that a secure search was performed.

Problem: There were authentication failures.
How to diagnose: Depending on the search appliance version, examine the Security Manager log or the connector logs.

Problem: There were authorization failures.
How to diagnose: Examine the authorization log on the search appliance Access Control page or the connector logs. For metadata-and-URL feeds or policy ACLs, this is where you will find the information you need. For connector authorization, the connector log has more details about failures than the search appliance authorization log.

When you examine the connector logs, error messages labeled SEVERE or Exception are good starting points. For authorization issues, search the logs for the user name of the users who experienced authorization failures.
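
A quick way to find those starting points is to filter the log text for the markers named above. The following Python sketch is one way to do that. The function name find_problem_lines is hypothetical, the user name jdoe is an example, and the INFO and FINE lines in the sample are invented for illustration; only the SEVERE and FeedException lines come from this document.

```python
def find_problem_lines(log_text, user_name=None):
    """Return log lines that mention SEVERE, Exception, or the given user name."""
    markers = ["SEVERE", "Exception"]
    if user_name:
        markers.append(user_name)
    return [line for line in log_text.splitlines()
            if any(marker in line for marker in markers)]


# Sample log text; the INFO and FINE lines are made up for this example.
sample = "\n".join([
    "INFO: traversal batch started",
    "SEVERE: Feed Exception during traversal.",
    "com.google.enterprise.connector.pusher.FeedException: Connection refused: connect",
    "FINE: authorization check for jdoe",
])

for line in find_problem_lines(sample, user_name="jdoe"):
    print(line)
```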


Logging

Logging is a useful technique for recording information about how your installation is operating. You can use the information logged for troubleshooting the operations of the connector, the Google Search Appliance, and Documentum.

The connector manager and connectors use the java.util.logging package for logging. The installer installs a logging mechanism for the connector and starts the logging process automatically. The default logging configuration is defined in the logging.properties file.

To customize the configuration, navigate to
connectors_root_dir/connector_name/Tomcat/webapps/connector-manager/WEB-INF/classes and edit the logging.properties file there.

The following line in the file sets the default logging level for the Documentum connector:

.level=INFO

The default logging level for most packages and output destinations (handlers) is INFO. To enable debugging at a finer level of granularity, you can change the default connector manager logging level to ALL or FINER. For example, you might change the logging level as follows:

.level = ALL

The possible values of the level property are OFF, SEVERE, WARNING, INFO, CONFIG, FINE, FINER, FINEST, and ALL. The default level is INFO.
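
To see what a given setting captures, it helps to know that java.util.logging assigns each level an integer value and logs a message only when the message's level is at or above the configured level. The following Python sketch reproduces that comparison using the integer values defined by java.util.logging.Level. The function name is_logged is hypothetical; the sketch illustrates the rule and is not part of the connector.

```python
# Integer values defined by java.util.logging.Level.
LEVEL_VALUES = {
    "OFF": 2**31 - 1,
    "SEVERE": 1000,
    "WARNING": 900,
    "INFO": 800,
    "CONFIG": 700,
    "FINE": 500,
    "FINER": 400,
    "FINEST": 300,
    "ALL": -2**31,
}


def is_logged(message_level, configured_level):
    """Return True if a message at message_level passes the configured level."""
    if message_level in ("OFF", "ALL"):
        raise ValueError("OFF and ALL are thresholds, not message levels")
    if configured_level == "OFF":
        return False
    return LEVEL_VALUES[message_level] >= LEVEL_VALUES[configured_level]


print(is_logged("FINE", "INFO"))    # FINE messages are dropped at the default level
print(is_logged("SEVERE", "INFO"))  # SEVERE messages are always above INFO
print(is_logged("FINEST", "ALL"))   # ALL lets everything through
```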

Starting with GSA version 6.14 and connector manager 2.8.x, you can adjust the logging level on the Admin Console. However, this change affects only the currently running process and reverts to the default when the connector manager restarts.

The output from the FileHandler appears in the connectors_root_dir/connector_name/Tomcat/logs directory. The output appears in the google-connectors.sequence.log file, where sequence is a series of numbers starting with 0 and incremented by 1 on each occurrence (0, 1, 2, 3...n). The first three log file names would be google-connectors.0.log, google-connectors.1.log, and google-connectors.2.log.

After editing the logging.properties file, restart Tomcat.

In addition, enable logging for the content management system's native API on the Apache Tomcat host and, if relevant, on the repository server host.


Error Messages

This section describes some commonly encountered error messages and their likely solutions.

Search Appliance Unable to Connect to the Connector Manager

If the Apache Tomcat instance where the connector manager is installed is not started or if the location you type in is incorrect or invalid, a message is displayed on the Connector Manager Administration page of the Admin Console saying "The appliance could not connect to the connector manager as specified in the location. Make sure that the URL is correct, or try again later."

Admin Console Error

If the connector is unable to connect to Documentum, ensure that the login and password are valid for the repository, and ensure that the values in the dmcl.ini or dfc.properties file, depending on the Documentum version, are correct.

HTTP 404 Error When Registering a Connector Manager

When you are registering a new connector manager, you might see the following error message:

The HTTP response failed with the following code: 404. No external connector managers registered.

This means that the CATALINA_HOME environment variable is not set correctly on the Tomcat host. Examine the Tomcat startup script or .bashrc and ensure that CATALINA_HOME points to the correct Tomcat installation.

HTTP 401 Error When Configuring a Connector

When creating the connector, the search appliance administrator might see the following error:

Cannot connect to the given SharePoint Site URL with the supplied Domain/Username/Password. Reason:(401) Unauthorized

  1. Check that the user name and password are correct. Configure the crawler access under Crawl and Index > Crawler Access and perform a manual fetch under Status and Reports > Real-time Diagnostics in the Admin Console to verify connectivity and validate the credentials. If you get a 401, confirm the user name and password again. If you get an HTTP status of 200, check the logs for the information below.
  2. Check the connector log. If you see the following error, check that the user has contribute access.

Aug 23, 2011 11:18:56 AM com.google.enterprise.connector.sharepoint.wsclient.WebsWS checkConnectivity
WARNING: Unable to connect.
AxisFault
faultCode: {http://xml.apache.org/axis/}HTTP
faultSubcode:
faultString: (401)Unauthorized
faultActor:
faultNode:
faultDetail:
{}:return code: 401
401 UNAUTHORIZED
{http://xml.apache.org/axis/}HttpErrorCode:401

Feed Exception During Traversal

You might see the following error message if you installed a connector manually or you are using a connector manager earlier than version 2.0:

SEVERE: Feed Exception during traversal.
com.google.enterprise.connector.pusher.FeedException: Connection refused: connect

This happens when the connector service is reinstalled to a new location (whether or not it is the same version) but is not reregistered on the Admin Console. The connector service points at localhost by default, rather than pointing to the search appliance. In this situation, the connectors are unable to feed documents to the search appliance.

To fix this issue:

  1. Log in to the Admin Console and navigate to the Connector Managers page.
  2. Click the Edit link for your connector manager.
  3. Click the Save button.

Alternatively, you can manually edit the applicationContext.properties file in the Tomcat/webapps/connector-manager/WEB-INF directory by changing localhost to the IP address of the GSA in the following line:

gsa.feed.host=localhost

If you manually edit the file, you must restart Tomcat after you save your changes.

Error Message When Trying to Add a Connector to an Unavailable Connector Manager

When a connector manager is unavailable, the Admin Console displays a circular red indicator next to the connector manager name. If you try to add a connector to an unavailable connector manager, you see the following error message:

The appliance encountered an error while trying to make the following servlet call: getConnectorList

The connector manager might be unavailable for one of the following reasons:

  • Tomcat is not running on the registered host and port
  • The connector manager host is unreachable
  • The Tomcat Remote Address Filter is rejecting access

Check each condition and correct any problems.
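
The first two conditions can be checked from another machine with a plain TCP connection test. The following Python sketch uses a hypothetical function name, can_connect, and shows an example host name and port that you should replace with your registered values. It tests only whether something accepts connections on the host and port; it cannot tell whether the Tomcat Remote Address Filter would reject the search appliance's requests.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Example values only: replace with the registered host and port.
    print(can_connect("connector-host.example.com", 8080))
```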


Logging Feed Record and Metadata Information to a Text File

You can log all URLs and metadata fed to a Google Search Appliance without recording all content. There are two ways to implement this logging technique.


Using the feedLoggingLevel Property

To use the feedLoggingLevel property to log URLs and metadata:

  1. Log on to the Apache Tomcat host with the user account under which Tomcat runs.
  2. Shut down the Tomcat instance that hosts the connector manager.
  3. Navigate to the webapps/connector-manager/WEB-INF/ directory.
  4. Open the applicationContext.properties file in a text editor.
  5. Set the feedLoggingLevel property to the value ALL:

    feedLoggingLevel=ALL

  6. Save the applicationContext.properties file.
  7. Restart Tomcat.

    The logging information is recorded in the $CATALINA_BASE/logs/google-connectors.feed%g.log files, where %g is a generation number used to distinguish among rotated logs.


Using a logging.properties Configuration File

To use a logging.properties configuration file to log URLs and metadata:

  1. Log on to the Apache Tomcat host with the user account under which Tomcat runs.
  2. Shut down the Tomcat instance that hosts the connector manager.
  3. Navigate to the logging.properties file.
    • If you installed the connector using the installer, the file is in the connector_directory/Tomcat/webapps/connector-manager/WEB-INF/classes/ directory.
    • If you installed the connector manually, navigate to the location where you created a logging.properties file. The logging.properties file is probably in the If not, copy the logging.properties file from the $JAVA_HOME/lib/ directory to the $CATALINA_HOME/webapps/connector-manager/WEB-INF/classes directory. You might have to create the /classes directory manually.
  4. Open the logging.properties file in a text editor.
  5. Add the following line to the file:

    com.google.enterprise.connector.pusher.DocPusher.FEED_WRAPPER.FEED.level=FINER

  6. Save the logging.properties file.
  7. Restart Tomcat.

    The logging information is recorded in connector_directory/Tomcat/logs/google-connectors.feed%g.log, where %g is a generation number used to distinguish among rotated logs.


Related Documentation

For more information on the connector manager, see Introducing Connectors. For release notes, see the connector open-source project site.

For complete information on the Documentum Content Server, see EMC's documentation.
