In some cases you may need to start multiple Connector 4 instances on one server (e.g., several instances of the SharePoint connector to crawl different site collections). Administrators may want to configure the GSA to mitigate excessive load on content servers and prioritize crawling for one of the connectors.
The goal is to configure host load so that content is crawled most actively by one connector, or by a chosen set of connectors.
The GSA uses the host load feature to control the load it places on content servers. To balance that load across connectors, host load must be configured in a specific way.
The idea is to split the host load between the connectors, and then set the host load for the IP address of the connector host server to the sum of the individual connector host loads.
You have three connectors to crawl SharePoint, Filesystem, and Database content from one connector host. Connectors are configured to run on different ports:
You want to prioritize SharePoint connector crawling.
Let's assume that the connector host name resolves to 18.104.22.168 and you want to allow four concurrent connections on the SharePoint connector, but limit the Filesystem and Database connectors to only one concurrent connection each.
To configure the GSA as described, perform the following steps in the Admin Console:
- Open the Content Sources > Web Crawl > Host Load Schedule page
- Set host load 4.0 for SharePoint
- Set host load 1.0 for Filesystem
- Set host load 1.0 for Database
- Set host load 6.0 (4 + 1 + 1) for
In general, the IP host load should be the SUM of all individually configured host loads.
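The arithmetic behind the IP host load can be sketched in a few lines of Python. This is only an illustration of the sum rule, not a GSA API: the dictionary `connector_host_loads` and the function `ip_host_load` are hypothetical names, and the values are the ones from the example above.

```python
# Hypothetical per-connector host loads, taken from the example above.
# These names are illustrative only; they are not part of any GSA API.
connector_host_loads = {
    "SharePoint": 4.0,  # prioritized connector: four concurrent connections
    "Filesystem": 1.0,  # limited to one concurrent connection
    "Database": 1.0,    # limited to one concurrent connection
}


def ip_host_load(loads):
    """Return the host load to set for the connector host's IP address.

    Follows the rule that the IP host load is the sum of all
    individually configured connector host loads.
    """
    return sum(loads.values())


total = ip_host_load(connector_host_loads)
print(f"IP host load: {total}")  # prints "IP host load: 6.0"
```

If you later add another connector on the same host, or change one connector's host load, recompute the sum and update the IP host load entry to match.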
Screenshot from Host Load page in Admin Console:
Additional Details: For more details about configuring host load, see About the Load on the Web Server Host