Feed processing is taking too long

Summary: Feed processing is taking too long. You see a large backlog on the Crawl and Index > Feeds page of the Admin Console.

Cause: The time it takes the search appliance to process feeds depends on many factors, including:

  • General load on the search appliance (crawling and serving)
  • Number of records in the feed file
  • Document types of the records in the feed file
  • Whether or not multiple data-sources are used
  • Hardware model of the GSA (newer models are typically faster)

In general, it is very hard to estimate feed processing time accurately. Feeds containing documents that require conversion (e.g. non-HTML documents such as Microsoft Office documents and PDFs) take longer to process.

Even when submitting a single-record feed to a search appliance under zero load (no serving, no crawling, and no feed backlog), it can take up to a few minutes for the feed to be processed.

The search appliance provides graphs of the number of submitted feed files, the number of processed records, and the number of feeds in the backlog. The graphs are available on the Crawl and Index > Feeds page of the Admin Console. An increasing backlog might indicate that the search appliance is not keeping up with feed submissions.
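If you note the backlog size from the Feeds page graph at regular intervals, a least-squares slope over those samples gives a rough signal for whether the backlog is growing. This is a hypothetical helper, not an appliance feature: the samples are assumed to be read off the graph (or collected some other way) at evenly spaced times, and the `tolerance` value is an arbitrary choice.

```python
from statistics import mean
from typing import Sequence


def backlog_slope(samples: Sequence[float]) -> float:
    """Least-squares slope of the backlog size per sampling interval.

    `samples` are hypothetical backlog counts taken at evenly spaced times.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(samples)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def is_falling_behind(samples: Sequence[float], tolerance: float = 0.0) -> bool:
    """True if the backlog trend grows faster than `tolerance` per interval."""
    return backlog_slope(samples) > tolerance
```

For example, backlog readings of 10, 12, 15, 19 and 24 feeds give a slope of 3.5 feeds per interval, which suggests submissions are outpacing processing.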

Workaround: Use multiple data-sources. The search appliance will parallelize feed processing across multiple data-sources.
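One way to spread records across several data-sources is to assign each URL to a data-source with a stable hash, so that later updates for the same URL always go to the same data-source. The sketch below assumes you control the feed client and have picked a fixed list of data-source names (the names used here are hypothetical); it only decides the partitioning and does not communicate with the appliance.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def assign_datasource(url: str, datasources: List[str]) -> str:
    """Map a URL to one of the (hypothetical) data-source names.

    zlib.crc32 is stable across processes, unlike the built-in hash() for str,
    so a given URL is always assigned to the same data-source.
    """
    if not datasources:
        raise ValueError("need at least one data-source name")
    return datasources[zlib.crc32(url.encode("utf-8")) % len(datasources)]


def partition_by_datasource(
    urls: Iterable[str], datasources: List[str]
) -> Dict[str, List[str]]:
    """Group URLs by assigned data-source; each group can become its own feed."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        groups[assign_datasource(url, datasources)].append(url)
    return dict(groups)
```

Each resulting group would then be submitted as feeds for its own data-source, which is what lets the appliance process them in parallel.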

For performance reasons, it is also better to submit fewer feeds with more records each than many feeds with few records each. Repeatedly submitting single-record feeds slows down overall processing.
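As a rough sketch of the batching advice, the Python below groups records into batches of at most `max_records` and serializes each batch as a single feed in the GSA XML feed format. The record tuple shape, the batch size, the data-source name and the `incremental` feed type are illustrative assumptions; check the feed DTD for your appliance's software version before relying on this element layout.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record shape: (url, mimetype, text content)
Record = Tuple[str, str, str]


def batched(records: Iterable[Record], max_records: int) -> Iterator[List[Record]]:
    """Yield lists of at most max_records records, preserving order."""
    if max_records < 1:
        raise ValueError("max_records must be >= 1")
    batch: List[Record] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == max_records:
            yield batch
            batch = []
    if batch:
        yield batch


def build_feed(datasource: str, records: List[Record], feedtype: str = "incremental") -> str:
    """Build one gsafeed XML document containing all of the given records."""
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype
    group = ET.SubElement(root, "group")
    for url, mimetype, text in records:
        record = ET.SubElement(group, "record", url=url, mimetype=mimetype)
        ET.SubElement(record, "content").text = text
    body = ET.tostring(root, encoding="unicode")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">\n'
        + body
    )
```

With 250 records and `max_records=100`, this produces three feeds (100, 100 and 50 records) instead of 250 single-record feeds.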
