Deployment Scenario Handbook

Indexing through Feeds

Scenario overview


Acme Inc. operates its own retail stores under two different brands. In this scenario, the company wants to make its products database searchable by store employees. Product records are currently stored in a database, with certain data held in business applications. A web front end displays product information when supplied with the product number in the URL, but the crawler cannot index the product pages directly.

Requirements


  • Index content about products.
  • Provide search within specific retailer brands.
  • Enable left-pane parametric navigation by:
    • Price
    • Category
    • Arrival time

Assumptions


  • There is a web front end in place to display product pages. This web front end doesn’t contain all the metadata required for the indexing of products.
  • Products are not secured and anyone can view them.

Key considerations

  • Decide whether to use the Google Search Appliance Connector for Databases to onboard the product records onto the GSA.
  • Decide whether to use a content feed or a web feed for onboarding the product records onto the GSA.
  • Define metadata for indexing along with the content to drive Dynamic Navigation and advanced search capabilities.

Recommended approach


Google’s recommended approach for indexing through feeds covers the following areas:

Deployment architecture

Because records and their required metadata cannot be constructed using database queries alone, the database connector will not be used to index content. Instead, Acme Inc. will use a custom feed. A custom feed is an application that constructs XML containing the records to index on the GSA. The application gets the XML into the GSA by sending an HTTP POST to the feeds protocol interface on the GSA.

In addition to the GSA, another server (Windows or Linux) is required to host the feeds application. This application will perform some logic to construct a record and post records into the GSA.
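As a rough illustration of what such a feeds application might do, the following Python sketch builds a content-feed XML document and prepares the POST. It is a minimal sketch, not a definitive implementation: the record dictionary shape, the function names, and the example host are hypothetical, and the feed port (19900), the /xmlfeed path, and the multipart field names (feedtype, datasource, data) are assumptions that should be checked against the Feeds Protocol documentation for your GSA version.

```python
import uuid
import urllib.request
import xml.etree.ElementTree as ET

# DOCTYPE line assumed for GSA feed files; verify against your GSA version.
DOCTYPE = '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">'


def build_feed_xml(datasource, records, feedtype="incremental"):
    """Build feed XML from record dicts (hypothetical shape:
    {"url", "action", "html", "metadata"})."""
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype
    group = ET.SubElement(root, "group")
    for rec in records:
        record = ET.SubElement(group, "record", {
            "url": rec["url"],
            "action": rec.get("action", "add"),
            "mimetype": "text/html",
        })
        if rec.get("metadata"):
            metadata = ET.SubElement(record, "metadata")
            for name, value in rec["metadata"].items():
                ET.SubElement(metadata, "meta",
                              {"name": name, "content": str(value)})
        if rec.get("html"):
            # ElementTree escapes the HTML so it travels as XML text.
            ET.SubElement(record, "content").text = rec["html"]
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + DOCTYPE + "\n" + body


def encode_multipart(fields, feed_xml):
    """Encode the form fields plus the feed XML as multipart/form-data."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"',
                  "", value]
    # Assumption: the XML is sent as a file part named "data".
    lines += [f"--{boundary}",
              'Content-Disposition: form-data; name="data"; filename="feed.xml"',
              "Content-Type: text/xml",
              "", feed_xml,
              f"--{boundary}--", ""]
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def post_feed(gsa_host, datasource, feed_xml, feedtype="incremental"):
    """POST the feed to the GSA (port and path are assumptions)."""
    body, content_type = encode_multipart(
        {"feedtype": feedtype, "datasource": datasource}, feed_xml)
    request = urllib.request.Request(
        f"http://{gsa_host}:19900/xmlfeed", data=body,
        headers={"Content-Type": content_type})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

A call such as post_feed("gsa.example.com", "products", build_feed_xml("products", records)) would then push one batch of records; the host name and data source name here are placeholders.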

Crawl and index configuration

The recommended approach is to use content feeds to onboard product records into the GSA, so that content can be custom tailored for indexing. This method also takes advantage of the GSA's ability to cache a custom product page in its index. Store associates can then choose to display the cached version, which may be easier to view than the existing product web front end.

The feeds application has to be designed so that for every product row in the database, it can construct the required HTML and associated metadata to feed into the GSA. It also needs a mechanism to keep track of all added, modified, and deleted records, so that the index stays in step with the database.
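One possible way to track changes is to store a fingerprint of each product row after every feed run and compare it with the next run. The Python sketch below shows that idea together with a simple HTML renderer. It is only an illustration under stated assumptions: the column names (product_id, name, price) and both function names are hypothetical, and a real application might instead rely on database timestamps or triggers.

```python
import hashlib
import json
from html import escape


def row_fingerprint(row):
    """Hash a product row so that any change in its values is detectable."""
    canonical = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_products(current_rows, previous_fingerprints):
    """Compare current rows with fingerprints saved from the last run.

    Returns (added_rows, modified_rows, deleted_ids, new_fingerprints).
    Rows are dicts with a hypothetical "product_id" key.
    """
    new_fingerprints = {}
    added, modified = [], []
    for row in current_rows:
        product_id = str(row["product_id"])
        fingerprint = row_fingerprint(row)
        new_fingerprints[product_id] = fingerprint
        if product_id not in previous_fingerprints:
            added.append(row)
        elif previous_fingerprints[product_id] != fingerprint:
            modified.append(row)
    # Anything seen last time but missing now should be deleted from the index.
    deleted = sorted(set(previous_fingerprints) - set(new_fingerprints))
    return added, modified, deleted, new_fingerprints


def render_product_html(row):
    """Render a minimal product page for the feed's content element."""
    name = escape(str(row["name"]))
    details = "".join(
        f"<tr><th>{escape(str(key))}</th><td>{escape(str(value))}</td></tr>"
        for key, value in row.items() if key != "name"
    )
    return (f"<html><head><title>{name}</title></head>"
            f"<body><h1>{name}</h1><table>{details}</table></body></html>")
```

Added and modified rows would be fed with an add action, deleted IDs with a delete action, and the returned fingerprints saved for the next run.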

Metadata focus

Metadata is central to indexing product-style content. It lets end users perform more powerful advanced searches and "drill down" into defined categories. Acme Inc. can identify certain metadata to drive Dynamic Navigation headings, so that users can narrow results to specific metadata values with a single click. Other metadata values can be used in queries to restrict the query to a specific set of content.

Front end configuration

One advantage of using content feeds to bring content into the GSA's index is that a cached version of the custom content created for the feed is saved on the GSA and can be selected for display on the front end.

One example of a beneficial use is the indexing of printer-friendly pages that can be printed and given out as datasheets. When a user wants to see a quick facts page, she can reference the cached content on the GSA. When she wants a more in-depth view, she can click the link and be taken to the web front end that displays the detailed product page for the particular item. Acme Inc. must modify the GSA XSLT to display the products and associated metadata accordingly.

Acme Inc. will also modify the front end to enable advanced search features for users, based on collections and defined metadata. For example, selecting an option from a drop-down list or clicking a radio button could attach metadata restrictions to the query in order to scope it to the matching part of the products corpus.
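The scoping described above can be sketched as a small helper that turns form selections into a search URL. This is a hedged example, not the documented front-end implementation: it assumes the GSA search request parameters site (collection), client and proxystylesheet (front end), and requiredfields (metadata restrictions joined with "." for AND), and it assumes that metadata names and values are URL-encoded twice. The function name, host, collection, and metadata names are hypothetical; confirm the parameters against the Search Protocol Reference for your GSA version.

```python
from urllib.parse import quote, urlencode


def build_search_url(gsa_host, query, collection="default_collection",
                     required=None, frontend="default_frontend"):
    """Build a search URL scoped to a collection and metadata values."""
    params = {
        "q": query,
        "site": collection,           # one collection per retailer brand
        "client": frontend,
        "proxystylesheet": frontend,  # render results with the front end's XSLT
        "output": "xml_no_dtd",
    }
    if required:
        # First encoding pass here; urlencode() below applies the second.
        params["requiredfields"] = ".".join(
            f"{quote(str(name), safe='')}:{quote(str(value), safe='')}"
            for name, value in required.items()
        )
    return f"http://{gsa_host}/search?" + urlencode(params)
```

For instance, a "Tools" radio button on the Brand A search page could call build_search_url("gsa.example.com", "cordless drill", "brand_a", {"category": "tools"}).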

Alternative approaches


  • If all content and metadata can be derived from database queries, use the database connector for feeding in all products.
  • If the front end application for displaying products can display all required information needed for indexing, use a web feed for indexing all products.
  • Instead of feeding in metadata with each record, configure Entity Recognition rules through dictionaries or XML regular expression definitions so that the GSA automatically tags documents with entities at indexing time.

Project task overview


The following table lists the project tasks and activities for implementing indexing through feeds.

Task                          Activities
Plan deployment architecture  • Provision server to host feed application
Configure crawl and index     • Configure collections identified for brands
                              • Design logic for constructing feed content
                              • Design feeds application for writing out XML records and posting them to the GSA
Configure front end           • Modify XSLT to display records along with desired metadata
                              • Create an advanced search page or page section that will scope queries restricted to desired metadata