Deployment Scenario Handbook
Indexing through Feeds
- Scenario overview
- Key considerations
- Recommended approach
- Alternative approach
- Project task overview
Acme Inc. operates its own retail stores under two different brands. In this scenario, Acme Inc. wants to make its product database searchable by store employees. Product records are currently stored in a database, with some of the data held in business applications. The crawler cannot index the product pages directly. A web front end displays product information when the product number is supplied in the URL.
- Index content about products.
- Provide search within specific retailer brands.
- Enable left-pane parametric navigation by:
- Arrival time
- A web front end is in place to display product pages. This web front end does not contain all the metadata required to index the products.
- Products are not secured and anyone can view them.
- Decide whether to use the Google Search Appliance Connector for Databases to onboard the product records onto the GSA.
- Decide whether to use a content feed or a web feed for onboarding the product records onto the GSA.
- Define metadata for indexing along with the content to drive Dynamic Navigation and advanced search capabilities.
Google’s recommended approach for indexing through feeds covers the following areas:
Because the records and their required metadata cannot be constructed from database queries alone, the database connector will not be used to index content. Instead, Acme Inc. will use a custom feed. A custom feed is an application that constructs XML containing the records to index on the GSA. The main step in getting that XML into the GSA is an HTTP POST to the feeds protocol interface on the GSA.
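As an illustration of the XML a custom feed constructs, the following minimal sketch builds a content feed with one record per product. The element names follow the GSA feeds protocol (`gsafeed`, `header`, `group`, `record`, `metadata`, `content`); the function name, the datasource name, the product URL, and the metadata names are hypothetical and are not taken from the Acme Inc. scenario.

```python
import xml.etree.ElementTree as ET

# DOCTYPE used by GSA feed files.
FEED_PROLOG = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">\n'
)


def build_content_feed(datasource, products, feedtype="incremental"):
    """Return feed XML with one <record> per product.

    Each product is a dict with hypothetical keys:
    'url', 'html' (the custom page to index), and 'metadata' (name -> value).
    """
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype

    group = ET.SubElement(root, "group")
    for product in products:
        record = ET.SubElement(
            group, "record",
            url=product["url"], mimetype="text/html", action="add",
        )
        metadata = ET.SubElement(record, "metadata")
        for name, value in product["metadata"].items():
            ET.SubElement(metadata, "meta", name=name, content=str(value))
        # ElementTree escapes the HTML as XML text, so no CDATA section is needed.
        ET.SubElement(record, "content").text = product["html"]

    return FEED_PROLOG + ET.tostring(root, encoding="unicode")
```

Keeping the metadata in a plain dictionary per product means the same structure can later drive both Dynamic Navigation attributes and query restrictions without changing the feed builder.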
In addition to the GSA, another server (Windows or Linux) is required to host the feeds application. This application constructs each record and posts the records to the GSA.
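The posting step can be sketched as follows. The sketch assumes the feeds interface is reached at port 19900 under `/xmlfeed` and accepts a multipart form with the fields `feedtype`, `datasource`, and `data`; the host name and the helper names are hypothetical. It only prepares the request, so it can be inspected before anything is sent.

```python
import urllib.request
import uuid


def encode_multipart(fields):
    """Encode a dict of text fields as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_feed_request(gsa_host, datasource, feedtype, feed_xml):
    """Prepare (but do not send) the POST that submits a feed to the GSA."""
    body, content_type = encode_multipart({
        "feedtype": feedtype,
        "datasource": datasource,
        "data": feed_xml,
    })
    return urllib.request.Request(
        f"http://{gsa_host}:19900/xmlfeed",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would submit the feed. The server hosting the feeds application typically also has to be allowed to push feeds in the GSA configuration.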
The recommended approach is to use content feeds to onboard product records into the GSA, so that content can be tailored for indexing. This method also takes advantage of the GSA's ability to cache a custom product page in its index. Store associates can then choose to display the cached version, which may be easier to view than the existing product web front end.
The feeds application has to be designed so that for every product row in the database, it can construct the required HTML and associated metadata to feed into the GSA. A mechanism is needed that will keep track of all deleted, modified, and added records.
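One possible change-tracking mechanism, shown here only as a sketch, is to store a fingerprint of each product row after every run and compare it with the current rows on the next run. The function names and the row layout are hypothetical. New and modified rows are both re-fed as additions, and rows that have disappeared are fed as deletions.

```python
import hashlib
import json


def row_fingerprint(row):
    """Return a stable hash of a product row (a dict of column -> value)."""
    canonical = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_feed_actions(previous, rows):
    """Compare the last run's fingerprints with the current rows.

    previous: dict of product id -> fingerprint saved by the last run
    rows:     dict of product id -> current row dict
    Returns (ids to feed with action "add", ids to feed with action "delete",
    fingerprints to save for the next run).
    """
    current = {pid: row_fingerprint(row) for pid, row in rows.items()}
    # A missing or different fingerprint means the row is new or modified.
    to_add = sorted(pid for pid, fp in current.items() if previous.get(pid) != fp)
    # Ids seen last time but absent now were deleted from the database.
    to_delete = sorted(pid for pid in previous if pid not in current)
    return to_add, to_delete, current
```

If the database already records a last-modified timestamp or keeps a deletion log, querying those directly would likely be simpler than hashing every row.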
Metadata is especially important for product-style content. It helps end users perform more powerful advanced searches and also helps them “drill down” into defined categories. Acme Inc. can identify certain metadata to drive Dynamic Navigation headings so users can drill down into different categories of metadata values with the click of a mouse. Other metadata values can be used in queries to restrict a query to a specific set of content.
One advantage of using content feeds to bring content into the GSA's index is that a cached version of the custom content created for the feed will be saved on the GSA and can be selected to be displayed on the front end.
One example of a beneficial use is the indexing of printer-friendly pages that can be printed and given out as datasheets. When a user wants to see a quick facts page, she can reference the cached content on the GSA. When she wants a more in-depth view, she can click the link and be taken to the web front end that displays the detailed product page for the particular item. Acme Inc. must modify the GSA XSLT to display the products and associated metadata accordingly.
Acme Inc. will also modify the front end to enable advanced search features for users, based on collections and defined metadata. For example, selecting a value from a drop-down list or clicking a radio button can attach metadata as query terms to the query, scoping it to the relevant part of the products corpus.
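A metadata-scoped query of this kind can be sketched as a search URL that uses the `site` parameter to select a collection and the `requiredfields` parameter to restrict results by metadata. The host, collection, front end, and metadata names below are hypothetical, and the sketch assumes simple alphanumeric metadata values; consult the search protocol reference for the encoding rules that apply to other values.

```python
from urllib.parse import urlencode


def build_search_url(gsa_host, query, collection, frontend, required=None):
    """Build a GSA search URL scoped to a collection and optional metadata.

    required: dict of metadata name -> value; all pairs must match
    (pairs are joined with "." in the requiredfields parameter).
    """
    params = {
        "q": query,
        "site": collection,           # collection to search
        "client": frontend,           # front end to use
        "proxystylesheet": frontend,  # render results with that front end's XSLT
        "output": "xml_no_dtd",
    }
    if required:
        params["requiredfields"] = ".".join(
            f"{name}:{value}" for name, value in required.items()
        )
    return f"http://{gsa_host}/search?{urlencode(params)}"
```

A drop-down list for the retailer brand could, for instance, map each option to a `required` dictionary such as `{"brand": "BrandA"}` before the query is submitted.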
- If all content and metadata can be derived from database queries, use the database connector for feeding in all products.
- If the front end application for displaying products can display all required information needed for indexing, use a web feed for indexing all products.
- Instead of defining a process to feed in metadata, you can configure Entity Recognition rules, through dictionaries or XML regular expression definitions, to automatically tag documents with entities at indexing time.
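For the web feed alternative listed above, the feed carries only URLs, and the GSA fetches each page from the product web front end itself. A minimal sketch, assuming the `metadata-and-url` feed type and using a hypothetical function name, datasource name, and product URLs:

```python
import xml.etree.ElementTree as ET


def build_web_feed(datasource, product_urls):
    """Return feed XML that lists product page URLs for the GSA to fetch."""
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "metadata-and-url"

    group = ET.SubElement(root, "group")
    for url in product_urls:
        # No <content> element: the GSA retrieves the page from the URL.
        ET.SubElement(group, "record", url=url, mimetype="text/html")

    return ET.tostring(root, encoding="unicode")
```

Because the record has no content of its own, this variant only suits the case where the front end already displays everything that needs to be indexed.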
The following table lists the project tasks and activities for implementing indexing through feeds.
|Plan deployment architecture||
|Configure crawl and index||
|Configure front end||