Summary: Some documents which do not contain any of my search terms are being returned, sometimes even at the top of my search results.
Cause: The query term exists in:
- Anchor text (href link text): Many documents may link to a particular page using anchor text that contains your query terms. Anchor text gives the linked page high relevancy regardless of whether the same text exists in the content of that page.
- Metadata: Metadata in your pages, whether supplied through metadata tags or as external metadata, is also indexed. Documents whose metadata contains your query terms are returned even if the terms do not exist in the content.
To determine if your anchor text is the cause:
- Navigate to crawl diagnostics on the Admin Console, under Google Search Appliance > Status And Reports > Crawl Diagnostics.
- Enter the URL of the unexpected result in the "URLs starting with" field.
- Click "View list of all crawled pages that link to this page".
- Look through these results to find the link, and its anchor text, that caused the issue.
To determine if metadata is causing the issue, view search results as XML with the metadata. To do this, rerun the search query but make these changes to the search URL:
- Remove the "proxystylesheet" parameter and its argument.
- Add the &getfields=* parameter.
This will show you the metadata associated with the URL.
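The two URL changes above can also be scripted, which helps when several queries need checking. The following is a minimal sketch in Python using only the standard library. The function names to_raw_xml_url and metadata_matches are hypothetical, and the host in the usage example is a placeholder. The second function assumes each result in the XML output is an R element with the URL in a U child and metadata in MT elements carrying N (name) and V (value) attributes; confirm this against the XML reference for your software version before relying on it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
import xml.etree.ElementTree as ET


def to_raw_xml_url(search_url):
    """Return search_url with proxystylesheet removed and getfields=* added."""
    parts = urlsplit(search_url)
    # Keep every parameter except proxystylesheet; also drop any existing
    # getfields so the parameter is not duplicated.
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ("proxystylesheet", "getfields")
    ]
    params.append(("getfields", "*"))
    # safe="*" keeps the asterisk literal instead of encoding it as %2A.
    return urlunsplit(parts._replace(query=urlencode(params, safe="*")))


def metadata_matches(xml_text, term):
    """List (result URL, metadata name, metadata value) whose value contains term.

    Assumption: results are <R> elements with a <U> child and
    <MT N="..." V="..."/> metadata children.
    """
    root = ET.fromstring(xml_text)
    needle = term.lower()
    hits = []
    for result in root.iter("R"):
        url = result.findtext("U", default="")
        for tag in result.iter("MT"):
            value = tag.get("V", "")
            if needle in value.lower():
                hits.append((url, tag.get("N", ""), value))
    return hits
```

For example, to_raw_xml_url("http://gsa.example.com/search?q=pension&proxystylesheet=default_frontend") returns a URL that can be pasted into a browser to view the raw XML. Passing the text of that response and the query term to metadata_matches lists the metadata fields that are likely to explain why a document was returned.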
Fix: Anchor text relevancy is not configurable, and because this is intended behavior there is no way to "fix" the issue. A better option is to check whether the anchor text is actually relevant to the content it links to, and alter your content or anchor text accordingly.
Additional Details: See the XML reference for information on modifying the search URL parameters.