Googlebot encountered an extremely high number of URLs from your site. This could cause Googlebot to unnecessarily crawl a large number of distinct URLs that point to identical or similar content, or to crawl undesired parts of your site. As a result, Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.
Common causes of this problem include:
- Problematic parameters in the URL. Session IDs or sorting methods, for example, can create massive amounts of duplication and a greater number of URLs. Similarly, a dynamically generated calendar might generate links to future and previous dates with no restrictions on start or end dates.
- Additive filtering of a set of items. Many sites provide different views of the same set of items or search results. Combining filters (for example, show me hotels that are on the beach, are dog-friendly, AND have a fitness center) can result in a huge number of mostly redundant URLs.
- Dynamic generation of documents as a result of counters, timestamps, or advertisements.
- Broken relative links. Broken relative links can often cause infinite spaces. Frequently, this problem arises because of repeated path elements. For example:
http://www.example.com/index.shtml/discuss/category/school/061121/html/interview/category/health/070223/html/category/business/070302/html/category/community/070413/html/FAQ.htm
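One way to spot URLs like the one above in a crawl log or link report is to count how often each path element repeats. The following is a minimal sketch, not an official tool: the function name `repeated_segments` and the default threshold of 2 are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict
from urllib.parse import urlsplit


def repeated_segments(url: str, threshold: int = 2) -> Dict[str, int]:
    """Return the path segments that occur at least `threshold` times in `url`."""
    path = urlsplit(url).path
    # Drop empty segments produced by leading, trailing, or doubled slashes.
    segments = [segment for segment in path.split("/") if segment]
    counts = Counter(segments)
    return {segment: n for segment, n in counts.items() if n >= threshold}


url = (
    "http://www.example.com/index.shtml/discuss/category/school/061121/"
    "html/interview/category/health/070223/html/category/business/070302/"
    "html/category/community/070413/html/FAQ.htm"
)
print(repeated_segments(url))  # {'category': 4, 'html': 4}
```

A legitimate URL can also repeat a segment, so a non-empty result is only a hint that the links generating the URL deserve a closer look.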
To avoid potential problems with URL structure, we recommend the following:
- Whenever possible, shorten URLs by trimming unnecessary parameters. Use the Parameter Handling tool to indicate which URL parameters Google can safely ignore. Make sure to use these cleaner URLs for all internal links. Consider redirecting unnecessarily long URLs to their cleaner versions or using the rel="canonical" link element to specify the preferred, shorter canonical URL.
- Wherever possible, avoid the use of session IDs in URLs. Consider using cookies instead. Check our URL guidelines for additional information.
- If your site has an infinite calendar, add the rel="nofollow" attribute to links to dynamically created future calendar pages.
- Check your site for broken relative links.
- If none of the above is possible, consider using a robots.txt file to block Googlebot's access to problematic URLs. Typically, you should consider blocking dynamic URLs, such as URLs that generate search results, or URLs that can create infinite spaces, such as calendars. Using wildcards in your robots.txt file can allow you to easily block large numbers of URLs.
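The first recommendation above (trimming unnecessary parameters and pointing to a shorter canonical URL) can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the parameter names in `IGNORED_PARAMS` are hypothetical, the helper names `clean_url` and `canonical_link` are invented for this example, and sorting the remaining parameters assumes their order does not matter to your site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; replace them with the ones your site can ignore.
IGNORED_PARAMS = {"sessionid", "sort", "ref"}


def clean_url(url: str) -> str:
    """Drop ignorable query parameters and the fragment from `url`."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    # Assumption: parameter order is not significant, so sort for a stable form.
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link(url: str) -> str:
    """Build a rel="canonical" link element that points at the cleaned URL."""
    return '<link rel="canonical" href="{}">'.format(escape(clean_url(url), quote=True))


long_url = "http://www.example.com/hotels?sort=price&beach=1&sessionid=abc123"
print(clean_url(long_url))       # http://www.example.com/hotels?beach=1
print(canonical_link(long_url))  # <link rel="canonical" href="http://www.example.com/hotels?beach=1">
```

The same `clean_url` helper could be used when generating internal links, so that the site only ever links to the shorter form.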