Common Product Crawl Issues

We routinely crawl your mobile and desktop product pages and images to check for quality issues. If we cannot crawl them, we cannot show your items on Google Shopping. In addition, if we detect crawl errors while fetching either the mobile or the desktop landing page of an item, we will disapprove the item for both mobile and desktop devices until we can access the landing page successfully.

The most common reasons for Product Crawl Issues are:

  • Page not found (404) error: The URL you provided is incorrect (e.g. it contains a typo), so the server returned a ‘Page not found (404)’ error. Please check that the URL is correct and that your website is live.
  • Server's robots.txt disallows access: A ‘robots.txt’ file on your server prohibits crawl access to your page. We do not crawl pages that robots.txt disallows. Please resolve this by configuring the ‘robots.txt’ file to allow our crawl.
  • Invalid URL: Your URL contains invalid characters or does not have the format of a valid link.
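
The checks above can be approximated locally before you submit a URL. The following is a minimal Python sketch, not the crawler's actual logic: the function names are hypothetical, the URL check is only a rough format test, and the user agent name "Googlebot" is an assumption about which crawler your robots.txt rules need to allow.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def looks_like_valid_url(url: str) -> bool:
    """Rough format check: http(s) scheme, a hostname, and no whitespace.

    This is an illustrative approximation, not the crawler's real rules.
    """
    if any(ch.isspace() for ch in url):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. a malformed IPv6 host such as "http://[::1"
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical robots.txt that blocks every crawler from /products/.
    blocking_rules = "User-agent: *\nDisallow: /products/\n"
    print(looks_like_valid_url("https://example.com/products/1"))
    print(robots_allows(blocking_rules, "Googlebot",
                        "https://example.com/products/1"))
```

Passing the robots.txt text in as a string keeps the sketch independent of the network, so you can test a draft of the file before uploading it to your server.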

Note: Once the issue that you are experiencing has been resolved, your product may take up to 48 hours to be reinserted into Google Shopping.

There are a number of other issues that may also prevent us from crawling your page.

Common Issues
  • Page requires authentication: The URL provided is protected by an authentication mechanism that prevents Google from accessing the content.
  • HTTP 4xx response, HTTP 5xx response: The server hosting your website returned an HTTP error that prevented us from accessing the content.
  • Hostname not resolvable: We were unable to resolve the hostname of your server to an IP address and so could not access the page.
  • Malformed HTTP response: The response from your server was garbled.
  • Private IP: Your website is hosted behind a firewall or router and we were unable to access it.
  • Network error: A network-level error occurred while we were connecting to your server, so we could not fetch the page.
  • Timeout reading page: The server took too long to return the page, so we abandoned the crawl of that product.
  • Server redirects too often: Your server redirected the crawl too many times, so we abandoned it.
  • Redirect URL too long, Empty redirect URL, Bad redirect URL: The redirect URL that your server returned was not valid and we could not follow it.
  • Server's robots.txt unreachable, Timeouts reading robots.txt: We were unable to read your robots.txt file, so we could not crawl your page. Learn more about the Robots Exclusion Protocol.
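
Several of the issues in this list can be checked with a few lines of code when you debug your own server. The following is a hedged Python sketch under stated assumptions: the function names, the 2048-character redirect limit, and the choice to treat only status 401 as "requires authentication" are all illustrative and do not come from the crawler itself.

```python
import ipaddress
from typing import Optional
from urllib.parse import urljoin, urlsplit

# Hypothetical limit; the real crawler's maximum redirect length is not documented here.
MAX_REDIRECT_URL_LENGTH = 2048


def is_private_address(ip: str) -> bool:
    """True if the IP is private, loopback, or link-local (not publicly reachable)."""
    addr = ipaddress.ip_address(ip)
    return addr.is_private or addr.is_loopback or addr.is_link_local


def classify_status(status: int) -> str:
    """Map an HTTP status code to one of the issue names used in this article."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if status == 401:  # assumption: only 401 is treated as an authentication wall
        return "Page requires authentication"
    if 400 <= status < 500:
        return "HTTP 4xx response"
    if 500 <= status < 600:
        return "HTTP 5xx response"
    return "unexpected status"


def check_redirect_target(base_url: str, location: Optional[str]) -> str:
    """Validate the Location value of a redirect, resolved against base_url."""
    if location is None or not location.strip():
        return "Empty redirect URL"
    if len(location) > MAX_REDIRECT_URL_LENGTH:
        return "Redirect URL too long"
    try:
        target = urlsplit(urljoin(base_url, location))
    except ValueError:
        return "Bad redirect URL"
    if target.scheme not in ("http", "https") or not target.hostname:
        return "Bad redirect URL"
    return "ok"


if __name__ == "__main__":
    print(is_private_address("192.168.1.10"))
    print(classify_status(503))
    print(check_redirect_target("https://example.com/a", "/b"))
```

None of these helpers touches the network, so they can be combined with whatever HTTP client you already use to fetch the page and inspect its status code and redirect headers.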