Common product crawl issues

Google routinely crawls your mobile and desktop product pages and images to check for quality issues. If we're unable to perform these crawls, we won't be able to show your items in Shopping ads.

Additionally, if we detect a crawl error while fetching either the mobile or the desktop landing page of an item, we'll disapprove the item for both mobile and desktop devices until we can access the landing page successfully.

The most common reasons for product crawl issues are:

  • Page not found (404) error: The URL you provided is wrong (e.g., it contains a typo), so the page returned a ‘Page not found (404)’ error. Check that the URL is correct and that your website is live.
  • Server's robots.txt disallows access: Your server has a ‘robots.txt’ file that prohibits crawl access. We can't crawl pages that this file disallows. Resolve this by configuring the ‘robots.txt’ file to allow our crawl.
  • Invalid URL: Your URL contains invalid characters or does not have the format of a valid link.

Note: Once you've resolved the issue, it may take up to 48 hours for your product to reappear in Shopping ads.
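Two of the issues listed above (an invalid URL and a robots.txt file that disallows access) can be checked before you submit your product data. The following is a minimal sketch in Python, not Google's own validation: the function names `is_valid_url` and `robots_allows` are hypothetical, the check assumes the crawler identifies itself as `Googlebot`, and Python's standard robots.txt parser may interpret some rules differently from Google's crawler.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def is_valid_url(url: str) -> bool:
    """Rough format check: http(s) scheme, a hostname, no whitespace or control characters."""
    if any(ch.isspace() or ord(ch) < 0x20 for ch in url):
        return False
    try:
        parts = urlsplit(url)
        hostname = parts.hostname
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(hostname)


def robots_allows(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt content permits user_agent to fetch url.

    The user agent name is an assumption; adjust it to the crawler you care about.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    sample_robots = "User-agent: *\nDisallow: /private/\n"
    for candidate in ("https://example.com/products/1", "https://example.com/private/1"):
        print(candidate, is_valid_url(candidate), robots_allows(sample_robots, candidate))
```

Passing the robots.txt content as a string keeps the check independent of the network, so you can test a new robots.txt file before you publish it.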

Several other issues can also prevent Google from crawling your page.

Other common issues
  • Page requires authentication: The URL provided is protected by some sort of authentication protocol that prevents Google from accessing the content.
  • HTTP 4xx response; HTTP 5xx response: The server hosting your website returned an HTTP error that prevented us from accessing the content.
  • Hostname not resolvable: We were unable to resolve the hostname of your server to an IP address and so could not access the page.
  • Malformed HTTP response: The response from your server was garbled.
  • Private IP: Your website is hosted behind a firewall or router and we were unable to access it.
  • Network error: There was some sort of error in the network.
  • Timeout reading page: The server took too long returning the page and we abandoned the crawl of that product.
  • Server redirects too often: Your server redirected the crawl too many times, and the crawl had to be abandoned.
  • Redirect URL too long; empty redirect URL; bad redirect URL: The redirect URL your server returned was not valid and we could not follow it.
  • Server's robots.txt unreachable; timeouts reading robots.txt: We were unable to read your robots.txt file and so could not crawl your page. Learn more about the Robots Exclusion Protocol.
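Some of the issues listed above can be reproduced from your own machine when you troubleshoot. The following is a minimal sketch in Python under stated assumptions: the function names `classify_status` and `diagnose_host` are hypothetical, and the mapping from status codes to issue names is an illustration based on the list above, not the exact logic Google's crawler uses.

```python
import ipaddress
import socket
from typing import Optional


def classify_status(status: int) -> Optional[str]:
    """Map an HTTP status code to an issue name from this article (assumed mapping)."""
    if status == 401:
        return "Page requires authentication"
    if status == 404:
        return "Page not found (404) error"
    if 400 <= status <= 499:
        return "HTTP 4xx response"
    if 500 <= status <= 599:
        return "HTTP 5xx response"
    return None  # No crawl issue implied by the status code alone.


def diagnose_host(hostname: str) -> Optional[str]:
    """Report whether a hostname fails to resolve or resolves to a private address."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return "Hostname not resolvable"
    for info in infos:
        # info[4] is the socket address; drop any IPv6 scope suffix such as "%eth0".
        address = info[4][0].split("%")[0]
        if ipaddress.ip_address(address).is_private:
            return "Private IP"
    return None


if __name__ == "__main__":
    print(classify_status(503))
    print(diagnose_host("10.0.0.5"))
```

A result of `None` from either function only means that this particular check found nothing. Timeouts, redirect loops, and malformed responses need a real fetch of the page to detect.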