Locale-aware crawling by Googlebot

This article describes how Google uses different crawl settings for sites that cannot have separate URLs for each locale.

IMPORTANT: We continue to support and recommend using separate URLs for each locale and annotating them with rel="alternate" hreflang annotations.

If your website has pages that return different content based on the perceived country or preferred language of the visitor (that is, you have locale-adaptive pages), Google might not crawl, index, or rank all of your locale-adaptive content. This is because Googlebot's default IP addresses appear to be based in the USA. In addition, the crawler sends HTTP requests without an Accept-Language request header.

To improve crawling and indexing of locale-adaptive content, and to better surface your content for searchers around the world, we use locale-aware crawling. Locale-aware crawling occurs when Googlebot crawls with one or both of the following configurations:

  • Geo-distributed crawling: Googlebot appears to be using IP addresses based outside the USA, in addition to the longstanding IP addresses Googlebot uses that appear to be based in the USA.
  • Language-dependent crawling: Googlebot crawls with an Accept-Language HTTP request header.
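
As a concrete illustration, a language-dependent crawl is just an ordinary HTTP fetch that also carries an Accept-Language header. The sketch below uses Python's urllib; the URL and the "de" value are placeholders, not a real Googlebot request:

```python
import urllib.request

# Sketch: a language-dependent crawl request is a normal fetch plus an
# Accept-Language header (URL and language value are illustrative).
req = urllib.request.Request(
    "https://example.com/",
    headers={"Accept-Language": "de"},
)
# urllib stores header names in capitalized form.
print(req.get_header("Accept-language"))  # de
```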

Confirm that your website configuration supports locale-aware crawling

Currently, Googlebot recognizes a number of signals and hints to determine if your website serves locale-specific content:

  • Serving different content on the same URL—based on the user’s perceived country (geolocation)
  • Serving different content on the same URL—based on the Accept-Language HTTP request header set by the user’s browser
  • Completely blocking access to requests from specific countries
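
A server for locale-adaptive pages typically combines these signals when deciding what to serve. The helper below is a hypothetical sketch (the function name, supported locales, and country-to-language mapping are all assumptions, not anything Googlebot prescribes) that prefers the Accept-Language header and falls back to the visitor's perceived country:

```python
def choose_locale(accept_language, country, supported=("en", "fr", "de")):
    # Hypothetical helper for a locale-adaptive page: prefer the browser's
    # Accept-Language header, fall back to the visitor's perceived country,
    # and default to English.
    if accept_language:
        for part in accept_language.split(","):
            # Strip any q-value ("fr;q=0.9") and region subtag ("fr-FR").
            lang = part.split(";")[0].strip().split("-")[0].lower()
            if lang in supported:
                return lang
    by_country = {"FR": "fr", "DE": "de", "US": "en"}  # illustrative mapping
    return by_country.get(country, "en")

print(choose_locale("fr-FR,fr;q=0.9,en;q=0.8", "US"))  # fr
print(choose_locale(None, "DE"))                       # de
```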

Geo-distributed crawling

As we have always recommended, when Googlebot appears to come from a certain country, treat it like you would treat any other user from that country. This means that if you block USA-based users from accessing your content, but allow visitors from Australia to see it, your server should block a Googlebot that appears to be coming from the USA, but allow access to a Googlebot that appears to come from Australia.

Googlebot uses well-established IP addresses that appear to come from the United States. With geo-distributed crawling, Googlebot can now use IP addresses that appear to come from other countries, such as Australia.

Note: The set of countries that geo-distributed crawling appears to come from is not fixed and is likely to change over time.
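
Under these rules, country-based access control needs no special case for Googlebot. A minimal sketch (function name and country codes are illustrative):

```python
def allow_access(visitor_country, blocked_countries):
    # Apply the same country rule to every client, Googlebot included:
    # a crawler appearing to come from a blocked country is blocked, and
    # one appearing to come from an allowed country is allowed.
    return visitor_country not in blocked_countries

# Example policy from the text: USA-based visitors blocked, Australia allowed.
blocked = {"US"}
print(allow_access("US", blocked))  # False: also applies to USA-based Googlebot
print(allow_access("AU", blocked))  # True: Australia-based Googlebot gets access
```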

Language-dependent crawling

If your site alters its content based on the Accept-Language HTTP request header sent by the browser, Googlebot uses a variety of signals to try to crawl your content with different Accept-Language headers. This means Google is more likely to discover, index, and rank your content in the different languages your site supports.
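
For sites that honor the full Accept-Language syntax, language tags are ranked by their q-values. The parser below is an illustrative sketch of how a server might order them (it is not how Googlebot itself processes anything):

```python
def parse_accept_language(header):
    # Illustrative parser: return language tags ordered by descending
    # q-value, as a server might when choosing which translation to serve.
    langs = []
    for part in header.split(","):
        pieces = part.strip().split(";")
        tag = pieces[0].strip()
        q = 1.0  # per HTTP semantics, a missing q-value defaults to 1.0
        for p in pieces[1:]:
            if p.strip().startswith("q="):
                q = float(p.strip()[2:])
        langs.append((tag, q))
    return [tag for tag, q in sorted(langs, key=lambda item: -item[1])]

print(parse_accept_language("fr-CH,fr;q=0.9,en;q=0.8"))  # ['fr-CH', 'fr', 'en']
```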

Other considerations

  • Googlebot uses the same user-agent string for all crawling configurations. Learn more about the user-agent strings used by Google crawlers in our Help Center.
  • You can verify that geo-distributed crawls come from Googlebot by using reverse DNS lookups.
  • Make sure your site applies the robots exclusion protocol consistently across locales. Robots meta tags and the robots.txt file should specify the same directives for every locale. For example, if Googlebot receives a noindex meta tag when it crawls with a Spanish Accept-Language header, it should receive the same noindex meta tag when it crawls with no Accept-Language header or with any other Accept-Language header. This avoids the unexpected crawling and indexing behavior that can occur when different locales carry different noindex meta tags, or when your site responds with different robots.txt files to different IP addresses. (Learn more in Controlling Crawling and Indexing on our developer site.)
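
The reverse DNS check mentioned above can be sketched as follows: resolve the IP address to a hostname, check that the hostname is under googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. This is a simplified sketch of the documented verification procedure, not production-ready code:

```python
import socket

def is_googlebot(ip):
    # Sketch of Googlebot verification via reverse DNS:
    # 1. Reverse-resolve the IP to a hostname.
    # 2. Check the hostname belongs to googlebot.com or google.com.
    # 3. Forward-resolve the hostname and confirm it maps back to the IP.
    try:
        host, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False

print(is_googlebot("127.0.0.1"))  # False: localhost is not a Googlebot host
```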