Locale-aware crawling by Googlebot
This article describes how Google uses different crawl settings for sites that cannot have separate URLs for each locale.
If your website has pages that return different content based on the perceived country or preferred language of the visitor (i.e., you have locale-adaptive pages), Google might not crawl, index, or rank all of your locale-adaptive content. This is because the default IP addresses of the Googlebot crawler appear to be based in the USA. In addition, the crawler sends HTTP requests without setting `Accept-Language` in the request header.
To address crawling and indexing of locale-adaptive content, we use locale-aware crawling to better surface your content for searchers around the world. Locale-aware crawling occurs when Googlebot crawls with one or both of the following configurations:
- Geo-distributed crawling: Googlebot appears to be using IP addresses based outside the USA, in addition to the longstanding IP addresses Googlebot uses that appear to be based in the USA.
- Language-dependent crawling: Googlebot crawls with an `Accept-Language` field set in the HTTP header.
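To illustrate what a language-dependent crawl involves, here is a minimal sketch (in Python, not part of any Google tooling) of how a server might parse the `Accept-Language` field such a request carries. Real-world parsers handle more edge cases, such as wildcard ranges and malformed quality values.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (language, quality) pairs,
    sorted by descending quality."""
    langs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        if ";q=" in part:
            lang, _, q = part.partition(";q=")
            try:
                quality = float(q)
            except ValueError:
                quality = 0.0
        else:
            lang, quality = part, 1.0
        langs.append((lang.strip(), quality))
    # Stable sort preserves the header's order for equal q-values.
    return sorted(langs, key=lambda pair: -pair[1])

print(parse_accept_language("es-ES,es;q=0.9,en;q=0.5"))
# → [('es-ES', 1.0), ('es', 0.9), ('en', 0.5)]
```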
Currently, Googlebot recognizes a number of signals and hints to determine if your website serves locale-specific content:
- Serving different content on the same URL, based on the user's perceived country (geolocation)
- Serving different content on the same URL, based on the `Accept-Language` field set by the user's browser in the HTTP request header
- Completely blocking access to requests from specific countries
As we have always recommended, when Googlebot appears to come from a certain country, treat it like you would treat any other user from that country. This means that if you block USA-based users from accessing your content, but allow visitors from Australia to see it, your server should block a Googlebot that appears to be coming from the USA, but allow access to a Googlebot that appears to come from Australia.
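As a sketch of that recommendation, a server's access policy might look like the following. The country lookup and the blocked-country set are hypothetical; the point is that whether the request comes from Googlebot plays no role in the decision.

```python
# Hypothetical policy: this site blocks USA-based visitors but serves
# visitors from elsewhere (e.g. Australia).
BLOCKED_COUNTRIES = {"US"}

def is_allowed(country_code, is_googlebot=False):
    """Decide access purely from the visitor's perceived country.
    The is_googlebot flag deliberately has no effect: a crawl that
    appears to come from a country gets the same treatment as any
    other visitor from that country."""
    return country_code not in BLOCKED_COUNTRIES

# A Googlebot crawl from Australia is served; one from the USA is
# blocked, exactly like regular visitors from those countries.
print(is_allowed("AU", is_googlebot=True))  # → True
print(is_allowed("US", is_googlebot=True))  # → False
```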
Googlebot uses well-established IP addresses that appear to come from the United States. With geo-distributed crawling, Googlebot can now use IP addresses that appear to come from other countries, such as Australia.
If your site alters its content based on the `Accept-Language` field in the browser's HTTP request headers, Googlebot uses a variety of signals to try to crawl your content with different `Accept-Language` HTTP headers. This makes Google more likely to discover, index, and rank your content in the different languages your site supports.
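For example, a simplified handler might select content like this; the `TRANSLATIONS` store and the fallback behavior are hypothetical, and the sketch ignores q-values and region subtags:

```python
# Hypothetical content store: two supported languages.
TRANSLATIONS = {"es": "¡Hola!", "en": "Hello!"}
DEFAULT_LANG = "en"

def select_content(accept_language):
    """Pick content by the first supported language in the
    Accept-Language header, matching only the primary subtag."""
    if accept_language:
        for part in accept_language.split(","):
            primary = part.split(";")[0].strip().split("-")[0].lower()
            if primary in TRANSLATIONS:
                return TRANSLATIONS[primary]
    # Requests with no Accept-Language header (as classic Googlebot
    # crawls send) fall back to the default language.
    return TRANSLATIONS[DEFAULT_LANG]

print(select_content("es-ES,es;q=0.9"))  # → ¡Hola!
print(select_content(None))              # → Hello!
```

A server answering this way would typically also emit a `Vary: Accept-Language` response header so that caches keep the language variants apart.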
- Googlebot uses the same user-agent string for all crawling configurations. Learn more about the user-agent strings used by Google crawlers in our Help Center.
- You can verify Googlebot geo-distributed crawls using reverse DNS lookups.
- Make sure your site applies the robots exclusion protocol consistently across locales. This means that robots meta tags and the `robots.txt` file should specify the same directives in each locale. For example, if Googlebot receives a `noindex` meta tag when it sends an `Accept-Language` header requesting Spanish, it should receive the same `noindex` meta tag when it sends no `Accept-Language` header or a different `Accept-Language` header. This avoids the unexpected crawling and indexing behavior that can occur when different locales have different `noindex` meta tags, or when your site serves different `robots.txt` files to different IP addresses. (Learn more in Controlling Crawling and Indexing on our Developer site.)
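The reverse DNS verification mentioned above follows a two-step handshake: reverse-resolve the crawler's IP, check that the hostname falls under googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal sketch in Python (production code should cache the results rather than resolve on every request):

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def hostname_is_google(host):
    """Check that a reverse-DNS hostname is under a Google crawl domain."""
    return host.endswith(GOOGLE_SUFFIXES)

def is_verified_googlebot(ip):
    """Reverse lookup, suffix check, then a forward lookup to confirm
    the hostname resolves back to the original IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not hostname_is_google(host):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False
```

The suffix check alone is not enough: without the forward-confirming lookup, anyone controlling reverse DNS for their own IP range could claim a googlebot.com hostname.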
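To audit the locale-consistency requirement above, you could compare the robots directives your server returns for each `Accept-Language` variant of a URL. The mapping used here is a hypothetical structure for this sketch:

```python
def robots_directives_consistent(directives_by_locale):
    """Return True if every locale variant reports the same robots
    directives. Keys are Accept-Language values (None = no header),
    values are the robots meta content served for that crawl."""
    normalized = {
        frozenset(content.lower().replace(" ", "").split(","))
        for content in directives_by_locale.values()
    }
    return len(normalized) <= 1

# Inconsistent: the Spanish crawl sees noindex while the others do not.
print(robots_directives_consistent({
    None: "index, follow",
    "es": "noindex",
    "en": "index, follow",
}))  # → False
```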