Cross-domain URL selection
A piece of content can often be reached via several URLs, not all of which are on the same domain. If you’ve noticed that a URL is no longer receiving traffic from Google search results, Google’s algorithms may have selected a different URL (which may or may not be on the same domain) to index and display in search results. In most cases, cross-domain URL selection reflects the webmaster’s intention. Sometimes, however, it may not be clear why our algorithms have selected a URL on a different domain, or how you can tell Google’s algorithms which URL you prefer if you think the selection is incorrect.
This article explains the following:
- How Google picks a representative URL from a collection of URLs displaying duplicate content
- Common causes of unexpected cross-domain URL selection and how you can indicate to Google your preferred URL
- Messages sent in Search Console about cross-domain URL selection
How Google picks a representative URL
As part of our web crawling process, Google uses algorithms to select one representative URL from a set of duplicate URLs. The selection is based on many signals, such as detected duplicate content, 301 redirects, the presence of rel="canonical" HTML elements, and others. Webmasters have great influence over these signals, for example by following our advice on minimizing duplicate content and by using canonicalization techniques correctly.
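The idea of "one representative per set of duplicates" can be illustrated with a deliberately simplified toy. This is not Google’s algorithm, which uses many more signals and is not public. The sketch assumes a hypothetical input format: a mapping of URLs to their HTML, and a mapping of URLs to the rel="canonical" target or 301 destination they declare. It clusters pages by a normalized content fingerprint, lets those declared hints vote for a representative, and falls back to an arbitrary shortest-URL rule when there are no hints.

```python
import hashlib
from collections import defaultdict


def content_key(html: str) -> str:
    """Fingerprint page content; whitespace and case differences are ignored."""
    normalized = " ".join(html.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pick_representatives(pages: dict, hints: dict) -> dict:
    """Toy model: map every URL to the representative URL of its duplicate cluster.

    pages: {url: html}
    hints: {url: target}, where target is a declared rel="canonical" URL or a
           301 redirect destination (hypothetical input format).
    """
    clusters = defaultdict(list)
    for url, html in pages.items():
        clusters[content_key(html)].append(url)

    chosen = {}
    for urls in clusters.values():
        # Count hints that point at a member of this same cluster.
        votes = defaultdict(int)
        for url in urls:
            target = hints.get(url)
            if target in urls:
                votes[target] += 1
        if votes:
            # Most-pointed-to URL wins; sorting makes ties deterministic.
            best = max(sorted(votes), key=lambda u: votes[u])
        else:
            # No hints: fall back to the shortest URL (an arbitrary toy rule).
            best = min(urls, key=lambda u: (len(u), u))
        for url in urls:
            chosen[url] = best
    return chosen
```

In this toy, a copy on another domain that points its canonical at the original ends up mapped to the original, which mirrors the intended (non-surprising) case of cross-domain URL selection.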
As described below, we may send a message via Search Console to explain that a cross-domain URL selection has happened. If you receive these messages in Search Console when you are moving your website, or have recently moved your website, you can take that as confirmation that our algorithms have noticed the move.
Causes of unexpected cross-domain URL selections
There are several potential causes of unexpected cross-domain URL selection. These include the following:
Sometimes webmasters publish substantially similar content on multiple domains. When that content is served to the same locale, you can use canonicalization techniques, specifically rel="canonical" link elements and 301 redirects, to help Google know which pages to index and serve.
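Both techniques can be sketched with a small WSGI application. This is a minimal illustration under assumptions, not a prescribed setup: the hostnames www.example.com (preferred) and example.net (duplicate) are hypothetical, and a real site would normally configure redirects in its web server or CMS. Requests to a duplicate host receive a 301 redirect to the same path on the preferred host; pages served from the preferred host declare their own rel="canonical" URL.

```python
from urllib.parse import quote

# Hypothetical hostnames: replace with your own preferred and duplicate domains.
PREFERRED_HOST = "www.example.com"
DUPLICATE_HOSTS = {"example.net", "www.example.net"}


def app(environ, start_response):
    """WSGI app: 301-redirect duplicate hosts, declare rel="canonical" otherwise."""
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    # PEP 3333 delivers PATH_INFO as latin-1-decoded bytes; re-encode it the same way.
    path = quote(environ.get("PATH_INFO", "") or "/", encoding="latin-1")
    query = environ.get("QUERY_STRING", "")
    target = f"https://{PREFERRED_HOST}{path}"

    if host in DUPLICATE_HOSTS:
        # Permanent redirect to the same path (and query) on the preferred domain.
        location = target + (f"?{query}" if query else "")
        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Type", "text/plain")])
        return [b"Moved permanently\n"]

    # Served from the preferred host: point rel="canonical" at this page's own URL.
    body = (f'<html><head><link rel="canonical" href="{target}"></head>'
            "<body>Page content</body></html>")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]
```

For a local experiment, the standard library’s wsgiref.simple_server.make_server("", 8000, app).serve_forever() can serve this app. Using a 301 for whole-domain duplicates and rel="canonical" for pages that must stay reachable keeps the two signals consistent with each other.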
When you have multiple websites that serve substantially the same content localized for different users around the world, help our algorithms understand your configuration by using rel="alternate" hreflang annotations. You can learn more at:
- Multi-regional and multilingual sites
- Working with multi-regional websites
- About rel="alternate" hreflang="x"
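As a rough sketch of what those annotations look like, the helper below builds the set of hreflang link elements for one group of localized pages. The function name, the domains, and the language codes in the usage are hypothetical; the key assumption it encodes is that every page in the group carries the same full set of links, including one pointing to itself.

```python
from html import escape
from typing import Optional


def hreflang_links(alternates: dict, x_default: Optional[str] = None) -> str:
    """Build the <link rel="alternate" hreflang=...> set for one group of pages.

    alternates maps a language(-region) code such as "en-us" to the URL of
    that localized page. Every page in the group should carry the same full
    set, including a link to itself.
    """
    lines = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in sorted(alternates.items())
    ]
    if x_default:
        # Optional fallback for users whose language matches none of the above.
        lines.append(
            f'<link rel="alternate" hreflang="x-default" href="{escape(x_default)}" />'
        )
    return "\n".join(lines)
```

For example, hreflang_links({"en-us": "https://example.com/", "de-de": "https://example.de/"}) would be inserted into the head of both the example.com and example.de pages, so each site acknowledges the other as a localized alternate rather than a duplicate.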
Incorrectly configured websites can lead our algorithms to make an incorrect decision. For example:
Some content management systems (CMS) or CMS plugins use canonicalization techniques incorrectly and point to URLs on external websites. Check your content to see if this is the case. If your site indicates an unexpected canonical URL preference, perhaps through incorrect use of rel="canonical" or a 301 redirect, fix that issue directly.
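One way to check your pages is a small script that extracts rel="canonical" link elements from a page’s HTML and reports any that resolve to a different host. This is a minimal sketch using only the Python standard library; the function and class names are hypothetical, and it treats www and non-www hostnames as different hosts, which you may want to relax for your own site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> elements."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if "canonical" in rels and a.get("href"):
            self.canonicals.append(a["href"])


def cross_domain_canonicals(page_url: str, html: str) -> list:
    """Return canonical URLs declared in html that point to another host."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    page_host = (urlsplit(page_url).hostname or "").lower()
    result = []
    for href in finder.canonicals:
        absolute = urljoin(page_url, href)  # resolve relative canonicals
        if (urlsplit(absolute).hostname or "").lower() != page_host:
            result.append(absolute)
    return result
```

Running this over the HTML your CMS produces for a sample of pages quickly shows whether a plugin is emitting canonicals that point off-site.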
Some hosting misconfigurations may cause unexpected cross-domain URL selection. For example:
- A server may be misconfigured to return content from a.com in response to a request for a URL on b.com.
- Two unrelated web servers may return identical soft 404 pages that Google fails to identify as error pages.
In both of these situations, Google’s algorithms may conclude that the same content is being returned from different sites, and may incorrectly select a URL on one site instead of the correct URL on the other. If this is the case, you’ll need to investigate which part of your website’s serving infrastructure is misconfigured. Your server may be returning HTTP 200 (success) status codes for error pages, or it may be confusing requests for the different domains it hosts. Once you find the root cause of the issue, work with your server administrators to correct the configuration.
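A quick first test for the soft 404 case is to request a random path that cannot exist on your site and look at the status code: a correctly configured server should answer 404 or 410, not 200. The sketch below assumes Python’s standard urllib; the probe path prefix and function names are hypothetical, and the fetch function is injectable so the check can be tried without network access. It does not cover the host-confusion case.

```python
import secrets
import urllib.error
import urllib.request
from typing import Callable
from urllib.parse import urljoin


def fetch_status(url: str) -> int:
    """Return the HTTP status of url.

    urlopen follows redirects, so a missing page that redirects to a 200 page
    (such as the home page) also reports 200. Network errors (URLError) propagate.
    """
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as HTTPError


def serves_soft_404(base_url: str, fetch: Callable[[str], int] = fetch_status) -> bool:
    """Request a random path that should not exist; a 200 answer suggests a soft 404."""
    probe = urljoin(base_url, "/soft-404-probe-" + secrets.token_hex(8))
    return fetch(probe) == 200
```

Running serves_soft_404 against each domain your server hosts helps narrow down which virtual host is misconfigured.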
Some attacks on websites introduce code that returns an HTTP 301 redirect or inserts a cross-domain rel="canonical" link element into the HTML <head> or HTTP header, usually pointing to a URL hosting malicious or spammy content. In these cases our algorithms may select the malicious or spammy URL instead of the URL on the compromised website.
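The HTTP-level variants of this attack can be spotted by inspecting a response’s status code and headers. The sketch below is a minimal, standard-library-only check under stated assumptions: the function name is hypothetical, headers is a mapping with keys such as "Location" and "Link" (a plain dict with exactly those keys, or http.client response headers, whose get() is case-insensitive), and the Link header parsing is deliberately simple rather than a full parser. It flags redirects and Link-header canonicals that resolve to another host.

```python
import re
from urllib.parse import urljoin, urlsplit

# One Link header entry: <uri> followed by its parameters (simplified parsing).
_LINK_ENTRY = re.compile(r"<([^>]*)>([^<]*)")
_REL_PARAM = re.compile(r'\brel\s*=\s*"?([^";,]*)', re.IGNORECASE)


def suspicious_cross_domain_signals(page_url: str, status: int, headers) -> list:
    """Return (kind, target) pairs for redirects or canonicals to another host."""
    page_host = (urlsplit(page_url).hostname or "").lower()

    def other_host(target: str) -> bool:
        return (urlsplit(target).hostname or "").lower() != page_host

    findings = []
    # Redirects: 301 as described above, plus other redirect codes attackers may use.
    if status in (301, 302, 307, 308):
        location = headers.get("Location")
        if location:
            target = urljoin(page_url, location)
            if other_host(target):
                findings.append(("redirect", target))
    # Link: <https://...>; rel="canonical" sent as an HTTP header.
    for uri, params in _LINK_ENTRY.findall(headers.get("Link") or ""):
        m = _REL_PARAM.search(params)
        if m and "canonical" in m.group(1).lower().split():
            target = urljoin(page_url, uri.strip())
            if other_host(target):
                findings.append(("link-header-canonical", target))
    return findings
```

Because cloaked attacks may only show this behavior to Googlebot, compare these results with what Google reports seeing for the same URL.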
In this situation, please follow our guidance on cleaning your site and submit a reconsideration request when done. To identify cloaked attacks, you can use the Fetch as Google function in Search Console to see your page’s content as Googlebot sees it.
In rare situations, our algorithm may select a URL from an external site that is hosting your content without your permission. If you believe that another site is duplicating your content in violation of copyright law, you may contact the site’s host to request removal. In addition, you can request that Google remove the infringing page from our search results by filing a request under the Digital Millennium Copyright Act.
Messages in Search Console
To make cross-domain URL selections more transparent, we may send a message via Search Console to explain that a cross-domain URL selection has happened.
Search Console monitors a site’s top pages, and when it detects that a cross-domain URL selection has affected one of them, it may send the webmaster a message. However, to avoid overwhelming webmasters when many URLs are affected by cross-domain URL selections (which can happen during a site move, for example), we may not send a message for each affected URL, and we may not send a message about the same URL more than once.
These messages are sent to webmasters who have verified ownership of the website in Google Search Console. To receive them promptly, we recommend enabling email delivery of Search Console messages.