Google’s goal is to crawl your site as efficiently as possible. Crawling and indexing pages with identical content is an inefficient use of our resources. It can limit the number of pages we can crawl on your site, and duplicate content in our index can hinder your pages' performance in our search results. Duplicate content often occurs when sites make the same content available via different URLs—for example, by using session IDs or other parameters, like this:
http://www.example.com/products/women/dresses/green.htm
http://www.example.com/products/women?category=dresses&color=green
http://example.com/shop/index.php?product_id=32&highlight=green+dress&cat_id=1&sessionid=123&affid=431
In this case, all these URLs point to the same content: a collection of real green dresses.
When Google detects duplicate content, such as variations caused by URL parameters, we group the duplicate URLs into one cluster and select what we think is the "best" URL to represent the cluster in search results. We then consolidate properties of the URLs in the cluster, such as link popularity, to the representative URL. Consolidating properties from duplicates into one representative URL often provides users with more accurate search results.
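The clustering step described above can be sketched roughly as follows. This is only an illustration, not Google's actual algorithm: the content fingerprints and inbound-link counts are made-up inputs, and picking the most-linked URL as the representative is an assumption chosen to keep the example short.

```python
from collections import defaultdict

def consolidate(pages):
    """Group (url, content_fingerprint, inbound_links) tuples into clusters."""
    clusters = defaultdict(list)
    for url, fingerprint, links in pages:
        clusters[fingerprint].append((url, links))

    result = {}
    for members in clusters.values():
        # Assumption: the representative is the URL with the most inbound
        # links; ties go to the shorter URL.
        representative = max(members, key=lambda m: (m[1], -len(m[0])))[0]
        result[representative] = {
            "duplicates": sorted(u for u, _ in members if u != representative),
            # Link popularity of every duplicate is credited to the representative.
            "links": sum(count for _, count in members),
        }
    return result

# Hypothetical fingerprints and link counts for the three URLs shown earlier.
pages = [
    ("http://www.example.com/products/women/dresses/green.htm", "green-dresses", 12),
    ("http://www.example.com/products/women?category=dresses&color=green", "green-dresses", 3),
    ("http://example.com/shop/index.php?product_id=32&highlight=green+dress"
     "&cat_id=1&sessionid=123&affid=431", "green-dresses", 1),
]
clusters = consolidate(pages)
```

With these inputs, the three URLs end up in a single cluster whose representative carries the combined link count.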
To improve this process, we recommend using the URL parameters tool to give Google information about how to handle URLs containing specific parameters. We'll do our best to take this information into account; however, in some cases the suggestions you provide can do more harm than good for a site.
For example, consider a site that uses the same parameter name in two different kinds of URLs:
/store-locator?storeID=123
/product/foo-widget?storeID=123
If you configure storeID to not be crawled, both the /store-locator and /product/foo-widget paths will be affected. As a result, Google may be unable to index either kind of URL or show them in our search results. If these parameters are used for different purposes, we recommend using different parameter names.
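Before configuring a parameter, it can help to check whether its name is shared by unrelated paths. The following is a minimal sketch using only the Python standard library; the function name shared_parameters is hypothetical, and the input is assumed to be a list of URLs you have collected from your own site.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl

def shared_parameters(urls):
    """Return parameter names that appear on more than one path."""
    paths_by_param = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        for name, _ in parse_qsl(parts.query, keep_blank_values=True):
            paths_by_param[name].add(parts.path)
    return {
        name: sorted(paths)
        for name, paths in paths_by_param.items()
        if len(paths) > 1
    }

urls = ["/store-locator?storeID=123", "/product/foo-widget?storeID=123"]
print(shared_parameters(urls))
# {'storeID': ['/product/foo-widget', '/store-locator']}
```

Any name reported here would be affected on every listed path by a single setting in the tool.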
In general, URL parameters fall into one of two categories:
- Parameters that don't change page content: for example, affiliateid. Parameters like these are often used to track visits and referrers. They have no effect on the actual content of the page. For example, the following URLs all point to exactly the same content:
  http://www.example.com/products/women/dresses?sessionid=12345
  http://www.example.com/products/women/dresses?sessionid=34567
  http://www.example.com/products/women/dresses?sessionid=34567&source=google.com
- Parameters that change or determine the content of a page: for example, sortorder. A parameter can affect content in the following ways:
  - Sorts (for example, sort=price_ascending): Changes the order in which content is presented.
  - Narrows (for example, t-shirt_size=XS): Filters the content on the page.
  - Specifies (for example, store=women): Determines the set of content displayed on a page.
  - Translates (for example, lang=fr): Displays a translated version of the content.
  - Paginates (for example, page=2): Displays a specific page of a long listing or article.
  - Other: Changes content in ways other than those described above.
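To see why parameters in the first category produce duplicates, the sketch below removes them from a URL and compares what is left. The set of content-neutral names is an assumption taken from the examples in this article; on a real site you would list your own.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: parameter names taken from the examples in this article.
CONTENT_NEUTRAL = {"sessionid", "source", "affiliateid"}

def strip_neutral(url, neutral=CONTENT_NEUTRAL):
    """Drop parameters that do not change page content; keep the rest in order."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in neutral
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

urls = [
    "http://www.example.com/products/women/dresses?sessionid=12345",
    "http://www.example.com/products/women/dresses?sessionid=34567",
    "http://www.example.com/products/women/dresses?sessionid=34567&source=google.com",
]
unique = {strip_neutral(u) for u in urls}
```

All three example URLs reduce to the same address, while a content-changing parameter such as sort=price_ascending would be left in place.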
We recommend that you use the URL parameters tool to tell Google the purpose of the parameters you use on your site, and how Google should handle URLs that contain those parameters.
Specify how Google should handle parameters:
- On the Dashboard, under Crawl, click URL Parameters.
- Next to the parameter you want, click Edit. (If the parameter isn’t listed, click Add parameter. Note that this tool is case sensitive, so be sure to type your parameter exactly as it appears in your URL.)
- If the parameter doesn't affect the content displayed to the user, select No ... in the Does this parameter change... list, and then click Save. If the parameter does affect the display of content, click Yes: Changes, reorders, or narrows page content, and then select how you want Google to crawl URLs with this parameter.
  - Let Googlebot decide. Select if you're unsure of the parameter's behavior, or if the behavior changes for different parts of the site. Googlebot will analyze your site to determine how best to handle the parameter. This is a good general option.
  - Every URL. Googlebot will use the value of this parameter to determine if a URL is unique. For example, www.example.com/dresses/real.htm?productid=1202938 will be considered an entirely different URL from www.example.com/dresses/real.htm?productid=5853729. Before selecting this option, be sure that the parameter really does change the page content; otherwise, Googlebot might unnecessarily crawl duplicate content on your site.
  - Only URLs with value=x. Googlebot will crawl only those URLs where the value of this parameter matches the specified value. URLs with a different parameter value won’t be crawled. This is useful if, for example, your site uses the parameter value to change the order in which otherwise identical content is displayed. For example, www.example.com/dresses/real.htm?sort=price_high contains the same content as www.example.com/dresses/real.htm?sort=price_low. Use this setting to tell Googlebot to crawl only those URLs where sort=price_low (thus avoiding crawling duplicate content).
  - No URLs. Googlebot won't crawl any URLs containing this parameter. For example, telling Googlebot not to crawl URLs with parameters such as pricefrom and priceto (as in http://www.examples.com/search?category=shoe&brand=nike&color=red&size=5&pricefrom=10&priceto=1000) can prevent the unnecessary crawling of content that is already available at another URL.
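The four options above can be modeled as a simple per-parameter check. This is a sketch for reasoning about your own settings, not a description of how Googlebot is implemented. The Crawl enum and may_crawl function are hypothetical names, and "Let Googlebot decide" is treated as "no restriction" because its real behavior is not specified here.

```python
from enum import Enum
from urllib.parse import urlsplit, parse_qs

class Crawl(Enum):
    LET_GOOGLEBOT_DECIDE = "Let Googlebot decide"
    EVERY_URL = "Every URL"
    ONLY_VALUE = "Only URLs with value=x"
    NO_URLS = "No URLs"

def may_crawl(url, param, setting, value=None):
    """Return True if the setting for `param` would allow `url` to be crawled."""
    values = parse_qs(urlsplit(url).query, keep_blank_values=True).get(param)
    if values is None:
        return True  # The parameter is absent, so its setting does not apply.
    if setting is Crawl.NO_URLS:
        return False
    if setting is Crawl.ONLY_VALUE:
        return values == [value]
    # EVERY_URL, and (by assumption) LET_GOOGLEBOT_DECIDE, impose no restriction.
    return True

low = "www.example.com/dresses/real.htm?sort=price_low"
high = "www.example.com/dresses/real.htm?sort=price_high"
print(may_crawl(low, "sort", Crawl.ONLY_VALUE, "price_low"))   # True
print(may_crawl(high, "sort", Crawl.ONLY_VALUE, "price_low"))  # False
```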
A single URL may contain many parameters, and you can specify settings for each of them. More restrictive settings override less restrictive settings. For example, here are three parameters and their settings:
- shopping-category (Every URL)
- sort-by (Only URLs with value = production-year)
- sort-order (Only URLs with value = asc)
Based on these settings, Google would crawl the following URL:
However, Google would not crawl this URL:
www.example.com?shopping-category=shoes&sort-by=size&sort-order=asc
This is because the settings tell Google to crawl only those URLs where the value of the sort-by parameter equals production-year. Because shoes are never sorted by production year, this overly restrictive setting results in a lot of content going uncrawled.
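The way several settings combine can be checked with a few lines of Python. In this sketch a URL is crawlable only if every configured parameter it contains passes its own setting, which is one way to read "more restrictive settings override less restrictive settings". The SETTINGS mapping mirrors the three settings listed above; the second URL in the usage lines is constructed here purely for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# None means "Every URL"; a string means "Only URLs with value = <string>".
SETTINGS = {
    "shopping-category": None,
    "sort-by": "production-year",
    "sort-order": "asc",
}

def crawlable(url, settings=SETTINGS):
    """True only if every configured parameter present in the URL is allowed."""
    query = dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
    return all(
        required is None or query[name] == required
        for name, required in settings.items()
        if name in query
    )

blocked = "www.example.com?shopping-category=shoes&sort-by=size&sort-order=asc"
# Hypothetical URL, constructed so that it satisfies all three settings.
allowed = "www.example.com?shopping-category=cars&sort-by=production-year&sort-order=asc"
print(crawlable(blocked))  # False
print(crawlable(allowed))  # True
```

Running a sample of real URLs through a check like this before saving settings is a cheap way to spot an overly restrictive combination.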
You can also add a rel="canonical" element to the HTML source of your preferred URL. (To use rel="canonical", you'll need to be able to edit your pages' source code.) More information about canonicalization. Use whichever option works best for you; it's fine to use both if you want to be very thorough.
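If your pages are generated from templates, the element can be produced with a small helper like the one below. This is a sketch: canonical_link is a hypothetical name, and the element it returns would typically be placed in the page's head section.

```python
from html import escape

def canonical_link(preferred_url):
    """Build a rel="canonical" link element, escaping the URL for HTML."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'

print(canonical_link("http://www.example.com/products/women/dresses/green.htm"))
# <link rel="canonical" href="http://www.example.com/products/women/dresses/green.htm">
```

Escaping matters when the preferred URL itself contains a query string, because a literal & must be written as &amp; inside an HTML attribute.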