Resolving duplicate articles

The articles that appear in Google News are determined entirely by computer algorithm. If Google News encounters numerous versions of the same article, our algorithms may have trouble identifying which article is the original one and which is the duplicate version.

News sites can help Google News find the original version of a news article via two methods: 1) the rel="canonical" link element, and 2) a robots meta tag that disallows the user agent for Google News or for Google Web Search.

rel=canonical

If you publish the same article on multiple pages within your site, or within your network of sites, you may want to use the rel="canonical" link element to indicate which page is the preferred version.

You can read more about canonicalization in our Webmaster Help Center.
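
As a minimal sketch, a duplicate page could point to the preferred version by including a line like the following in its <head> section. The URL shown is a hypothetical placeholder, not a real address; substitute the address of your own original article:

<link rel="canonical" href="https://www.example.com/news/original-article.html">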

Disallow Googlebot-News

If you syndicate your articles to other news sites, you may want to ensure that only the original version of each article appears in Google News. To do this, your syndication partners should use a robots meta tag to disallow Google News from indexing their versions of your original article.

For example, if the editor of The Example Times wants to ensure that the article she is using with permission from The Example Gazette doesn't get included in Google News, she would implement the following code in that article page's HTML:

<meta name="Googlebot-News" content="noindex">

Using the above meta tag on a syndicated article keeps that article from appearing on the Google News homepage, topic pages, or story pages. For more information about how to specify which crawler can access certain content, please read our article on robots.
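
Robots meta tags belong in the <head> section of the page. The following sketch shows roughly where the tag might sit in the syndicated copy; the page title is a hypothetical placeholder used only for illustration:

<head>
  <title>Syndicated article (hypothetical title)</title>
  <meta name="Googlebot-News" content="noindex">
</head>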

Disallow Googlebot

For syndicated content that you would prefer to keep out of both Google News and Google Web Search, you'll have to specify that Google's main user agent, Googlebot, should not index your content. If that same editor of The Example Times prefers not to have the syndicated version of the story from The Example Gazette appear in search results, she would implement the same code as above but use "Googlebot" as the value of the name attribute:

<meta name="Googlebot" content="noindex">

For more information about how to specify which crawler can access certain content, please read our article on robots.