Using robots to block Google News
We understand that news organisations publish lots of content and not all of it may be right for Google News. You can keep parts of your site out of our index by using a robots.txt file, meta tags or HTTP header specifications. Google News crawls with the same robot as Google Web Search, called Googlebot.
If you would prefer not to be included in Google News but want to remain in Web Search, add a robots entry for Googlebot-News. Google News respects that entry if it is more restrictive than the robots entry for Googlebot. In other words:
- If you block access to Googlebot-News, we will not index your site in Google News.
- If you block access to Googlebot, we will not index your site in Google News or Web Search.
Creating a robots.txt file
Using a robots.txt file gives you a high level of control over which parts of your site are indexed by Google. You'll find a comprehensive guide to creating and maintaining robots.txt files at our Webmaster Help Centre.
- To prevent your site from being indexed by Google News, block access to Googlebot-News using a robots.txt file.
- To prevent your site from being indexed by Google News and Web Search, block access to Googlebot using a robots.txt file.
Be careful to provide our crawler with access to your robots.txt file so that we will know if you've specified that certain sections of your site should not be crawled.
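As an illustration of the first case above, the following sketch holds a robots.txt that blocks Googlebot-News from the whole site while leaving other crawlers unrestricted, and checks it with Python's standard `urllib.robotparser`. The file contents, the `is_allowed` helper and the example.com URL are assumptions made for this example. The Python parser only approximates how Googlebot reads the file, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot-News out of the whole site and
# leave every other crawler (including Googlebot) unrestricted.
ROBOTS_TXT = """\
User-agent: Googlebot-News
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL used only for the check.
URL = "https://www.example.com/news/article.html"

news_allowed = is_allowed(ROBOTS_TXT, "Googlebot-News", URL)  # blocked
web_allowed = is_allowed(ROBOTS_TXT, "Googlebot", URL)        # still allowed
print(news_allowed, web_allowed)
```

To block Web Search as well, the same file would name Googlebot in the `User-agent` line instead of Googlebot-News.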
Creating a meta tag
Rather than using a robots.txt file to block crawler access to areas of your site, you can add a meta tag to an HTML page to tell robots not to index that page. This standard is described in our Webmaster Help Centre.
To prevent specific articles on your site from being indexed by Google News, block access to Googlebot-News using a meta tag.
To prevent specific articles on your site from being indexed by Google News and Web Search, block access to Googlebot using a meta tag.
To prevent specific articles on your site from being indexed by all robots, block access using the following meta tag:
<meta name="robots" content="noindex, nofollow">
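The crawler-specific cases above are not shown as tags in this article. The sketch below assumes that a crawler's name goes in the `name` attribute in the same way that `robots` does, and builds the three variants as strings. The `robots_meta` helper is hypothetical and exists only for this illustration.

```python
from html import escape


def robots_meta(crawler: str, *directives: str) -> str:
    """Build a robots meta tag addressed to one crawler token."""
    content = ", ".join(directives)
    return f'<meta name="{escape(crawler)}" content="{escape(content)}">'


# Assumed tag for keeping an article out of Google News only.
news_only = robots_meta("Googlebot-News", "noindex")

# Assumed tag for keeping an article out of Google News and Web Search.
news_and_web = robots_meta("googlebot", "noindex")

# Same tag as the all-robots example in the article.
everyone = robots_meta("robots", "noindex", "nofollow")

for tag in (news_only, news_and_web, everyone):
    print(tag)
```

Each resulting tag belongs in the `<head>` of the article page it applies to.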
To prevent robots from indexing images on a specific article, block access using the following meta tag:
<meta name="robots" content="noimageindex">
To inform us that an article will expire at a certain time, at which point it should be removed from the Google index, you would use the following tag:
<meta name="googlebot" content="unavailable_after: 25-Aug-2011 15:00:00 EST">
The date and time must be specified in the RFC 850 format. This information is treated as a removal request: it will take about a day after the removal date passes for the page to disappear from the search results. For the tag to work, however, it must be present in your article when the article is first crawled.
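If expiry tags are generated by a publishing system, the date string can be built programmatically. The following is a minimal sketch that mirrors the layout of the sample tag above; it does not validate against RFC 850 itself. The `unavailable_after_tag` helper and the fixed month table are assumptions made for this example, and the time zone abbreviation is passed in by the caller rather than derived.

```python
from datetime import datetime

# Fixed English month abbreviations, so the output does not depend on locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def unavailable_after_tag(expires: datetime, zone: str) -> str:
    """Format an expiry time in the same layout as the article's sample tag."""
    stamp = (
        f"{expires.day:02d}-{MONTHS[expires.month - 1]}-{expires.year} "
        f"{expires:%H:%M:%S} {zone}"
    )
    return f'<meta name="googlebot" content="unavailable_after: {stamp}">'


# Reproduces the sample expiry time used in the article.
tag = unavailable_after_tag(datetime(2011, 8, 25, 15, 0, 0), "EST")
print(tag)
```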
Using HTTP header specifications
You can also provide robots instructions in the HTTP header. Please visit the Google Developers article on HTTP header specifications for more information.
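This article does not name the header involved. The sketch below assumes the `X-Robots-Tag` response header and a crawler-prefixed value of the form `googlebot-news: noindex`; please confirm both against the Google Developers article before relying on them. It wraps a WSGI application so that every response carries the header. The `with_robots_header` and `article_app` names are hypothetical.

```python
def with_robots_header(app, value="noindex"):
    """Wrap a WSGI app so every response carries an X-Robots-Tag header."""
    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            # Append the robots instruction to whatever the app already set.
            headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, headers, exc_info)
        return app(environ, start)
    return wrapped


def article_app(environ, start_response):
    """Stand-in application that serves a single article page."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>Article text</body></html>"]


# Assumed header value: ask Googlebot-News alone not to index the response.
app = with_robots_header(article_app, "googlebot-news: noindex")
```

A header set this way also covers non-HTML files such as PDFs, which cannot carry a meta tag.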