Block URLs with robots.txt

Learn about robots.txt files

A robots.txt file is a file at the root of your site that indicates which parts of your site you don't want accessed by search engine crawlers. The file uses the Robots Exclusion Standard, a protocol with a small set of commands that can be used to indicate access to your site by section and by specific kinds of web crawlers (such as mobile crawlers versus desktop crawlers).
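
For example, a minimal robots.txt might look like the following sketch. The paths are placeholder assumptions; the file itself is always served from the root of the host (for example, https://example.com/robots.txt).

  # Rules for all crawlers: keep them out of one section of the site
  User-agent: *
  Disallow: /admin/

  # A group addressed to a specific crawler; a crawler follows the most
  # specific group that matches it, so Googlebot uses these rules instead
  User-agent: Googlebot
  Disallow: /not-for-googlebot/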

What is robots.txt used for?

Non-image files

For non-image files (that is, web pages), robots.txt should only be used to control crawling traffic, typically because you don't want your server to be overwhelmed by Google's crawler or to waste crawl budget crawling unimportant or similar pages on your site. You should not use robots.txt as a means to hide your web pages from Google Search results. This is because other pages might point to your page, and your page could get indexed that way, bypassing the robots.txt file. If you want to block your page from search results, use another method, such as password protection or a noindex tag or directive.
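
If the goal is to keep a page out of search results rather than just to reduce crawling, a noindex directive on the page itself is the more reliable tool. A minimal sketch of the meta tag form is shown below; note that the page must remain crawlable (not blocked in robots.txt) for the directive to be seen.

  <!-- In the <head> of the page you want excluded from search results -->
  <meta name="robots" content="noindex">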

Image files

robots.txt does prevent image files from appearing in Google Search results. (However, it does not prevent other pages or users from linking to your image.)
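
If you do want to keep the images in a particular directory out of Google Images, one way is a group addressed to Google's image crawler. The directory name below is only an illustrative assumption:

  User-agent: Googlebot-Image
  Disallow: /images/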

Resource files

You can use robots.txt to block resource files such as unimportant image, script, or style files, if you think that pages loaded without these resources will not be significantly affected by the loss. However, if the absence of these resources makes the page harder for Google's crawler to understand, you should not block them, or else Google won't do a good job of analyzing your pages that depend on those resources.
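
The sketch below shows such a rule, assuming the files under /analytics-scripts/ are genuinely non-essential to how Google renders and understands your pages (the path is an illustrative assumption):

  # Block a directory of non-essential scripts for all crawlers
  User-agent: *
  Disallow: /analytics-scripts/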

Understand the limitations of robots.txt

Before you build your robots.txt, you should know the risks of this URL blocking method. At times, you might want to consider other mechanisms to ensure your URLs are not findable on the web.

  • Robots.txt instructions are directives only

    The instructions in robots.txt files cannot enforce crawler behavior on your site; instead, these instructions act as directives to the crawlers accessing your site. While Googlebot and other respectable web crawlers obey the instructions in a robots.txt file, other crawlers might not. Therefore, if you want to keep information secure from web crawlers, it's better to use other blocking methods, such as password-protecting private files on your server.
  • Different crawlers interpret syntax differently

    Although respectable web crawlers follow the directives in a robots.txt file, each crawler might interpret the directives differently. You should know the proper syntax for addressing different web crawlers as some might not understand certain instructions.
  • Your robots.txt directives can’t prevent references to your URLs from other sites

    While Google won't crawl or index the content blocked by robots.txt, we might still find and index a disallowed URL from other places on the web. As a result, the URL and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google Search results. You can stop your URL from appearing in Google Search results completely by using other URL blocking methods, such as password-protecting the files on your server or using the noindex meta tag or response header.
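
    For example, the response-header form of noindex can be sent as an X-Robots-Tag HTTP header. The Apache snippet below is only a sketch: it assumes mod_headers is enabled and that private.pdf is the file you want kept out of search results.

      <Files "private.pdf">
        Header set X-Robots-Tag "noindex"
      </Files>
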
Note: Combining multiple crawling and indexing directives might cause some directives to counteract other directives. Learn how to configure these directives properly by reading the Combining crawling with indexing / serving directives section of the Google Developers documentation.