Block access to your files

If you have pages or other content that you don't want to appear in Google's search results, you have a number of options.

  • If you need to keep confidential content on your server, save it in a password-protected directory. Googlebot and other spiders won't be able to access the content, which makes this the simplest and most effective way to prevent content on your site from being crawled and indexed. If you're using Apache HTTP Server, you can edit your .htaccess file to password-protect a directory on your server. Many tools on the web will help you do this.

  • Use a robots.txt file to control access to files and directories on your server. The robots.txt file is like an electronic No Trespassing sign. It tells Googlebot and other crawlers which files and directories on your server should not be crawled.

    To use a robots.txt file, you'll need access to the root of your domain (if you're not sure, check with your web hosting provider). If you don't have access to the root of your domain, you can restrict access using the robots meta tag on individual pages.

    It's important to note that even if you use a robots.txt file to block spiders from crawling content on your site, Google could discover that content in other ways and add it to our index. For example, other sites may still link to it. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project, can appear in Google search results. In addition, while all respectable robots will respect the directives in a robots.txt file, some may interpret them differently. A robots.txt file is also not enforceable, and some spammers and other troublemakers may ignore it. For these reasons, we recommend password-protecting confidential information (see above).

    You can test your robots.txt file on the Blocked URLs (robots.txt) tab of the Crawler access page.

    About using robots.txt to control access to your site

  • Use a noindex meta tag to prevent content from appearing in our search results. When we see a noindex meta tag on a page, we will completely drop the page from our search results, even if other pages link to it. If the content is currently in our index, we will remove it the next time we crawl and reprocess the page. (To expedite removal, use the Remove URLs tool in Google Webmaster Tools.) Other search engines, however, may interpret this directive differently. As a result, a link to the page can still appear in their search results.

    Because we have to crawl your page in order to see the noindex tag, there's a small chance that Googlebot won't see and respect the noindex meta tag (for example, if we haven't crawled the page since you added the tag).

    About using meta tags to control access to your site
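
The robots.txt and noindex options above can be checked locally before you rely on them. The following is a minimal sketch in Python, not a Google tool. It assumes a hypothetical site at www.example.com with a /private/ directory, and the names is_crawl_allowed, RobotsMetaParser, and has_noindex are invented for illustration. It uses Python's standard urllib.robotparser and html.parser modules, which may interpret edge cases differently from Googlebot.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask every crawler to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical page that asks search engines not to index it.
PAGE_HTML = """\
<html>
  <head>
    <title>Draft</title>
    <meta name="robots" content="noindex">
  </head>
  <body>Not for search results.</body>
</html>
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if these robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name.lower(): (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # The content attribute may hold a comma-separated list.
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    blocked = "https://www.example.com/private/report.html"
    public = "https://www.example.com/index.html"
    print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", blocked))
    print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", public))
    print(has_noindex(PAGE_HTML))
```

A check like this only confirms what a well-behaved crawler is asked to do. It does not protect the content, so confidential files still belong in a password-protected directory.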