How do I exclude certain files?
In the Admin Console, you can keep a particular file format from being crawled and indexed by defining URL patterns under "Do Not Crawl URLs with the Following Patterns". URLs that match any pattern listed there will not be crawled. To exclude a specific file, include its name in the pattern. The Google Mini also honors robots.txt and robots meta tag directives.

To make a pattern or file type unavailable to the crawler, remove the # mark at the start of the line containing it. For example, to make Excel files on your servers unavailable to the crawler, change the line #.xls$ to .xls$

Alternatively, you can crawl all the content you want to index and then add a file type filter in the Admin Console so that only results of a certain file type are shown.

For reference, a document is any piece of digital content that the Google Search Appliance can index, including Microsoft Office documents, PDF files, AutoCAD drawings, rows in a database, unique URLs, or any of more than 220 supported file types.
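To get a feel for what removing the # mark does, here is a minimal Python sketch. It is not the appliance's own matcher; it assumes a simplified rule set in which a line starting with # is ignored, a pattern ending in $ must match the end of the URL, and any other pattern matches anywhere in the URL. The function names and the example.com URLs are made up for illustration.

```python
def active_patterns(config_text):
    """Return the patterns that are not commented out with a leading '#'."""
    patterns = []
    for line in config_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_excluded(url, patterns):
    """Simplified, assumed matching rules (not the appliance's real logic):
    'text$' matches the end of the URL; anything else matches as a substring."""
    for pattern in patterns:
        if pattern.endswith("$"):
            if url.endswith(pattern[:-1]):
                return True
        elif pattern in url:
            return True
    return False


# '.doc$' is still commented out, '.xls$' has had its '#' removed.
config = """\
#.doc$
.xls$
"""

patterns = active_patterns(config)
print(is_excluded("http://intranet.example.com/reports/q3.xls", patterns))  # True
print(is_excluded("http://intranet.example.com/reports/q3.doc", patterns))  # False
```

With .xls$ active, the Excel URL is skipped, while the Word document is still crawled because its pattern remains commented out.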