A crawler, also known as a spider or a bot, is the software Google uses to process and index the content of webpages. The Ad Exchange crawler visits your site to determine its content in order to provide relevant ads.
Here are some important facts to know about the Ad Exchange crawler:
- The Ad Exchange crawler is different from the Google crawler. The two crawlers are separate, but they do share a cache. We do this to avoid having both crawlers request the same pages, thereby helping publishers conserve their bandwidth. Similarly, the Webmaster Tools crawler is separate.
- Resolving Ad Exchange crawl issues doesn't resolve issues with the Google crawl. Resolving issues with the Ad Exchange crawler has no impact on your placement within Google search results. For more information on your site's ranking on Google, review our entry on how to get included in Google search results.
- The crawler indexes by URL. Our crawler accesses site.com and www.site.com separately. However, it doesn't count site.com and site.com/#anchor as separate URLs.
- The crawler doesn't access pages or directories prohibited by a robots.txt file. Both the Google and Ad Exchange Mediapartners crawlers honor your robots.txt file. Therefore, if your robots.txt file prohibits access to certain pages or directories, then they are not crawled.
If you're serving ads on pages that are excluded in robots.txt under the line User-agent: *, the Ad Exchange crawler will still crawl these pages. To prevent the Ad Exchange crawler from accessing your pages, you need to specify User-agent: Mediapartners-Google in your robots.txt file. Learn more about how to provide crawler access in your robots.txt file.
- The crawler will attempt to access URLs only where our ad tags are implemented. Only pages that display Google ads should send requests to our systems and be crawled.
- The crawler will attempt to access pages that redirect. When you have 'original pages' that redirect to other pages, our crawler must access the original pages to determine that a redirect is in place. Therefore, our crawler's visits to the original pages appear in your access logs.
- Re-crawling sites. At this time, we're unable to control how often our crawlers index the content on your site. Crawling is automatically done by our bots. If you make changes to a page, it can take 1 to 2 weeks before the changes are reflected in our index.
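
To make two of the rules above more concrete (indexing by URL, and how robots.txt groups apply to the Ad Exchange crawler), here is a minimal Python sketch. It is a simplified model of the behavior described in this article, not Google's implementation. The function names crawl_key and ad_crawler_may_fetch are hypothetical. The sketch assumes that only Disallow rules matter and that they match by plain path prefix, so Allow rules and wildcards inside paths are ignored.

```python
from urllib.parse import urldefrag

# User-agent token named in this article for the Ad Exchange crawler.
AD_CRAWLER = "mediapartners-google"


def crawl_key(url):
    """Return the URL without its #fragment (hypothetical helper).

    Models the rule that site.com/ and site.com/#anchor count as one URL,
    while site.com and www.site.com stay distinct.
    """
    return urldefrag(url).url


def ad_crawler_disallowed_prefixes(robots_txt):
    """Collect Disallow prefixes from groups that name the ad crawler.

    Groups headed only by 'User-agent: *' are skipped on purpose, which
    models the statement that the wildcard line does not block the crawler.
    """
    prefixes = []
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has listed a rule
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value and AD_CRAWLER in agents:
                prefixes.append(value)
    return prefixes


def ad_crawler_may_fetch(robots_txt, path):
    """True if no ad-crawler Disallow prefix matches the path."""
    blocked = ad_crawler_disallowed_prefixes(robots_txt)
    return not any(path.startswith(prefix) for prefix in blocked)


if __name__ == "__main__":
    wildcard_only = "User-agent: *\nDisallow: /private/\n"
    explicit = wildcard_only + (
        "\nUser-agent: Mediapartners-Google\nDisallow: /private/\n"
    )
    print(crawl_key("http://site.com/#anchor"))                       # http://site.com/
    print(ad_crawler_may_fetch(wildcard_only, "/private/page.html"))  # True
    print(ad_crawler_may_fetch(explicit, "/private/page.html"))       # False
```

In this model, the wildcard-only file leaves /private/ crawlable by the Ad Exchange crawler, and adding an explicit Mediapartners-Google group is what blocks it. Note that a generic robots.txt parser would typically apply the wildcard group to any crawler without its own group, so it would not reproduce the behavior described here.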