Ensuring your site is fully crawlable can help you earn more revenue from your content. To make sure your site is optimized for crawling, review each of the following issues, any of which can affect how well your site can be crawled. For more information, learn about the Ad Exchange crawler.
Granting Google’s crawlers access in robots.txt
To ensure we can crawl your site, make sure you’ve given Google’s crawlers access to it. This means allowing Google’s crawlers in your robots.txt file.
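As a quick sanity check, you can test a robots.txt file against a crawler’s user-agent token with Python’s standard `urllib.robotparser` module. This sketch assumes the crawler identifies itself as `Mediapartners-Google` (confirm the exact tokens in the Ad Exchange crawler documentation). The robots.txt contents and URLs are hypothetical examples. Python’s parser is not Google’s own, so treat its answer as a rough guide rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every bot is kept out of /private/, but the
# Mediapartners-Google group's empty Disallow line lets that crawler in everywhere.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /private/
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (per Python's parser)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(crawler_allowed(ROBOTS_TXT, "Mediapartners-Google", "https://example.com/private/report"))  # True
print(crawler_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/private/report"))  # False
```

An empty `Disallow:` line in a group means "nothing is disallowed" for the user agents in that group. This is why the ad crawler can reach `/private/` here while other bots cannot.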
Providing access to any content behind a login
If you have content behind a login, make sure you’ve set up a crawler login. If you haven’t provided our crawlers with a login, they may be redirected to a login page, which could result in a “No Content” policy violation. Alternatively, they may receive a 401 (Unauthorized) or 407 (Proxy Authentication Required) error and be unable to crawl the content.
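If you can see the status code and `Location` header your server returns to an unauthenticated request (for example, in your access logs), a small check like the one below can flag the login-wall symptoms described above. This is a hypothetical helper, not a Google tool. The path fragments in `LOGIN_HINTS` are assumptions you would adjust to match your own login URLs.

```python
from http import HTTPStatus
from typing import Optional
from urllib.parse import urlparse

# Hypothetical path fragments that suggest a login page; adjust for your site.
LOGIN_HINTS = ("login", "signin", "sign-in")


def login_wall_reason(status: int, location: Optional[str] = None) -> Optional[str]:
    """Explain why a response to an unauthenticated request looks login-gated, else None."""
    if status == HTTPStatus.UNAUTHORIZED:
        return "401 Unauthorized"
    if status == HTTPStatus.PROXY_AUTHENTICATION_REQUIRED:
        return "407 Proxy Authentication Required"
    # A 3xx response whose Location header points at a login-looking path.
    if 300 <= status < 400 and location:
        path = urlparse(location).path.lower()
        if any(hint in path for hint in LOGIN_HINTS):
            return f"redirect to login page {location}"
    return None


print(login_wall_reason(302, "https://example.com/account/login?next=/story"))
print(login_wall_reason(200))  # None
```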
Page Not Found
If the URL sent to Google points to a page that doesn’t exist (or no longer exists) on your site, or returns a 404 ("Not Found") error, Google’s crawlers can’t crawl any content from it.
Overriding the page URL in ad tags
If you override the page URL in your ad tags, Google’s crawlers may not be able to fetch the content of the page that is requesting an ad, especially if the overridden URL is malformed. In general, the page URL you send to Google in your ad request should match the actual URL of the page you are monetizing, so that Google acts on the right contextual information.
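One way to catch a bad override before it ships is to compare the URL you pass in the ad request with the page’s real address after light normalization. The sketch below is hypothetical: `override_matches` is an illustrative name. It also rests on two assumptions about what counts as "the same" URL: the scheme and host are case-insensitive, and `#fragment`s are ignored because browsers don’t send them to servers.

```python
from typing import Tuple
from urllib.parse import urlsplit


def _normalize(url: str) -> Tuple[str, str, str, str]:
    """Reduce a URL to (scheme, host, path, query); raise ValueError if it is malformed."""
    parts = urlsplit(url.strip())
    if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
        raise ValueError(f"malformed page URL: {url!r}")
    # Scheme and host are case-insensitive; the fragment is never sent to a server.
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)


def override_matches(page_url_override: str, actual_url: str) -> bool:
    """True if the overridden page URL points at the page actually being monetized."""
    try:
        return _normalize(page_url_override) == _normalize(actual_url)
    except ValueError:
        return False


print(override_matches("https://Example.com/story?id=7#comments", "https://example.com/story?id=7"))  # True
print(override_matches("example.com/story?id=7", "https://example.com/story?id=7"))  # False (no scheme)
```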
Nameserver issues
If the nameservers for your domain or subdomain don’t correctly direct our crawlers to your content, or restrict where requests can come from, our crawlers may not be able to find your content.
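To confirm that your hostname resolves at all, you can query the system resolver with Python’s `socket.getaddrinfo`. This minimal sketch only shows what resolves from the machine you run it on. It cannot reveal nameserver restrictions that apply only to requests coming from Google’s networks.

```python
import socket
from typing import List


def resolved_addresses(hostname: str) -> List[str]:
    """Return the distinct IP addresses the system resolver finds for hostname (empty if none)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        # Name not found, resolver failure, or a hostname that can't be encoded.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


print(resolved_addresses("localhost"))
```

Replace `"localhost"` with your own domain and subdomains. An empty list means your machine couldn’t resolve the name either.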
Redirects
If your site uses redirects, our crawlers may have trouble following them. For example, if a page goes through many redirects and an intermediate redirect fails, or if important information such as cookies is dropped along the way, the quality of crawling can decrease. Consider minimizing the use of redirects on pages with ad code, and make sure the redirects you do use are implemented correctly.
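The sketch below models redirect rules as a simple dictionary, a hypothetical stand-in for your server’s configuration. It walks each chain the way a crawler would and reports loops and overly long chains. The `max_hops=5` default is an arbitrary assumption, not Google’s actual limit. The model also doesn’t capture dropped cookies or failed intermediate requests.

```python
from typing import Dict, List, Optional, Tuple


def follow_redirects(
    start: str, redirects: Dict[str, str], max_hops: int = 5
) -> Tuple[List[str], Optional[str]]:
    """Walk a redirect map from start; return (URLs visited, problem or None)."""
    chain = [start]
    seen = {start}
    url = start
    while url in redirects:
        if len(chain) - 1 >= max_hops:  # hops already taken
            return chain, f"more than {max_hops} redirects"
        url = redirects[url]
        chain.append(url)
        if url in seen:
            return chain, "redirect loop"
        seen.add(url)
    return chain, None


# Hypothetical rules: /old -> /new -> /final is fine; /a and /b point at each other.
RULES = {"/old": "/new", "/new": "/final", "/a": "/b", "/b": "/a"}
print(follow_redirects("/old", RULES))  # (['/old', '/new', '/final'], None)
print(follow_redirects("/a", RULES))    # (['/a', '/b', '/a'], 'redirect loop')
```

Pointing `/old` straight at `/final` would remove one hop. This reflects the advice above to keep redirects on ad pages to a minimum.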
Server issues
Sometimes when Google’s crawlers try to access your content, your site’s servers are unable to respond in time. This can happen because the servers are down, slow, or overloaded with requests. We recommend hosting your site on a reliable server or with a reliable service provider.
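For a rough idea of how quickly a page responds, you can time a single fetch with Python’s `urllib.request`, as in this minimal sketch. The 5-second timeout is an assumption for illustration, not the timeout Google’s crawlers use. A single request from your own network also says little about how your server behaves under load.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple


def timed_fetch(url: str, timeout_s: float = 5.0) -> Tuple[Optional[int], float]:
    """Fetch url once; return (HTTP status, or None if no response arrived, elapsed seconds)."""
    start = time.monotonic()
    status: Optional[int]
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status such as 503
    except (OSError, http.client.HTTPException):
        status = None  # refused, unresolvable, or timed out: no usable response
    return status, time.monotonic() - start
```

For example, calling `timed_fetch("https://example.com/some-article")` at a few different times of day gives a better picture than a single measurement. A `None` status or an elapsed time close to the timeout suggests the server is struggling to respond.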
Geographical, network or IP restrictions
Some sites restrict access to their content to certain geographies or IP ranges, or keep it behind restricted networks or IP ranges (for example, 127.0.0.1). If these restrictions prevent Google’s crawlers from reaching your pages, consider removing them or making your content publicly accessible so that your URLs can be crawled.
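If your server enforces an IP allowlist, you can check whether a given client address would get through using Python’s `ipaddress` module. The allowlist in this sketch (loopback plus the private `10.0.0.0/8` range) is hypothetical. `192.0.2.10` is a reserved documentation address standing in for an outside visitor such as a crawler; it is not a real Google crawler IP.

```python
import ipaddress
from typing import Iterable, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical allowlist: loopback only, plus one private internal range.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("127.0.0.1/32"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def is_allowed(client_ip: str, allowed: Iterable[Network] = ALLOWED_NETWORKS) -> bool:
    """True if client_ip falls inside any allowlisted network."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is simply False when IP versions differ (IPv6 address vs IPv4 network).
    return any(addr in net for net in allowed)


print(is_allowed("10.1.2.3"))    # True
print(is_allowed("192.0.2.10"))  # False: an outside visitor, such as a crawler, is blocked
```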
Freshly published content
When you publish a new page, you may make ad requests before Google’s crawlers have had a chance to crawl its content. Sites that publish a lot of new content include those with user-generated content, news articles, large product inventories, or weather information. Usually, content on a new URL is crawled within a few minutes of the first ad request. During those first few minutes, however, you may see low ad volume because your content hasn’t been crawled yet.
Personalized pages (using URL parameters or dynamically generated URL paths)
Some websites include extra parameters in their URLs that identify the logged-in user (for example, a SessionID) or other information that may be unique to each visit. When this happens, Google’s crawlers may treat the URL as a new page, even if the content is the same. This can cause a lag of a few minutes between the first ad request on the page and when the page gets crawled, and can also increase the crawler load on your servers. If the content on a page doesn’t change, consider removing these parameters from the URL and persisting that information another way. A simpler URL structure makes your site easier to crawl.
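The helper below sketches the idea of a canonical URL. It strips per-visit parameters so that two visits to the same content map to a single address. The parameter names in `PER_VISIT_PARAMS` are hypothetical, so substitute whichever ones your site adds. The helper also drops any `#fragment`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical per-visit parameter names (compared case-insensitively).
PER_VISIT_PARAMS = {"sessionid", "sid", "userid"}


def canonical_url(url: str) -> str:
    """Drop per-visit query parameters (and any fragment), keeping other parameters in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in PER_VISIT_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url("https://example.com/article?id=42&SessionID=abc123"))
# https://example.com/article?id=42
```

Two visits that differ only in their session parameter now reduce to the same URL, which is the structure that lets a page be crawled once and reused.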
Using POST data
If your site sends POST data along with URLs (for example, passing form data via a POST request), your site may be rejecting requests that aren’t accompanied by POST data. Because Google’s crawlers don’t send any POST data, such a setup prevents them from accessing your page. If the page content is determined by what the user enters in the form, consider using a GET request instead.
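When a form only filters or selects content, its fields can travel in the query string instead of a POST body. A crawler fetching the plain URL then sees the same page a user does. This minimal sketch uses Python’s `urllib.parse`; the `/search` endpoint and field names are hypothetical.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlsplit


def form_to_get_url(base_url: str, fields: Dict[str, str]) -> str:
    """Encode form fields into a query string so the page can be fetched with GET."""
    return f"{base_url}?{urlencode(fields)}"


def read_form_fields(url: str) -> Dict[str, str]:
    """Server side: recover the first value of each field from the query string."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


url = form_to_get_url("https://example.com/search", {"q": "rain boots", "size": "9"})
print(url)                    # https://example.com/search?q=rain+boots&size=9
print(read_form_fields(url))  # {'q': 'rain boots', 'size': '9'}
```

On the HTML side, the equivalent change is setting the form’s `method` attribute to `get`.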