Ensuring your site is fully crawlable can help you earn more revenue from your content. To make sure your site is optimized for crawling, review the following issues that can affect its crawlability.
Grant Google’s crawlers access in robots.txt
To ensure we can crawl your sites, make sure you’ve given access to Google’s crawlers.
If you’ve modified your site’s robots.txt file to disallow the Ad Manager crawler from indexing your pages, then we are not able to serve Google ads on these pages. Update your robots.txt file to grant our crawler access to your pages.
Remove the two lines of text that disallow our crawler from your robots.txt file.
This change allows our crawler to index the content of your site and provide you with Google ads.
Any changes you make to your robots.txt file may not be reflected in our index until our crawlers attempt to visit your site again.
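To check whether a robots.txt file blocks the crawler before you publish it, a small script can help. The following is a minimal sketch using Python's standard `urllib.robotparser`. The user-agent token `Mediapartners-Google`, the sample rules, and the `example.com` URLs are assumptions for illustration; confirm the exact token against Google's crawler documentation.

```python
from urllib.robotparser import RobotFileParser

# Assumption: this token identifies the ads crawler. Verify it before relying on it.
CRAWLER_TOKEN = "Mediapartners-Google"


def crawler_allowed(robots_txt: str, page_url: str, agent: str = CRAWLER_TOKEN) -> bool:
    """Return True if the rules in robots_txt let `agent` fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, page_url)


# Hypothetical robots.txt that blocks the crawler from the whole site.
blocking = """User-agent: Mediapartners-Google
Disallow: /
"""

# Hypothetical robots.txt that only restricts one directory for all crawlers.
permissive = """User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(crawler_allowed(blocking, "https://example.com/article"))
    print(crawler_allowed(permissive, "https://example.com/article"))
```

Running the check against the file you plan to deploy, rather than the live one, lets you catch a blocking rule before the crawler's next visit.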
Providing access to any content behind a login
If you have content behind a login, ensure you’ve set up a crawler login.
If you have not provided our crawlers a login, then it’s possible that our crawlers are being redirected to a login page, which could result in a “No Content” policy violation. It's also possible that our crawlers receive a 401 (Unauthorized) or 407 (Proxy Authentication Required) error, and thus cannot crawl the content.
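The outcomes described above can be told apart by looking at what an unauthenticated request to a gated page returns. The sketch below is one way to label them; the function name, the labels, and the `/login` path are hypothetical and should be adapted to your own site.

```python
from urllib.parse import urlsplit


def classify_fetch(status: int, location: str = "", login_path: str = "/login") -> str:
    """Label the result of an unauthenticated request to a gated page.

    status   -- HTTP status code of the response
    location -- value of the Location header, if any
    """
    if status in (401, 407):
        # Unauthorized or Proxy Authentication Required: content cannot be crawled.
        return "auth_error"
    if 300 <= status < 400 and urlsplit(location).path.startswith(login_path):
        # The request was bounced to the login page instead of the content.
        return "login_redirect"
    if status == 200:
        return "ok"
    return "other"


if __name__ == "__main__":
    print(classify_fetch(302, "https://example.com/login?next=/members/story"))
    print(classify_fetch(401))
```

Feeding this function the status and `Location` header from a request made without cookies gives a rough idea of what a crawler without a login would see.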
Page Not Found
If the URL sent to Google points to a page that does not exist (or no longer exists) on a site, or results in a 404 error ("Not Found"), Google's crawlers will not successfully crawl any content.
If you are overriding the page URL in ad tags, Google’s crawlers may not be able to fetch the content of the page that is requesting an ad, especially if the overridden page URL is malformed.
Generally speaking, the page URL you send to Google in your ad request should match the actual URL of the page you are monetizing, so that Google acts on the right contextual information.
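If you do override the page URL, you can sanity-check the value before it goes into the ad request. This is a minimal sketch, assuming a URL counts as well formed when it has an http or https scheme and a host; the function names and example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def is_well_formed(url: str) -> bool:
    """Return True if url has an http(s) scheme and a host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed inputs, such as a broken IPv6 host.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def override_matches_page(override_url: str, actual_url: str) -> bool:
    """Return True if the override URL points at the same page as actual_url.

    Host names are compared case-insensitively and fragments are ignored.
    """
    if not (is_well_formed(override_url) and is_well_formed(actual_url)):
        return False
    a, b = urlsplit(override_url), urlsplit(actual_url)
    return (a.scheme, a.netloc.lower(), a.path or "/", a.query) == (
        b.scheme, b.netloc.lower(), b.path or "/", b.query)


if __name__ == "__main__":
    print(override_matches_page("https://EXAMPLE.com/news/story",
                                "https://example.com/news/story"))
    print(is_well_formed("example.com/news/story"))  # missing scheme
```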
If the nameservers for your domain or subdomain are not properly directing our crawlers to your content, or have any restrictions on where requests can come from, then our crawlers may not be able to find your content.
Broken or duplicative redirects
If your site has redirects, there is a risk that our crawler has trouble following them. For example, crawl quality can suffer if there are many redirects and an intermediate redirect fails, or if important parameters such as cookies get dropped during redirection.
Consider minimizing the use of redirects on pages with ad code, and ensuring they are implemented properly.
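To see how long a redirect chain is, you can trace it hop by hop. The sketch below takes a `fetch` callable so that it stays independent of any HTTP library; that callable, the hop limit of 5, and the sample URLs are assumptions for illustration, not values Google publishes.

```python
from typing import Callable, List, Optional, Tuple

# fetch(url) returns (status code, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def trace_redirects(url: str, fetch: Fetch, max_hops: int = 5) -> List[str]:
    """Follow redirects from url and return every URL visited, in order.

    Raises ValueError on a redirect loop or when the chain exceeds max_hops.
    """
    chain = [url]
    for _ in range(max_hops + 1):
        status, location = fetch(chain[-1])
        if not (300 <= status < 400) or not location:
            return chain  # reached a non-redirect response
        if location in chain:
            raise ValueError(f"redirect loop at {location}")
        chain.append(location)
    raise ValueError(f"more than {max_hops} redirects starting at {url}")


# Hypothetical responses standing in for a real site.
fake_site = {
    "https://example.com/old": (301, "https://example.com/new"),
    "https://example.com/new": (200, None),
}


def fake_fetch(url: str) -> Tuple[int, Optional[str]]:
    return fake_site.get(url, (404, None))


if __name__ == "__main__":
    print(trace_redirects("https://example.com/old", fake_fetch))
```

In practice, `fetch` would wrap an HTTP client configured not to follow redirects automatically, so that each hop is visible.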
Sometimes when Google’s crawlers try to access site content, the website’s servers are unable to respond in time. This can happen because the servers are down, slow, or overloaded by requests.
We recommend that you ensure your site is hosted on a reliable server or by a reliable service provider.
Geographical, network or IP restrictions
Some sites put in place restrictions that limit the geographies or IP ranges that can access their content, or place their content behind restricted networks or IP ranges (for example, 127.0.0.1).
If these restrictions prevent Google’s crawlers from reaching all your pages, please consider removing them or making your content publicly accessible, so that your URLs can be crawled.
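One quick check is whether the address your content is served from is publicly routable at all. The following minimal sketch uses Python's standard `ipaddress` module; the function name is hypothetical, and the check covers only the address itself, not firewall or geographic rules.

```python
import ipaddress


def publicly_routable(address: str) -> bool:
    """Return True if address is a globally routable IP address.

    Loopback (127.0.0.1) and private ranges (for example 10.0.0.0/8)
    return False, since nothing outside your network can reach them.
    """
    return ipaddress.ip_address(address).is_global


if __name__ == "__main__":
    for candidate in ("127.0.0.1", "10.0.0.5", "8.8.8.8"):
        print(candidate, publicly_routable(candidate))
```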
Freshly published content
When you publish a new page, you may make ad requests before Google’s crawlers have had a chance to crawl the content. Sites that post lots of new content include news sites, sites with user-generated content, sites with large product inventories, weather sites, and more.
Usually after the ad request is made on a new URL, the content will get crawled within a few minutes. However, during these initial few minutes, because your content has not yet been crawled, you may experience low ad volume.
Personalized pages (using URL parameters or dynamically generated URL paths)
Some websites include extra parameters in their URLs that indicate the user who is logged in (for example, a SessionID), or other information that may be unique to each visit. When this happens, Google’s crawlers may treat the URL as a new page, even if the content is the same. This could result in a lag of a few minutes between the first ad request on the page and when the page gets crawled, as well as an increase in the crawler load on your servers.
Generally, if the content on a page does not change, consider removing the parameters from the URL and persisting that information another way. A simpler URL structure helps make your site easily crawlable.
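As an illustration of a simpler URL structure, the sketch below strips per-visit parameters from a URL and leaves the parameters that select content in place. The parameter names in `PER_VISIT_PARAMS` are hypothetical; substitute whichever ones your site uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of parameters that differ per visit but do not change content.
PER_VISIT_PARAMS = {"sessionid", "sid"}


def stable_url(url: str) -> str:
    """Return url without per-visit query parameters or fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in PER_VISIT_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    print(stable_url("https://example.com/article?id=42&SessionID=abc123"))
```

The session information then needs to travel another way, for example in a cookie, so that every visitor requests the same URL for the same content.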
Using POST data
If your site sends POST data along with URLs (for example, passing form data via a POST request), it's possible that your site is rejecting requests that are not accompanied by POST data. Because Google’s crawlers do not provide any POST data, such a setup would prevent the crawlers from accessing your page.
If the page content is determined by the data the user inputs to the form, consider using a GET request.
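With a GET request, the form values travel in the URL itself, so the crawler can request the same page the user saw. A minimal sketch of building such a URL with Python's standard `urllib.parse` follows; the function name, field names, and endpoint are hypothetical.

```python
from urllib.parse import urlencode


def form_to_get_url(base_url: str, form_data: dict) -> str:
    """Encode form fields into the query string of a GET URL."""
    return f"{base_url}?{urlencode(form_data)}"


if __name__ == "__main__":
    # Hypothetical search form with two fields.
    print(form_to_get_url("https://example.com/search", {"q": "rain boots", "page": 1}))
```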