News-specific crawl errors

To view error reports specific to Google News, your site must be included in Google News, and you must have a Webmaster Tools account with the site added to it. Please contact us to request inclusion in Google News. Once this is done, follow the steps below:

  • On the Home page, click the site's URL.
  • On the Dashboard, click Crawl > Crawl Errors.
  • Click on the News tab to see crawl errors for your news content.
  • Crawl errors are organized into categories, such as "Article extraction" or "Title error." Clicking on one of these categories displays a list of affected URLs and the crawl errors they're generating.
  • News-specific errors include:

    Article disproportionately short

    Explanation

    The article body that we extracted from the HTML page is too small when compared to other clusters of text without links on the page. This applies to most pages that contain news briefs or multimedia content, rather than full news articles. We generated this error to avoid including what might be an incorrect piece of text.

    Recommendations

    This problem is often caused by:

    • Too many snippets for related articles - to help our extractor, please consider making these snippets clickable.
    • Features such as 'Send this article to friends' with long descriptions - consider hiding this text with a "display:none" or "visibility:hidden" style, or generating these pieces of HTML dynamically with JavaScript.
    • User comments - consider enclosing the comments in an iframe, dynamically fetching them with AJAX or moving them to an adjacent page.

    If none of these resolve the error, please let us know.

    Article fragmented

    Explanation

    The article body that we extracted from the HTML page appears to consist of isolated sentences not grouped together into paragraphs. We generated this error to avoid including what might be an incorrect piece of text.

    Recommendations

    • Check that your paragraphs are formatted such that each is more than one sentence in length.
    • Make sure your sentences are well punctuated.
    • Avoid frequent <br> and <p> tags within your paragraphs, and try not to break up the article body in general.
    • Consider removing some of the non-article text from the article page.

    If none of these resolve the error, please let us know.
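    Before publishing, you might flag paragraphs that consist of a single sentence. The sketch below is a hypothetical pre-publication check (the names `count_sentences` and `single_sentence_paragraphs` are illustrative); its sentence splitter is a rough heuristic, not the extractor Google News uses.

```python
import re

# Rough heuristic (an assumption, not Google's extractor): a sentence ends at
# ".", "!" or "?" followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def count_sentences(paragraph: str) -> int:
    """Roughly count the sentences in one paragraph."""
    return len([p for p in SENTENCE_END.split(paragraph.strip()) if p])


def single_sentence_paragraphs(paragraphs: list[str]) -> list[int]:
    """Return the indexes of paragraphs containing at most one sentence."""
    return [i for i, p in enumerate(paragraphs) if count_sentences(p) <= 1]


article = [
    "The council met on Tuesday. It approved the new budget after a long debate.",
    "Protests continued outside.",
    "Officials said the vote was close. Several members abstained.",
]
print(single_sentence_paragraphs(article))  # [1]
```

    A flagged paragraph is not necessarily an error; the check only points at places where merging sentences into fuller paragraphs may help.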

    Article too long

    Explanation

    The article body that we extracted from the HTML page appears to be too long to be a news article. We generated this error to avoid including what might be an incorrect piece of text. Common causes include news articles that contain user-contributed comments below the article, or HTML layouts that contain other material besides the news article itself.

    Recommendations

    Consider removing some of the non-article text from the article page. If the article page contains user comments, consider one of the following options:

    • enclosing them in an iframe.
    • dynamically fetching them with AJAX.
    • moving part of the comments to an adjacent page.

    If none of these resolve the error, please let us know.

    Article too short

    Explanation

    The article body that we extracted from the HTML page appears to contain too few words to be a news article. This applies to most pages that contain news briefs or multimedia content, rather than full news articles. We generated this error to avoid including what might be an incorrect piece of text.

    Recommendations

    • Try formatting your articles into text paragraphs of a few sentences each. If the article content appears to contain too few words to be a news article, we won't be able to include it.
    • Make sure your articles have more than 80 words.

    If none of these resolve the error, please let us know.
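    A simple word count can catch briefs that fall under the 80-word guideline before they are published. This is a hypothetical helper; treating a "word" as a run of word characters, apostrophes or hyphens is an assumption, and Google's own count may differ.

```python
import re

# Assumed definition of a word: letters/digits/underscore, apostrophes, hyphens.
WORD = re.compile(r"[\w'-]+")
MIN_WORDS = 80  # from the recommendation above: "more than 80 words"


def word_count(text: str) -> int:
    return len(WORD.findall(text))


def long_enough(text: str, minimum: int = MIN_WORDS) -> bool:
    """True if the article body has more than `minimum` words."""
    return word_count(text) > minimum


brief = "Markets closed higher on Friday."
print(word_count(brief), long_enough(brief))  # 5 False
```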

    Date not found

    Explanation

    We were unable to determine the publication date of the article.

    Recommendations

    Follow the date formatting recommendations below:

    • Place a clear date and time for each of your articles between the article's title and the article's text, on a separate line of HTML. The date should specify when the article was first published.
    • Remove any other dates from the HTML of the article page so that the crawler doesn't mistake them for the correct publication time.
    • If you'd like to use a date meta tag, please contact us first. Date meta tags should be of the form: <meta name="DC.date.issued" content="YYYY-MM-DD">, where the date is in W3C format, using either the "complete date" (YYYY-MM-DD) format or the "complete date plus hours, minutes and seconds" (YYYY-MM-DDThh:mm:ssTZD) format with a time zone suffix.
    • Create a News Sitemap. The <publication_date> tag will ensure we're able to pick the correct date for your articles.
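    If, after contacting us, you do use a date meta tag, generating it from a timezone-aware timestamp helps you emit the "complete date plus hours, minutes and seconds" format with its time zone suffix consistently. This is a minimal sketch; `dc_date_issued` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def dc_date_issued(published: datetime) -> str:
    """Build a DC.date.issued meta tag as YYYY-MM-DDThh:mm:ssTZD."""
    if published.tzinfo is None:
        # Without a time zone we could not emit the required suffix.
        raise ValueError("publication time must be timezone-aware")
    stamp = published.isoformat(timespec="seconds")
    if stamp.endswith("+00:00"):
        stamp = stamp[:-6] + "Z"  # W3C format allows "Z" for UTC
    return f'<meta name="DC.date.issued" content="{stamp}">'


first_published = datetime(2012, 5, 1, 9, 30, tzinfo=timezone(timedelta(hours=2)))
print(dc_date_issued(first_published))
# <meta name="DC.date.issued" content="2012-05-01T09:30:00+02:00">
```

    Pass the time the article was first published, not the time of the latest update.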

    Date too old

    Explanation

    The date that we determined for this article, either from a <publication_date> tag in the Sitemap, or from a date in the page HTML itself, is too old.

    Recommendations

    • Make sure your article is no more than 2 days old. Currently we only collect articles that are 2 days old or less.
    • Follow the date formatting recommendations above.
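    To decide whether an article is still eligible, compare its first-publication time with the current time. The sketch below assumes "2 days" means 48 hours and that both timestamps are timezone-aware; `is_fresh` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(days=2)  # assumption: "2 days old or less" = 48 hours


def is_fresh(published: datetime, now: Optional[datetime] = None) -> bool:
    """True if the article was first published no more than MAX_AGE ago."""
    now = now or datetime.now(timezone.utc)
    return now - published <= MAX_AGE


now = datetime(2012, 5, 3, 12, 0, tzinfo=timezone.utc)
print(is_fresh(datetime(2012, 5, 1, 12, 0, tzinfo=timezone.utc), now))  # True
```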

    Empty article

    Explanation

    The article body that we extracted from the HTML page appears to be empty.

    Recommendations

    • Make sure that the full text of each of your articles is available in the source code of your article pages (and not embedded in a JavaScript file or iframe, for example).
    • Make sure that the article text isn't hidden by a style such as "display:none" or "visibility:hidden" in the source code of your article pages.
    • Make sure the links to your articles lead directly to your article pages rather than to an intermediate page using a JavaScript redirect.

    Extraction failed

    Explanation

    We were unable to extract the article from the page. Extractions fail when we are unable to identify a valid title, body, and timestamp for the article. We list URLs with this error to provide you with information regarding why some articles may not appear in Google News.

    Recommendations

    • Make sure that your title, body, and timestamp are easily crawlable (for instance, available as text rather than as images). At this time, however, this error is primarily informational: we are actively working to improve our extraction methods so that you'll see it less often.
    • Submit a News Sitemap.

    Invalid date meta tag

    Explanation

    The HTML page contains a date <meta> tag that we were unable to parse.

    Recommendations

    Date <meta> tags should be of the form: <meta name="DC.date.issued" content="YYYY-MM-DD">, where the date is in W3C format (http://www.w3.org/TR/NOTE-datetime), using either the "complete date" (YYYY-MM-DD) or "complete date plus hours, minutes and seconds" (YYYY-MM-DDThh:mm:ss) format, with optional fraction and time zone suffixes. The date should specify when the article was first published.
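    To catch unparseable values before they go live, you could validate the `content` attribute against the formats listed above. This is a hypothetical sketch that follows the description here (optional fraction and time zone suffix); our actual parser may be stricter or more lenient.

```python
import re
from datetime import date, time

W3C_DATE = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"                # complete date: YYYY-MM-DD
    r"(?:T(\d{2}):(\d{2}):(\d{2})(?:\.\d+)?"  # optional Thh:mm:ss plus fraction
    r"(?:Z|[+-]\d{2}:\d{2})?)?"               # optional time zone suffix
)


def valid_dc_date(content: str) -> bool:
    """Check the syntax, then reject impossible values such as month 13."""
    m = W3C_DATE.fullmatch(content.strip())
    if not m:
        return False
    year, month, day, hh, mm, ss = m.groups()
    try:
        date(int(year), int(month), int(day))
        if hh is not None:
            time(int(hh), int(mm), int(ss))
    except ValueError:
        return False
    return True


print(valid_dc_date("2012-05-01T09:30:00+02:00"), valid_dc_date("05/01/2012"))  # True False
```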

    No links found

    Explanation

    Googlebot-News didn't find any links to valid news articles on the page. This error applies only to news section pages.

    Recommendations

    • Make sure your article URLs contain a number with at least 3 digits, as specified in our guidelines. Otherwise, consider submitting your articles through a News Sitemap.
    • Make sure your articles are located within the domain of the site included in Google News.
    • Check the page that generated the error and make sure it includes crawlable links to news articles. Googlebot-News is best able to crawl HTML links and is unable to crawl image links or links embedded in JavaScript. See our Webmaster Guidelines and tips for creating a Google-friendly site for information on how to ensure your links are crawlable.
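    A quick way to audit a section page's article links is to test each URL for a number of at least three digits. This hypothetical helper only looks at the path and query, on the assumption that digits in the host name don't count; the full guidelines may impose further conditions.

```python
import re
from urllib.parse import urlsplit

THREE_DIGITS = re.compile(r"\d{3,}")


def url_has_three_digit_number(url: str) -> bool:
    """True if the URL's path or query contains a run of 3 or more digits."""
    parts = urlsplit(url)
    return bool(THREE_DIGITS.search(parts.path + "?" + parts.query))


print(url_has_three_digit_number("http://www.example.com/news/story-4521.html"))  # True
```

    URLs that fail this check are candidates for submission through a News Sitemap instead.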

    No sentences found

    Explanation

    The article body that we extracted from the HTML page appears not to contain punctuated sequences of contiguous words. We generated this error to avoid including what might be an incorrect section of text.

    Recommendations

    • If the article content doesn't have punctuated sequences of contiguous words, we won't be able to include it in Google News. Make sure that the text of your articles is made up of sentences, and that you don't use frequent <br> or <p> tags within your paragraphs.
    • Make sure that the full text of each of your articles is available in the source code of your article pages (and not embedded in a JavaScript file, for example).
    • Make sure the links to your articles lead directly to your article pages rather than to an intermediate page using a JavaScript redirect.

    Noindex tag found

    Explanation

    The HTML page of the article contains a <meta> "noindex" tag, prohibiting Google from indexing the page.

    Recommendations

    Remove the "noindex" <meta> tag from your article pages.
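    To find pages that still carry the tag, you can scan their HTML for a robots meta tag whose content includes "noindex". This is a sketch using Python's standard `html.parser`; checking the names "robots", "googlebot" and "googlebot-news" is an assumption about which robots meta tags your templates might use.

```python
from html.parser import HTMLParser


class NoindexFinder(HTMLParser):
    """Records whether any robots-style <meta> tag contains "noindex"."""

    ROBOT_NAMES = {"robots", "googlebot", "googlebot-news"}

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {k: (v or "") for k, v in attrs}  # attribute names arrive lowercased
        name = attributes.get("name", "").lower()
        directives = {d.strip().lower() for d in attributes.get("content", "").split(",")}
        if name in self.ROBOT_NAMES and "noindex" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    finder = NoindexFinder()
    finder.feed(html)
    finder.close()
    return finder.noindex


print(has_noindex('<head><meta name="robots" content="noindex, nofollow"></head>'))  # True
```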

    Off-site redirect

    Explanation

    The section or article page redirects to a URL on a different domain.

    Recommendations

    • All section pages and articles must be located within the domain of the site included in Google News.
    • If you are not using off-site redirects, please make sure your site has not been modified by a third party. Read more about hacked sites.
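    When auditing redirects, resolve the `Location` target against the requesting URL and compare its host with your site's domain. This is a minimal sketch; the function names are hypothetical, and treating subdomains such as news.example.com as on-site is an assumption you should check against how your site is included in Google News.

```python
from urllib.parse import urljoin, urlsplit


def is_same_site(url: str, site_domain: str) -> bool:
    """True if the URL's host is site_domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    domain = site_domain.lower()
    return host == domain or host.endswith("." + domain)


def redirect_is_off_site(page_url: str, location: str, site_domain: str) -> bool:
    # A Location header may be relative, so resolve it against the page URL first.
    return not is_same_site(urljoin(page_url, location), site_domain)


print(redirect_is_off_site("http://www.example.com/a", "http://example.net/b", "example.com"))  # True
```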

    Page too large

    Explanation

    The section or article page length exceeds the maximum allowed.

    Recommendations

    The HTML source of a section or article page can be at most 256KB in size.
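    Because the limit applies to the page source, measure the size in bytes in the encoding the page is served with, not in characters. This sketch assumes 256KB means 256 × 1024 bytes of uncompressed HTML; `page_size_ok` is a hypothetical name.

```python
MAX_PAGE_BYTES = 256 * 1024  # assumption: 256KB = 262,144 bytes, uncompressed


def page_size_ok(html: str, encoding: str = "utf-8") -> bool:
    """Non-ASCII characters can take several bytes, so encode before measuring."""
    return len(html.encode(encoding)) <= MAX_PAGE_BYTES


print(page_size_ok("a" * MAX_PAGE_BYTES))  # True
```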

    Title not allowed

    Explanation

    The title that we extracted from the HTML page suggests that it is not a news article.

    Recommendations

    Often this problem can be fixed by setting the <title> tag on the HTML page to the title of the article, and repeating the title in a prominent place on the HTML page, such as in an <h1> tag. Read more about titles.
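    To check templates against this recommendation, you might compare the text of the first <title> with the first <h1>. This hypothetical heuristic accepts a title that contains the <h1> text, on the assumption that a site-name suffix such as "| Example News" is acceptable; it is not the test Google News applies.

```python
from html.parser import HTMLParser


class TitleAndH1(HTMLParser):
    """Collects the text of the first <title> and the first <h1> on a page."""

    def __init__(self):
        super().__init__()
        self.text = {"title": "", "h1": ""}
        self._done = set()   # elements already captured
        self._inside = None  # element currently being captured

    def handle_starttag(self, tag, attrs):
        if tag in self.text and tag not in self._done and self._inside is None:
            self._inside = tag

    def handle_endtag(self, tag):
        if tag == self._inside:
            self._done.add(tag)
            self._inside = None

    def handle_data(self, data):
        if self._inside:
            self.text[self._inside] += data


def _normalize(s: str) -> str:
    return " ".join(s.split()).casefold()


def title_contains_h1(html: str) -> bool:
    parser = TitleAndH1()
    parser.feed(html)
    parser.close()
    h1 = _normalize(parser.text["h1"])
    return bool(h1) and h1 in _normalize(parser.text["title"])
```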

    Title not found

    Explanation

    We were unable to extract a title for the article from the HTML page.

    Recommendations

    • Follow our title formatting recommendations.
    • To make sure your articles display properly on mobile devices, don't include a leading number (which sometimes corresponds to an access key) in the anchor text of the title.

    Uncompression failed

    Explanation

    Googlebot-News detected that the page was compressed, but was unable to uncompress it. This can be caused by poor network conditions or by a misconfigured or incorrectly programmed web server.

    Recommendations

    Please check your network and your web server's configuration.
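    One server-side check is to fetch a page's raw response body and confirm that it actually decompresses according to its Content-Encoding header. This is a sketch using Python's `gzip` and `zlib` modules; `body_decompresses` is a hypothetical name, and only gzip and deflate are checked.

```python
import gzip
import zlib


def body_decompresses(body: bytes, content_encoding: str) -> bool:
    """True if the body decodes cleanly under the declared Content-Encoding."""
    encoding = content_encoding.strip().lower()
    try:
        if encoding in ("gzip", "x-gzip"):
            gzip.decompress(body)
        elif encoding == "deflate":
            # HTTP "deflate" is meant to be zlib-wrapped; raw deflate fails here.
            zlib.decompress(body)
        # Other encodings are not checked by this sketch.
    except (OSError, EOFError, zlib.error):
        return False
    return True


good = gzip.compress(b"<html></html>")
print(body_decompresses(good, "gzip"), body_decompresses(good[:10], "gzip"))  # True False
```

    A body labelled gzip that fails this check, for example because it is truncated or was never compressed, would likely produce this error.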

    Unsupported content type

    Explanation

    The page had an HTTP content-type that is not supported by Google News.

    Recommendations

    Articles must have a content-type of text/html, text/plain or application/xhtml+xml.
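    A Content-Type header often carries parameters such as a charset, so compare only the media type. This hypothetical check uses the three types listed above.

```python
ALLOWED_TYPES = {"text/html", "text/plain", "application/xhtml+xml"}


def media_type(content_type_header: str) -> str:
    """Drop parameters like '; charset=UTF-8' and normalize case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def content_type_supported(header: str) -> bool:
    return media_type(header) in ALLOWED_TYPES


print(content_type_supported("text/html; charset=UTF-8"))  # True
```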

    Google News Sitemaps are best structured as a small, fixed set. When you publish new articles, please update your existing Sitemaps, rather than creating a new Sitemap for them. Frequent creation of new Sitemaps, e.g., one for each calendar day, is not recommended.
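    One way to keep a small, fixed set is to regenerate the same sitemap file on each publish, keeping only recent articles, rather than writing a new file per day. This is a minimal sketch: `build_news_sitemap` is a hypothetical helper, the 2-day window mirrors the "Date too old" guidance, and the namespace and element names other than <publication_date> (such as <publication>, <name>, <language> and <title>) are taken from Google's News Sitemap schema as we understand it, so check them against the current documentation.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS = "http://www.google.com/schemas/sitemap-news/0.9"
ET.register_namespace("", SM)
ET.register_namespace("news", NEWS)


def build_news_sitemap(articles, publication, language, now=None):
    """articles: iterable of (url, title, published) with aware datetimes.
    Returns sitemap XML listing only articles at most 2 days old."""
    now = now or datetime.now(timezone.utc)
    urlset = ET.Element(f"{{{SM}}}urlset")
    for url, title, published in articles:
        if now - published > timedelta(days=2):
            continue  # drop stale entries instead of starting a new file
        entry = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(entry, f"{{{SM}}}loc").text = url
        news = ET.SubElement(entry, f"{{{NEWS}}}news")
        pub = ET.SubElement(news, f"{{{NEWS}}}publication")
        ET.SubElement(pub, f"{{{NEWS}}}name").text = publication
        ET.SubElement(pub, f"{{{NEWS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS}}}publication_date").text = published.isoformat(timespec="seconds")
        ET.SubElement(news, f"{{{NEWS}}}title").text = title
    return ET.tostring(urlset, encoding="unicode")
```

    Writing the result to the same file each time means the set of Sitemaps stays the same while its contents follow your newest articles.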