Google uses a computer algorithm to crawl news websites. To help our system determine which of your web pages are articles, make sure that your site follows these technical guidelines.
Requirements for site structure
Google News advises publishers to follow the site structure guidelines below so that it can properly crawl new content.
Permanent section pages
If the URLs of your main news sections change frequently, Google News may not be able to understand your site. Non-permanent URLs prevent us from crawling new content because we can’t detect the most current URL to crawl.
Our automated crawler, Googlebot-News, is most effective when the URLs of your main news sections don't change. Googlebot-News is best able to crawl HTML links. It can’t crawl image links or links embedded in JavaScript. Make sure that the article links on your section pages are plain HTML links.
Also, make sure that the anchor text of each article link on your section pages matches both the article's title and its page title. If these technical requirements are an issue for you, a sitemaps-only crawl may be a solution. If you'd like to have your site crawled exclusively by sitemaps, contact our team.
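The link requirements above can be checked before publishing. The following is a minimal sketch using only the Python standard library. The sample markup, the `AnchorCollector` class, and the helper names are illustrative assumptions and are not part of any Google tool. It collects the plain HTML links on a section page, ignores `javascript:` pseudo-links, and exposes image-only links as entries with empty anchor text.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from plain <a href="..."> links."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # javascript: pseudo-links are not crawlable HTML links, so skip them.
            if href and not href.lower().startswith("javascript:"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def collect_links(section_html):
    """Return the plain HTML links found in a section page."""
    parser = AnchorCollector()
    parser.feed(section_html)
    parser.close()
    return parser.links


def anchor_matches_title(anchor_text, title):
    """Compare anchor text and title, ignoring case and extra whitespace."""
    def normalise(value):
        return " ".join(value.split()).casefold()
    return normalise(anchor_text) == normalise(title)


# Hypothetical section page with one good link and two problematic ones.
SECTION_HTML = """
<ul>
  <li><a href="/news/2024/council-approves-budget.html">Council approves budget</a></li>
  <li><a href="javascript:openStory(42)">Storm closes schools</a></li>
  <li><a href="/news/2024/bridge-reopens.html"><img src="bridge.jpg" alt="Bridge reopens"></a></li>
</ul>
"""

links = collect_links(SECTION_HTML)
for href, text in links:
    status = "ok" if text else "no anchor text (image-only link?)"
    print(f"{href}: {status}")
```

In this sample, the JavaScript link is dropped entirely and the image link is reported with no anchor text, which are the two cases the guidelines warn about. `anchor_matches_title` can then be used to compare each remaining anchor text with the title of the article it points to.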
Accessible content
Our crawler needs to access your site to include your content in Google News. Make sure that the directories that host your articles are not blocked by a robots.txt file, and that meta tags or header specifications do not block access to your article links. Google News crawls with the same robot as Google Web Search, Googlebot.
- Read "Manage access to content on your site" if you believe that your site's robots.txt file, meta tags or HTML header specifications may be blocking our crawler from accessing your content.
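One way to test whether a robots.txt file blocks article directories is Python's standard `urllib.robotparser` module. The sketch below is illustrative only: the robots.txt content, the URLs, and the `is_crawlable` helper are assumptions, and the check covers robots.txt rules but not meta tags or HTTP headers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for www.example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /drafts/
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in (
    "https://www.example.com/news/story.html",
    "https://www.example.com/drafts/unfinished.html",
):
    verdict = "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked"
    print(f"{url}: {verdict}")
```

To check a live site instead of a string, `RobotFileParser` also offers `set_url()` and `read()`, which fetch and parse the file from a URL.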
Requirements for languages and encoding
It’s important to understand our guidelines for your site's content language and encoding.
Language
Google News doesn’t show sites that mix multiple languages in a single article. Our system has trouble analysing content that contains multiple languages, which makes it difficult to ensure that we display the content in the correct language.
If your site has language-specific sections, such as example.com/french and example.com/english, create a separate publication for each language. This ensures that users are presented with content in their language. Learn how to set up a publication.
Encode your site
For best results, encode your site in UTF-8. For more information on encoding, visit www.w3.org.
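A quick way to confirm that a page is served as valid UTF-8 is to decode its raw bytes strictly. This is a minimal sketch; the `is_valid_utf8` helper and the sample headline are illustrative assumptions.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")  # strict decoding raises on invalid sequences
    except UnicodeDecodeError:
        return False
    return True


headline = "Café reopens after flood"
utf8_bytes = headline.encode("utf-8")      # what a UTF-8 site would serve
latin1_bytes = headline.encode("latin-1")  # a common legacy encoding

print(is_valid_utf8(utf8_bytes))
print(is_valid_utf8(latin1_bytes))
```

The Latin-1 bytes fail because the single byte used for "é" is not a complete UTF-8 sequence. Pure ASCII text passes in either encoding, so test with pages that contain accented or non-Latin characters.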
Requirements for individual article pages
To make sure that we only crawl your news articles, Google News has several requirements for individual article pages. Follow the steps below to ensure that you comply with the guidelines.
Article URLs
Your articles' URLs must be unique and permanent:
- Unique URLs: Each page that displays an article's full text needs to have a unique URL. We can’t include sites in Google News that display multiple articles under one URL, or that don’t have links to pages dedicated solely to each article.
- Permanent URLs: To make sure that our links to articles work, each article on your news site must have a permanent URL that is unique to that article. For example, we wouldn't be able to crawl the page www.yoursite.com/news1.html if it displays a different story every day.
Important: Do not re-publish articles under a new URL.
If an article is re-published at a later date, its URL must not change. For example, if an article is initially published under www.example.com/news1.html, it must not be re-published under www.example.com/news2.html. If your URL pattern changes because you are changing domains or your content management system (CMS) structure, send us your pattern transformation rules. We can help with these pattern changes.
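Before sending pattern transformation rules, it can help to write a rule down precisely and confirm that it maps every old article URL to exactly one new URL. The sketch below expresses one hypothetical rule as a regular expression. The new URL scheme (`/articles/<id>`), the `transform` helper, and the sample URLs are assumptions for illustration, and the format in which Google expects such rules is not specified here.

```python
import re

# Hypothetical rule: /news<id>.html on the old site becomes /articles/<id>.
OLD_PATTERN = re.compile(r"^https?://www\.example\.com/news(\d+)\.html$")
NEW_TEMPLATE = r"https://www.example.com/articles/\1"


def transform(old_url):
    """Return the new URL for old_url, or None if the rule does not apply."""
    new_url, count = OLD_PATTERN.subn(NEW_TEMPLATE, old_url)
    return new_url if count else None


old_urls = [
    "http://www.example.com/news1.html",
    "http://www.example.com/news2.html",
]
mapping = {old: transform(old) for old in old_urls}

# Every old article URL must be covered, and no two may collapse into one.
assert all(new is not None for new in mapping.values())
assert len(set(mapping.values())) == len(mapping)

for old, new in mapping.items():
    print(f"{old} -> {new}")
```

Checking that the mapping is complete and one-to-one guards against the two problems described above: articles that lose their URL, and several articles ending up under the same URL.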
Page layout guidelines
Make sure that your article headlines and publication times are easily identifiable by our automated crawler. Article pages should be in HTML format, and the body text must not be embedded in JavaScript.
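A rough way to test whether a headline is present in the HTML itself, rather than injected by JavaScript, is to extract the text that appears outside `<script>` and `<style>` elements and look for the headline there. The sketch below uses only the Python standard library. The two sample pages and the `VisibleText` class are illustrative assumptions, and this does not reproduce how Google's crawler actually parses pages.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that is not inside <script> or <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(page_html):
    """Return the page's markup-level text with whitespace collapsed."""
    parser = VisibleText()
    parser.feed(page_html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Headline and body text written directly in the HTML.
STATIC_PAGE = (
    "<html><body><h1>Bridge reopens</h1>"
    "<p>The bridge reopened on Monday.</p></body></html>"
)

# Headline only exists inside a script that fills an empty element.
SCRIPTED_PAGE = (
    '<html><body><h1 id="headline"></h1><script>'
    'document.getElementById("headline").textContent = "Bridge reopens";'
    "</script></body></html>"
)

for name, page in (("static", STATIC_PAGE), ("scripted", SCRIPTED_PAGE)):
    found = "Bridge reopens" in visible_text(page)
    print(f"{name}: headline in HTML = {found}")
```

The static page passes because its headline is in the markup, while the scripted page fails because the headline exists only as a string inside the script.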