
Content policies

Content filtering

Content filtering refers to an automated system that processes large volumes of data and takes action on any content that meets certain criteria. Publishers often use text- and media-filtering solutions to handle the bulk of the user-generated content on their sites. These systems are often put in place to filter content such as adult material and illegal file sharing, as well as the sale of firearms, drugs, alcohol and tobacco.

Important: The violating content does not have to be hosted locally. Even linking to external sources that host it is considered a violation. For example, a publisher framing movies hosted illegally on a third-party site is violating the AdSense program policies.

Developing an in-house solution

Many publishers choose to develop their own filtering system. This decision can have the following benefits:

  • Text-based filtering can be relatively easy to code
  • It is often significantly cheaper than commercial solutions
  • The publisher knows their site and users best and can anticipate policy issues better than anyone else
Following are a few ideas and suggestions to consider when developing an in-house text-based solution.

Creating a list of keywords
To filter text, the system needs a list of keywords made up of individual words as well as word combinations. You can create this list in a number of ways, depending on the type of content, its volume on the site and the publisher's available resources:
  • Compile your own list of words and phrases that you wish to filter. You can use your own intuition or get some help:
    • Ask your employees to contribute
    • Reach out to your users for help
    • Use the Google AdWords Keyword Tool
    • For additional inspiration take a look at websites that host undesirable content (adult and/or filesharing sites for example), and find out which keywords show up frequently on these.
  • Code your own automatic keyword scraping tool:
    • Use search engine data to go through all pages on a site
    • Retrieve a list of unique words and word combinations on it
    • Keep the most commonly used keywords and discard the rest. Don’t forget to eliminate common articles and words like ‘a’, ‘and’ or ‘the’.
    • Output as a text file
    • Repeat the above for any number of sites until you are satisfied with your list, and you’re done.
    • Important: Scraping other sites and using their content as your own is against the AdSense program policies and the Google Webmaster Guidelines, and might also be illegal and/or unethical.
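
The counting, filtering and output steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming you already have the page text as a string and are entitled to analyze it; the function names, the stop-word list and the `top_n` cut-off are hypothetical choices, not part of any AdSense tool.

```python
import re
from collections import Counter

# Assumed stop-word list; extend it for your own language and site.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is", "it", "for"}


def extract_keywords(text, top_n=20):
    """Return the top_n most frequent words and two-word combinations."""
    words = re.findall(r"[a-z']+", text.lower())
    words = [w for w in words if w not in STOP_WORDS]
    counts = Counter(words)
    # Two-word combinations of words that are adjacent once stop words are removed.
    counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return [term for term, _ in counts.most_common(top_n)]


def save_keywords(keywords, path):
    """Write one keyword or phrase per line to a text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(keywords) + "\n")
```

Running `extract_keywords` over the text of several pages and merging the results gives a starting list, which still needs a manual pass to remove terms that are merely common rather than problematic.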
Assigning weights

Not all words are created equal, and some keywords are worse than others. You should therefore consider assigning different weights to different terms.

For example, adult filters in English should give the word ‘porno’ a higher weight than ‘sex’. While ‘porno’ is almost exclusively related to content that is not family-safe, ‘sex’ may also mean ‘gender’, depending on the context in which it is used.

Also consider words that are safe on their own but might indicate something else entirely when combined with another word. The word ‘pictures’, for example, is innocent enough, but ‘teen pictures’ would often refer to pornography.
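
One way to express such weights in code is a simple additive score. The sketch below is an assumption-laden illustration: the numeric weights, the phrase entry and the threshold are hypothetical values chosen only to show the mechanism, and a real list would be tuned against your own content.

```python
import re

# Hypothetical weights; only the relative order follows the example above.
WORD_WEIGHTS = {"porno": 5.0, "sex": 1.0}
# Phrases score only when their words appear together, in order.
# "movie torrent" is an illustrative entry, not taken from any official list.
PHRASE_WEIGHTS = {"movie torrent": 4.0}
FLAG_THRESHOLD = 4.0  # assumed cut-off for flagging


def score_text(text):
    """Sum the weights of every listed word and phrase found in text."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(WORD_WEIGHTS.get(word, 0.0) for word in words)
    normalized = " ".join(words)
    for phrase, weight in PHRASE_WEIGHTS.items():
        pattern = rf"\b{re.escape(phrase)}\b"
        score += weight * len(re.findall(pattern, normalized))
    return score


def is_flagged(text):
    """Flag text whose total score reaches the threshold."""
    return score_text(text) >= FLAG_THRESHOLD
```

With these values, a single low-weight word on its own stays below the threshold, while one high-weight word or one listed phrase is enough to flag the text for review.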

The filtering process
There are two common approaches when dealing with content filtering, and it is up to each publisher to decide what makes the most sense for their site.

Method 1 - User-generated content is scanned after it is displayed on a page:

  1. Scan the content
  2. Flag if it meets filtering criteria
  3. Disable ad serving on the page hosting said content
  4. Manually review content:
    1. If it is safe, enable ad serving and adjust filters
    2. If it is not, make sure the content is not displayed on pages that include ad code

Method 2 - User-generated content is scanned before it is made available to users:

  1. Scan the content
  2. Flag if it meets filtering criteria
  3. Queue it for review or reject it outright
  4. Manually review content:
    1. If it is safe, show it on ad serving pages and adjust filters
    2. If it is not, either reject it or show it only on pages where ad serving is disabled
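
The two methods differ mainly in when the scan runs and what happens to flagged content before review. The sketch below models that difference as status changes on a submission. All names (`Status`, `Submission`, `matches_filter` and so on) are hypothetical, the filter is a placeholder, and the "adjust filters" step is left out because it depends on how your keyword list is stored.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PUBLISHED_WITH_ADS = "published_with_ads"
    PUBLISHED_NO_ADS = "published_no_ads"
    PENDING_REVIEW = "pending_review"
    REJECTED = "rejected"


@dataclass
class Submission:
    text: str
    status: Status = Status.PENDING_REVIEW


def matches_filter(text):
    """Placeholder criterion; swap in your own keyword or scoring logic."""
    blocked = {"blockedword"}  # hypothetical keyword list
    return any(word in blocked for word in text.lower().split())


def scan_after_publishing(item):
    """Method 1: content is already live, so flagged pages lose ads until reviewed."""
    flagged = matches_filter(item.text)
    item.status = Status.PUBLISHED_NO_ADS if flagged else Status.PUBLISHED_WITH_ADS
    return item


def scan_before_publishing(item):
    """Method 2: flagged content is held in a review queue before going live."""
    flagged = matches_filter(item.text)
    item.status = Status.PENDING_REVIEW if flagged else Status.PUBLISHED_WITH_ADS
    return item


def apply_review(item, is_safe, allow_without_ads=True):
    """Record the outcome of the manual review step shared by both methods."""
    if is_safe:
        item.status = Status.PUBLISHED_WITH_ADS
    elif allow_without_ads:
        item.status = Status.PUBLISHED_NO_ADS
    else:
        item.status = Status.REJECTED
    return item
```

In either method, unsafe content never ends in the `PUBLISHED_WITH_ADS` state; the `allow_without_ads` switch only decides whether it is shown on ad-free pages or rejected.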

Commercial solutions in a nutshell

There are a number of services that provide content filtering, including a few that specialize in specific types of content such as adult or copyrighted material. There are also crowdsourcing platforms that connect publishers with users looking to earn money on the Internet. The best approach is to do some market research and decide which solution best fits the service you provide. Try looking for sites that review software and see which user-generated content filtering systems they recommend. With this information in hand, choose a solution based on the product’s score, its unique features and its pricing model.
