Comment spam

Comments are a great way for webmasters to build community and readership. Unfortunately, they're often abused by spammers and nogoodniks, many of whom use scripts or other software to generate and post spam. If you've ever received a comment that looked like an advertisement or a random link to an unrelated site, then you've encountered comment spam. Here are some ideas for reducing or preventing comment spam on your website.

Use anti-spam tools

Most website development tools, especially blog tools, can require users to prove they're a real live human, not a nasty spamming engine. You'll have seen these: generally, the user is presented with a distorted image (often called a CAPTCHA, for "completely automated public Turing test to tell computers and humans apart") and asked to type the letters or numbers shown in the image. Some CAPTCHA systems also support audio CAPTCHAs. This is a pretty effective way of preventing spam in user-generated content. The process may reduce the number of casual readers who leave comments on your pages or create a user profile, but it will usually improve the quality of the comments and profiles.

Google's free reCAPTCHA service is easy to implement on your site. In addition, data collected from the service is used to improve the process of scanning text, such as from books or newspapers. By using reCAPTCHA, you're not only protecting your site from spammers; you're helping to digitize the world's books. If you'd like to implement reCAPTCHA on your own site, you can sign up on the reCAPTCHA website. Plugins are available for easy installation on popular applications and programming environments such as WordPress and PHP.
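Whichever plugin you use, your server still has to confirm that the CAPTCHA was solved before it accepts a comment. The following is only a rough sketch of that check, assuming the current reCAPTCHA siteverify endpoint; the function names and the secret_key and user_token values are hypothetical and would come from your own account and form handling.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the current reCAPTCHA verification endpoint.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret_key, user_token):
    """Build the POST request that asks reCAPTCHA to check a user's token."""
    data = urllib.parse.urlencode(
        {"secret": secret_key, "response": user_token}
    ).encode("ascii")
    return urllib.request.Request(VERIFY_URL, data=data, method="POST")


def token_is_valid(response_body):
    """Return True only if the JSON reply explicitly reports success."""
    try:
        result = json.loads(response_body)
    except ValueError:
        return False
    return isinstance(result, dict) and result.get("success") is True


def verify_comment_token(secret_key, user_token, timeout=5):
    """Send the request and interpret the reply (needs network access)."""
    request = build_verify_request(secret_key, user_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return token_is_valid(response.read().decode("utf-8"))
```

A comment handler would call verify_comment_token before saving the comment and reject the submission when it returns False.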

Turn on comment moderation

Comment moderation means that no comments will appear on your site until you manually review and approve them. You'll spend more time monitoring your comments, but moderation can really help to improve the user experience for your visitors. It's particularly worthwhile if you regularly post about controversial subjects, where emotions can become heated. Comment moderation is generally available as a setting in your blogging software, such as Blogger.

Use "nofollow" tags

Together with Yahoo! and MSN, Google introduced the "nofollow" HTML microformat a few years ago, and the attribute has been widely adopted. Any link with the rel="nofollow" attribute will not be used to calculate PageRank or determine the relevancy of your pages for a user query. For example, if a spammer includes a link in your comments like this:

    <a href="http://www.example.com/">This is a nice site!</a>

it will get converted to:

    <a href="http://www.example.com/" rel="nofollow">This is a nice site!</a>

This new link will not be taken into account when calculating PageRank. This won't prevent spam, but it will keep your site from passing PageRank to the spammer's link.

By default, many blogging sites (such as Blogger) automatically add this attribute to any posted comments.
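If your platform doesn't add the attribute for you, the conversion can be approximated on the server before a comment is stored. This is a minimal regex-based sketch, not a full HTML sanitizer; the function name add_nofollow is hypothetical, and a production site would more likely rely on a dedicated HTML sanitizing library.

```python
import re

# Matches an opening <a ...> tag and captures its attribute text.
_ANCHOR_OPEN = re.compile(r"<a(?=[\s>])([^>]*)>", re.IGNORECASE)
# Matches an existing rel attribute (quoted or unquoted) so it can be replaced.
_REL_ATTR = re.compile(
    r"\s+rel\s*=\s*(\"[^\"]*\"|'[^']*'|[^\s>]+)", re.IGNORECASE
)


def add_nofollow(comment_html):
    """Force rel="nofollow" on every link in a user-submitted comment."""

    def rewrite(match):
        # Drop any rel value the commenter supplied, then append our own.
        attrs = _REL_ATTR.sub("", match.group(1)).rstrip()
        return "<a" + attrs + ' rel="nofollow">'

    return _ANCHOR_OPEN.sub(rewrite, comment_html)


print(add_nofollow('<a href="http://www.example.com/">This is a nice site!</a>'))
```

Removing any rel value the commenter supplied before appending the new one keeps a spammer from overriding the attribute.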

Disallow hyperlinks in comments

If you have access to the server, you may want to change its configuration to strip HTML tags from comments in your guestbook. Spammers will still be able to leave comments, but they won't be able to publish active hyperlinks.
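How you strip tags depends on your server software. As one possible illustration, assuming comments pass through a Python handler before they are saved, the standard library's HTML parser can keep only the text of a comment; the names _TextOnly and strip_html are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects the text of a document and ignores every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(comment):
    """Return the comment's text with all tags removed and the rest escaped."""
    parser = _TextOnly()
    parser.feed(comment)
    parser.close()
    # Escape what remains so leftover markup cannot render as HTML.
    return escape("".join(parser.parts))


print(strip_html('<a href="http://www.example.com/">This is a nice site!</a>'))
# prints: This is a nice site!
```

The link text survives, but the href is gone, so nothing in the stored comment is clickable.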

Block comment pages using robots.txt or META tags

You can use your robots.txt file to block Google's access to certain pages. This won't stop spammers from leaving comments or creating user accounts, but it will mean that links in these comments won't negatively impact your site. For example, if comments are stored in the subdirectory guestbook, you could add the following to your robots.txt file:

    User-agent: *
    Disallow: /guestbook/

This will block Google from crawling the contents of guestbook and any subdirectories.
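Before relying on a rule like this, it can help to check that it blocks the URLs you expect. The sketch below uses Python's standard urllib.robotparser module; the User-agent line, the example URLs, and the helper name is_crawlable are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: the guestbook rule applied to all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /guestbook/
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Report whether the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable(ROBOTS_TXT, "http://www.example.com/guestbook/page1"))  # False
print(is_crawlable(ROBOTS_TXT, "http://www.example.com/article/"))         # True
```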

You can also use a robots META tag to keep a single selected page out of Google's index, for example http://www.example.com/article/comments. Add the tag to the head section of that page, like this:

    <html>
    <head>
    <META NAME="googlebot" CONTENT="noindex">

You may wish to use these methods to block profile pages for new and not yet trustworthy users. Once you gain trust in the user, you can remove the crawling or indexing restrictions.

Think twice about enabling a guestbook or comments

A site full of spam gives users a poor impression. If this feature isn't adding much value for your users, or if you won't have time to regularly monitor your guestbook or comments, consider turning them off. Most blogging software, such as Blogger, will let you turn comments off for individual posts.

Use a blacklist to prevent repetitive spamming attempts

Google often sees large numbers of fake profiles on one innocent site all linking to the same domain. Once you find a single spammy profile, make it simple to remove any others that link to the same domain, for example by adding that domain to a blacklist.
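One way to make that cleanup simple is to keep a set of blacklisted domains and scan every profile's links against it. This is a minimal sketch under assumed data: the profiles dictionary, the domain names, and the function names are all hypothetical.

```python
from urllib.parse import urlparse


def link_domain(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def find_spam_profiles(profiles, blacklist):
    """Return the names of profiles that link to any blacklisted domain.

    profiles maps a profile name to the list of URLs found on that profile.
    """
    flagged = []
    for name, links in profiles.items():
        if any(link_domain(url) in blacklist for url in links):
            flagged.append(name)
    return sorted(flagged)


# Hypothetical data: two fake profiles pointing at the same spam domain.
profiles = {
    "alice": ["http://www.example.org/blog"],
    "bot1": ["http://spam.example.com/buy"],
    "bot2": ["http://SPAM.example.com/"],
}
print(find_spam_profiles(profiles, {"spam.example.com"}))  # ['bot1', 'bot2']
```

The same check can run at sign-up time to reject new profiles that link to a domain already on the list.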

Add a "report spam" feature to user profiles and friend invitations

Your users care about your community and are annoyed by spam too. Let them help you solve the problem.

Monitor your site for spammy pages

One of the best tools for this is Google Alerts. Set up a site: query using commercial or adult keywords that you wouldn't expect to see on your site. Google Alerts is also a great tool to help detect hacked pages. The Keywords page in Webmaster Tools lists significant keywords found on your site, so it's a good idea to check this regularly for unexpected and volatile vocabulary.
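The site: query itself is just a search string, so it can be generated from a keyword list you maintain. A small sketch follows, assuming the alert accepts the usual search operators (site: and OR); the domain, the keywords, and the function name build_alert_query are placeholders.

```python
def build_alert_query(domain, keywords):
    """Build a site: query that matches any of the given keywords."""
    # Quote multi-word keywords so they are searched as phrases.
    terms = " OR ".join('"%s"' % k if " " in k else k for k in keywords)
    return "site:%s %s" % (domain, terms)


print(build_alert_query("example.com", ["casino", "payday loans"]))
# prints: site:example.com casino OR "payday loans"
```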