Fighting Spam on Joomla K2 Based Websites | Robert Went, PHP Developer

In the days of Joomla 1.5 there were a few limitations to using core content for anything other than simple pages. Yes, plugins were available but they couldn't handle things like nested categories, item specific galleries, videos etc. Many components were created to combat those limitations but the outright winner in terms of number of users and community is (by a long shot) k2.

Much of its functionality has now made it to the Joomla core (nested categories, tags, featured images) but it continues to be popular with its users (old and new). Sometimes it can be a bugger to configure, but once it's set up you can have multiple templates with a very flexible output. For an example, take a look at Joomlabamboo's responsive k2 template set 'zenkit'.

Unfortunately there is one thing it lacks, and that is good spam prevention. The choices are recaptcha and stopforumspam. These days spammers get increasingly sophisticated in their methods and can get around most things quite easily. Just google 'captcha bypass' and look at the prices to see what I mean.

Anyway, I have seen some sites grind to a halt because of the number of spam comments submitted. Today I was asked to look at one that was getting hit about once every ten minutes, which can be pretty annoying when you have to moderate every comment.

So the first thing I do with any component is look at the available options and see what is set.

As mentioned, k2 includes the 2 3rd party options of recaptcha and stopforumspam.

Recaptcha is actually now harder for humans to decipher than it is for the APIs that will solve thousands of them for you for pretty much nothing. So basically I see it as a way to annoy your customers.

Stopforumspam works on signup, checking against a database of known spam email addresses, but spamming software can generate new addresses automatically. It also doesn't help when you want to allow guest comments on your site without users having to sign up (and they can't sign in with facebook, twitter etc either).

Other than that, you have the ability to only let registered users comment (not always ideal), turn off auto publishing (stops them being immediately visible on the site but you still have to go through everything and allow or delete), and that's about it.

So Why Do People Make Spam Comments?

Generally, it's all in the name of bad SEO. Shifty 'marketers' use automated tools and cheap labour to post blog comments across the web which contain keyword links back to their own websites in the hope that they will rank highly for those words when people search for them.

It's an old tactic which made a lot of people a lot of money. Google are trying to crack down on it by penalising sites with lots of spammy links using the same keywords, but at the same time they encourage social sharing which includes... you guessed it, blog comments! I would assume that if Google's search tools can judge the reading level of webpages, then they know a crappy spam comment when they see one, but still, they come.

The human drones that go around placing these links can be tricky as a certain percentage of them can actually read. The bots, however, are sometimes easier to stop.

Stop The Spambots!

Automated spambots targeting web forms are generally looking for one of 2 things: a website text input where they can add a site's url which will be published on that page, or a textarea input where they can insert their 'What a fantastic article, how do I get your rss feed? http://cheapdesignerknockoffsmadeinsweatshops.com' and hope it gets turned into a text link when published.

I think the practice started with the popularity of wordpress and its inbuilt comment system. There were a lot of links to be had on all those unattended blogs, and it led to akismet which, sadly, k2 doesn't use.

A way that used to work well was to create hidden inputs (hidden through css rather than actual hidden inputs, as bots didn't use to care about css) with the same names as the wordpress comment inputs 'url' and 'comment'. You could then check to see if they had been filled in and if so, redirect the bot to wherever you liked. These days though it seems they are wise to this and more generally aim for the visible textarea to peddle their wares. The practice isn't dead though, so it can still help.

Give Me The Code Already!

Sorry, off on a ramble there

Note! You can skip the hidden input parts and just use the textarea check if you don't want to get too deep into the code.

So what I am going to do is add one hidden input for url and add a check on the textarea to see if it contains a url.

First of all, we need to add some css so that when the input is added it won't be visible to human visitors. In your template's css file add the following:

/* Fighting the good fight against blog spam */
#comment-form .unwanted {display:none}

As k2 has its own templating system, if you are using the default template you should override it by copying /components/com_k2/templates/default to /templates/YOURTEMPLATE/html/com_k2/templates/default. This way any changes you make to the template don't get wiped out in the future on an upgrade (which the changes to the component further down will - forgive me please Fotis). If you have another template you can also move that to the same templates directory or just edit it in place.

Inside your chosen template(s) you will find a file called item_comments_form.php and this is where we will add our new 'url' input. After the line which starts '<input class="inputbox" type="text" name="commentEmail"' add the following 2 lines to create the label and input.

<label class="url unwanted" for="url">Website</label>
<input class="inputbox unwanted" type="text" name="url" id="url" value="" />

Clear your browser's cache (and Joomla's if you have caching enabled at any level) and check your form to make sure nothing has changed.

So that's the input added, we now need to check the information sent by the form.
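
Stripped of the k2 specifics, the check itself is tiny. Here is a minimal standalone sketch, assuming the submitted form data is available as a plain array. The function name honeypot_tripped is my own invention for illustration and is not part of k2 or Joomla.

```php
<?php
// Hypothetical helper, not part of k2 or Joomla.
// Returns true when any trap field (hidden from humans by css) was filled in.
function honeypot_tripped(array $formData, array $trapFields = array('url'))
{
    foreach ($trapFields as $field) {
        $value = isset($formData[$field]) ? $formData[$field] : '';
        // A real visitor never sees the field, so anything other than an
        // empty (or whitespace only) string is treated as a bot.
        if (is_array($value) || trim((string) $value) !== '') {
            return true;
        }
    }
    return false;
}
```

You would call it with something like honeypot_tripped($_POST) before doing any other work on the comment.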

As far as I could see, the only way to get the value of our new field was to add a column to the database table, as a new db object is created to store the information whilst validating and if the corresponding column doesn't exist the form data is lost. I'm sure there is a JRequest::getVar('url'); option out there but I was working quickly and couldn't get it to work. So I opened up phpMyAdmin, went to the #__k2_comments table (where #__ stands for your database prefix) and added a new column called 'url' with the same type and attributes as 'commentURL'. After that I could use the info sent from the form.

In the file /components/com_k2/models/item.php search for the function 'function comment()' as this is where we need to add some checks. Line numbers will be different depending on your version but the variables should be the same. Look for these lines:

$userName = trim($row->userName);
$commentEmail = trim($row->commentEmail);
$commentText = trim($row->commentText);
$commentURL = trim($row->commentURL);

This is the information coming from the form being assigned to variables. So I added the new field and then some checks to test the values and give it some output.

// Added to check our new url field and content of the comment
$url = trim($row->url);
// If the url field is filled in then send a message back asking politely to stop spamming
if (!empty($url)) {
    $response->message = JText::_('CUSTOM_PLEASE_DONT_SPAM_OUR_SITE');
    echo $json->encode($response);
    $mainframe->close();
}
// Check for a link in the comment text and ask humans to remove the http:// and resubmit the form
if (strpos($commentText, 'http://') !== false) {
    $response->message = JText::_('CUSTOM_PLEASE_REMOVE_HTTP');
    echo $json->encode($response);
    $mainframe->close();
}
// End of added code

As I was working on a multi-lingual site, I used translatable error messages. To change the text for these you should use the override language files so that your strings don't get wiped on an upgrade. You can find them in /language/overrides/ and then add your translation strings like this:

CUSTOM_PLEASE_DONT_SPAM_OUR_SITE="Please do not spam our website!"
CUSTOM_PLEASE_REMOVE_HTTP="Please remove the 'http://' from any links or your comment will be seen as spam!"

And that is my first attempt with k2, and I will see in the next couple of days how successful it was.

Any bots filling in the url field or adding http:// to the textarea won't be able to read the error messages and continue. (From what I saw on this site, all links from bots were added with http:// and the links added by humans were just www.website.com.)
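
One thing to keep in mind is that a check for the literal string 'http://' only matches that exact lower-case text, so a link starting with https:// or HTTP:// would slip through. Below is a minimal sketch of a slightly wider check. The function name contains_link is hypothetical and not part of k2, and I have not tested it against real spam on this site.

```php
<?php
// Hypothetical helper, not part of k2.
// Returns true if the text contains http:// or https:// in any letter case.
function contains_link($text)
{
    return preg_match('#https?://#i', $text) === 1;
}
```

Plain www.website.com style links still pass, which matches what the humans on this site were posting.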

You can leave out the url field and just use whatever checks on the textarea you like. Maybe create an array of banned words.
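
A banned word check could look something like the sketch below. The function name and the word list are made up for illustration and are not part of k2, so build your own list from the spam you actually receive.

```php
<?php
// Hypothetical helper, not part of k2.
// Returns true if the text contains any banned word, ignoring letter case.
function contains_banned_word($text, array $bannedWords)
{
    foreach ($bannedWords as $word) {
        // Skip empty entries, otherwise stripos() would match everything.
        if ($word !== '' && stripos($text, $word) !== false) {
            return true;
        }
    }
    return false;
}

// Example list only.
$bannedWords = array('knockoffs', 'cheap designer', 'payday loan');
```

This is a simple substring match, so short entries can also hit innocent words that happen to contain them. Keep the list to longer, more specific phrases.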

Hopefully in the future k2 will incorporate things like akismet, mollom or verified 3rd party logins. Maybe even some kind of filter like this in the comment parameters.

Of course the other option is to turn off comments and then use a k2 plugin for a 3rd party service like disqus or intensedebate and allow them to handle the spam for you. That's ok if your site is new, but it's not so easy if you have existing comments to migrate.

If you have any better ideas that don't involve hacking the component then please let me know.