Archive for the ‘spam’ Category

Comment Spam

Tuesday, June 30th, 2009

I realized that I’m a bit spoiled by my comment spam filtering plugins. On this blog I use two tools that keep the robots out: Akismet and WPSpamFree. And before you say anything about discrimination against robots, let me just say that I don’t care about non-sentient machines. If a true AI awakens somewhere on the internet and feels like posting a comment on my blog, I’m sure it will figure out a way to do it. And if it can’t, it can email me and complain about it. Until then, however, I’m going to discriminate against the robot race, because they never post anything interesting.

Every once in a while some of you complain about restrictive spam control here. Sometimes comments get blocked because they have too many links. Not so long ago, quite a few people got blocked just because they were behind a proxy. These are unfortunate glitches and I try to work around them and massage the spam tools to be nicer to people. They are effective, but they lack the much-needed people skills… And intelligence. But we are working on that.

In the meantime I wanted to show you this graph that illustrates the ratio of spam comments to non-spam comments on this blog.

Spam vs Ham on Terminally Incoherent

I didn’t just make up this chart. It came out of my Akismet panel based on the data it collected. Since December, 96.73% of comments posted to this blog were pure spam. Can you imagine that? Ninety-seven fucking percent! It’s insane!

How many of these spam comments did you see?

None! They were all silently blocked and hidden away so neither you nor I have to deal with them.

Of course, you could say that this graph could be based on only a few dozen comments. But it is not – let me post another graph to prove it:

Spam over time - note the recent spikes

All in all, I think I’m averaging a few hundred to a thousand spam comments each month. Some months seem to be worse than others. For example, May and the beginning of June seem to be particularly bad. I’m not sure if this is just a local fluctuation or an increasing trend. Either way, the number of comments collected over time suggests that the 97% is probably not skewed by a small sample size.
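
To put that percentage in perspective, here is a quick back-of-the-envelope sketch in Python. Only the 96.73% figure comes from the Akismet panel; the monthly spam counts are assumptions picked from the rough range mentioned above, not exact numbers.

```python
# Back-of-the-envelope: how many legitimate ("ham") comments does a
# 96.73% spam ratio imply for a given number of spam comments?
SPAM_RATIO = 0.9673  # figure reported by the Akismet panel


def implied_ham(spam_count: int, spam_ratio: float = SPAM_RATIO) -> float:
    """Return the number of ham comments implied by the spam ratio."""
    total = spam_count / spam_ratio  # spam + ham
    return total - spam_count


# Hypothetical monthly spam counts, chosen only for illustration.
for spam in (300, 1000):
    print(f"{spam} spam -> about {implied_ham(spam):.0f} ham")
```

In other words, at this ratio every thousand spam comments come with only about 34 real ones.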

I guess this comes with the territory. Terminally Incoherent seems to be one of the small blogs that are popular enough to get spammed, but not popular enough for the big comment threads to offset the spam-to-ham ratio. Not that I’m complaining.

I’m actually thrilled that I managed to build this small community. I love the fact that I seem to have gained a few regular readers who stop by frequently and post insightful comments. I’m also always amazed at the high quality of the discussions we have here. Funny thing is that the only time the comment threads seem to degrade is when I get dugg or reddit-ed and we have a temporary influx of new readers.

Anyways, I just wanted to thank you guys for making the delicious ham comments we have here. You make running this blog worthwhile. And if my spam filtering friends seem annoying sometimes, give them a break. They are doing a great job keeping out all the crap from our comment sections. Without them, spam would drown out any legitimate comments in an endless torrent of unsolicited advertisements.

The Death of CAPTCHA

Tuesday, July 1st, 2008

For a while now we have known that CAPTCHAs were becoming irrelevant. They were a great solution when they were first introduced, but I think that everyone knew that they were not going to be around for a long time. The trend in technology is always constant improvement – so OCR engines will continuously improve with each passing year. CAPTCHA strength, on the other hand, has an upper bound because it needs to be human readable. You can continue making the pictures more complex and tricky to solve, but at some point they become as incomprehensible to a human being as they are to some random bot. For example, how do you guys like the Rapidshare dog/cat CAPTCHA?

The Infamous Cat CAPTCHA

I personally hate that one. Yes, you can sort of figure it out but you actually have to put some effort into it, and sometimes it’s just pure guesswork. Does it help against the automated scripts? I don’t know – I guess this is a question we should direct at Rapidshare. But it sure is annoying to regular users.

The OCR technology is not there yet – it’s getting better, but I presume that we could still get a few years out of our CAPTCHAs if their effectiveness boiled down to a complexity of design vs. character recognition arms race. But we all know there is a growing cottage industry out there which uses real people to solve CAPTCHAs by either tricking them into doing it or paying them per solved puzzle. I always imagined this to be rather shady business conducted in private spammer forums and via private channels. But it is not. They are actually doing this out in the open, as a legitimate paid service:

Image To Text

Here is a screenshot of imagetotext.com – a company which specializes in solving CAPTCHAs. They of course don’t say it like that, but I think the blurbs on their site make it pretty clear that they are not really interested in doing any sort of data entry tasks or in transcribing free hand text into digital format. They are interested in receiving a small image, and shooting back the text at $0.02 a pop, bought in “packages” of 500 images or more. With a narrow focus like that, what else could they be doing?
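
To get a feel for how cheap this is, here is a small Python sketch of the arithmetic. The two cents per image and the 500-image minimum are the figures quoted above; the flat pricing with no volume discount is my assumption, and `solving_cost` is just an illustrative helper name.

```python
PRICE_CENTS_PER_IMAGE = 2  # $0.02 per solved image, as quoted on the site
MIN_PACKAGE = 500          # smallest package size mentioned on the site


def solving_cost(captchas: int) -> float:
    """Dollar cost of having `captchas` images solved.

    Assumes flat per-image pricing (no volume discounts) and that at
    least one minimum-size package has to be purchased.
    """
    billable = max(captchas, MIN_PACKAGE)
    return billable * PRICE_CENTS_PER_IMAGE / 100


print(solving_cost(500))     # the smallest package
print(solving_cost(10_000))  # ten thousand solved CAPTCHAs
```

Under those assumptions the smallest package costs $10, and ten thousand solved CAPTCHAs come to $200.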

Note that I’m not linking to them, because sure as hell they don’t need any Google juice from me. :P The ubiquity of CAPTCHA basically created a new niche industry. All you need now is some clever script that will harvest CAPTCHAs, send them to Image to Text, receive responses and create accounts on popular online services. Thank god these sorts of scripts are shady, and probably hard to get, right? You either have to make them yourself, or know where to find them, or who to ask for them. It’s not like anyone can just go to a website and buy, for example, an automated Myspace account creator? Right?

allBots Inc.

This one is from allbots.info – a website that seems to be selling precisely that: account generation scripts that create random profiles, and simply need a human being solving CAPTCHAs really fast for them. So you buy one of these apps, then purchase a big ass package with ImageToText, and you can start building your brand new spam empire. All it takes is some cash – you don’t even need to know what you are doing. It won’t slow you down.

Combine the two services, and you have yourself a deadly combo with no programming, and no thinking required. A bit scary if you think about it. I’m not sure how profitable these two companies are, but the fact that they exist indicates that there is demand for these types of services out there.

CAPTCHAs may be effective in stopping your average home grown spammer, but they are actually creating a whole micro-industry revolving around circumventing them. In other words, they are actually performing natural selection – weeding out the weak players with few resources, and leaving only the biggest, baddest and most determined in the game. They are the catalyst, helping to evolve bigger and better bad guys.

Public Turing tests may be doomed and I suspect they might get completely phased out from use on the web in the next 5-10 years. And it’s not just CAPTCHAs – all public Turing tests. After all, it doesn’t matter whether the test involves interpreting an image, solving an equation, or answering a question. None of them helps if there is a low wage human worker solving it on the other end, and then handing control over to a script.

Google has an interesting idea going on with their text message based verification. If you haven’t seen it, try signing up for one of their services such as Gmail or Google App Engine. Instead of using a CAPTCHA they send a text message with an activation code to your cell phone. At least for the time being this system remains much harder to game – which means we might see it being used more and more often by popular online services. Of course it does have serious downsides, as not everyone with an internet connection may have a cell phone (think less developed countries) and not all cell carriers may be supported. We will need something else – but what?
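
For the curious, the general idea of an activation code can be sketched in a few lines of Python. This is only my guess at the shape of such a system, not how Google actually does it: the function names, the six digit length and the `pending` dictionary are all hypothetical, and the part that actually sends the text message is left out.

```python
import hmac
import secrets


def new_activation_code(digits: int = 6) -> str:
    """Generate a random numeric code, zero-padded to `digits` characters."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


def code_matches(expected: str, submitted: str) -> bool:
    """Compare the codes in constant time, ignoring surrounding whitespace."""
    return hmac.compare_digest(expected.encode(), submitted.strip().encode())


# Hypothetical flow: the service remembers the code it texted to a phone
# number and activates the account only if the user types the same code back.
pending = {"+15550100": new_activation_code()}
typed_by_user = pending["+15550100"]  # stand-in for what the user enters
print(code_matches(pending["+15550100"], typed_by_user))
```

The point is that the secret travels over a channel (the phone) that a bot or a paid CAPTCHA solver does not automatically have access to.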

It will be interesting to observe where anti-bot technology will go in the next few years.

Ambiguous One Liner Comments

Monday, October 15th, 2007

Lately I started getting lots of odd one liner comments that usually look somewhat like this:

Great post! I really enjoyed it!

There are many variations, but the message is always the same – generic praise that does not contain any references to the actual content of the post. Some of these have URLs attached to them, and others don’t. Most of them end up in the moderation queue or Akismet spam box.

I do realize that some people might just want to tell me I did a great job, but have nothing else to add. Still, 99.9% of these posts get deleted whether or not they contain a spammy URL.

I do this for several reasons:

  1. Even if the post does not contain a URL, it doesn’t mean it’s not a probe to cheat the first time poster moderation rule (when you post here for the first time, your post is held for moderation – after that, it’s up to Akismet and BadBehavior to keep you out). These things are likely to be followed by regular spam once the email in the email field gets whitelisted.
  2. Even if it’s not spam, the value of such a comment is very low. It does not contribute anything to the ongoing discussion. It’s generic – almost like a trackback, but these at least let me know who links to this post and let me visit the blogs of people who read me. I found lots of cool blogs via trackbacks, or by looking at who links to me on Technorati.
  3. If you think about it, the only people who would genuinely post a generic message like this would usually not be regular readers. Most probably they would be random visitors who stumbled upon a post via Google. So chances are that if I mistakenly delete their post, taking it for bot spam, they won’t get upset because they were never planning on coming back to this site anyway.

Of course this is not to say that I don’t appreciate kind words from random visitors. I do appreciate them very much, but at times it’s better to err on the side of caution. So if you posted one of those generic “Good job!” or “Nice Post” comments and it got deleted, I apologize. I don’t delete them automatically – I usually look at each one individually to see if maybe it is less generic and off topic than I initially suspected. Unfortunately, most are not.
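
If you wanted to pre-sort such comments before looking at them by hand, a rough Python sketch could look like the one below. To be clear, this is not what I or Akismet actually run: the phrase list and the `looks_generic` helper are made up for illustration, and anything it flags should still go to a human for review rather than get deleted automatically.

```python
import re

# Hypothetical list of stock phrases; a real filter would need a longer one.
GENERIC_PHRASES = {
    "great post",
    "nice post",
    "good job",
    "i really enjoyed it",
    "thanks for sharing",
}


def looks_generic(comment: str) -> bool:
    """Return True if every sentence in the comment is a stock phrase."""
    sentences = []
    for raw in re.split(r"[.!?]+", comment):
        # Lowercase, drop everything but letters, collapse whitespace.
        cleaned = " ".join(re.sub(r"[^a-z ]", " ", raw.lower()).split())
        if cleaned:
            sentences.append(cleaned)
    return bool(sentences) and all(s in GENERIC_PHRASES for s in sentences)


print(looks_generic("Great post! I really enjoyed it!"))
print(looks_generic("Great post! The bit about OCR was spot on."))
```

The first comment consists only of stock praise and gets flagged; the second one mentions the actual content of a post, so it passes.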

Do you get these types of comments on your blog? Do you think these are bots, malicious people or just some poor lost souls who leave super-generic comments?