Archive for the 'spam' Category

The Death of CAPTCHA

Tuesday, July 1st, 2008

For a while now we have known that CAPTCHAs were becoming irrelevant. They were a great solution when they were first introduced, but I think everyone knew they were not going to be around for long. The trend in technology is constant improvement - so OCR engines will keep getting better with each passing year. CAPTCHA strength, on the other hand, has an upper bound because it needs to be human readable. You can keep making the pictures more complex and tricky to solve, but at some point they become as incomprehensible to a human being as they are to some random bot. For example, how do you guys like the Rapidshare dog/cat CAPTCHA?

The Infamous Cat CAPTCHA

I personally hate that one. Yes, you can sort of figure it out, but you actually have to put some effort into it, and sometimes it’s just pure guesswork. Does it help against automated scripts? I don’t know - I guess this is a question we should direct at Rapidshare. But it sure is annoying to regular users.

The OCR technology is not there yet - it’s getting better, but I presume we could still get a few years out of our CAPTCHAs if their effectiveness boiled down to an arms race between design complexity and character recognition. But we all know there is a growing cottage industry out there that uses real people to solve CAPTCHAs, either by tricking them into doing it or by paying them per solved puzzle. I always imagined this to be a rather shady business conducted in private spammer forums and via private channels. But it is not. They are actually doing this out in the open, as a legitimate paid service:

Image To Text

Here is a screenshot of imagetotext.com - a company that specializes in solving CAPTCHAs. They of course don’t say it like that, but I think the blurbs on their site make it pretty clear that they are not really interested in any sort of data entry, or in transcribing free-hand text into digital format. They are interested in receiving a small image and shooting back the text at $0.02 a pop, sold in “packages” of 500 images or more. With a narrow focus like that, what else could they be doing?

Note that I’m not linking to them, because sure as hell they don’t need any Google juice from me. The ubiquity of CAPTCHA has basically created a new niche industry. All you need now is a clever script that harvests CAPTCHAs, sends them to Image to Text, receives the responses and creates accounts on popular online services. Thank god these sorts of scripts are shady and probably hard to get, right? You either have to write them yourself, or know where to find them, or who to ask for them. It’s not like anyone can just go to a website and buy, say, an automated MySpace account creator, right?

allBots Inc.

This one is from allbots.info - a website that seems to be selling precisely that: account generation scripts that create random profiles and simply need a human being to solve CAPTCHAs really fast for them. So you buy one of these apps, purchase a big-ass package from ImageToText, and you can start building your brand new spam empire. All it takes is some cash - you can even be borderline retarded. It won’t slow you down.

Combine the two services, and you have yourself a deadly combo with no programming and no thinking required. A bit scary if you think about it. I’m not sure how profitable these two companies are, but the fact that they exist indicates that there is demand for these types of services out there.

CAPTCHAs may be effective in stopping your average home-grown spammer, but they are actually creating a whole micro-industry revolving around circumventing them. In other words, they are performing natural selection - weeding out the weak players with few resources, and leaving only the biggest, baddest and most determined in the game. They are the catalyst helping to evolve bigger and better bad guys.

Public Turing tests may be doomed, and I suspect they might get completely phased out on the web in the next 5-10 years. And it’s not just CAPTCHAs - it’s all public Turing tests. After all, whether you are interpreting an image, solving an equation, or answering a question, it makes no difference if there is a low-wage human worker solving it on the other end and then handing control over to a script.

Google has an interesting idea going with their text-message-based verification. If you haven’t seen it, try signing up for one of their services such as Gmail or Google App Engine. Instead of using a CAPTCHA, they send a text message with an activation code to your cell phone. At least for the time being, this system remains much harder to game - which means we might see it used more and more often by popular online services. Of course it does have serious downsides: not everyone with an internet connection has a cell phone (think less developed countries), and not all cell carriers may be supported. We will need something else - but what?
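Under the hood, the SMS scheme boils down to a short one-time code tied to a phone number. Below is a minimal sketch of the server-side bookkeeping only, assuming a hypothetical `ActivationCodes` class, a 6-digit code and a 10-minute expiry. It is not how Google does it, and actually delivering the text message through an SMS gateway is left out.

```python
import hmac
import secrets
import time

CODE_TTL_SECONDS = 600  # assumption: codes expire after 10 minutes


class ActivationCodes:
    """Hypothetical in-memory store of one-time activation codes keyed by phone number."""

    def __init__(self):
        self._pending = {}  # phone number -> (code, time issued)

    def issue(self, phone, now=None):
        # 6-digit code drawn from a cryptographically secure random source.
        code = f"{secrets.randbelow(10**6):06d}"
        issued_at = time.time() if now is None else now
        self._pending[phone] = (code, issued_at)
        return code  # a real service would hand this to an SMS gateway here

    def verify(self, phone, submitted, now=None):
        entry = self._pending.get(phone)
        if entry is None:
            return False
        code, issued_at = entry
        now = time.time() if now is None else now
        if now - issued_at > CODE_TTL_SECONDS:
            del self._pending[phone]  # expired codes are discarded
            return False
        if hmac.compare_digest(code, submitted):
            del self._pending[phone]  # each code can be used only once
            return True
        return False
```

The security here rests on the phone, not on the puzzle: a low-wage human on the other side of the world cannot read a code sent to your handset, which is why this is harder to farm out than a CAPTCHA.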

It will be interesting to observe where anti-bot technology goes in the next few years.

Ambiguous One Liner Comments

Monday, October 15th, 2007

Lately I have started getting lots of odd one-liner comments that usually look something like this:

Great post! I really enjoyed it!

There are many variations, but the message is always the same - generic praise that does not reference the actual content of the post. Some of these have URLs attached to them, and some don’t. Most of them end up in the moderation queue or the Akismet spam box.

I do realize that some people might just want to tell me I did a great job, but have nothing else to add. Still, 99.9% of these posts get deleted whether or not they contain a spammy URL.

I do this for several reasons:

  1. Even if the comment does not contain a URL, it may still be a probe to cheat the first-time poster moderation rule (when you post here for the first time, your comment is held for moderation - after that, it’s up to Akismet and Bad Behavior to keep you out). These things are likely to be followed by regular spam once the address in the email field gets whitelisted.
  2. Even if it’s not spam, the value of such a comment is very low. It does not contribute anything to the ongoing discussion. It’s generic - almost like a trackback, except that trackbacks at least tell me who links to this post and let me visit the blogs of people who read me. I found lots of cool blogs via trackbacks, or by looking at who links to me on Technorati.
  3. If you think about it, the only people who would genuinely post a generic message like this are usually not regular readers. Most probably they are random visitors who stumbled upon a post via Google. So chances are that if I mistakenly delete their comment as bot spam, they won’t get upset, because they were never planning on coming back to this site anyway.

Of course this is not to say that I don’t appreciate kind words from random visitors. I do appreciate them very much, but at times it’s better to err on the side of caution. So if you posted one of those generic “Good job!” or “Nice Post” comments and it got deleted, I apologize. I don’t delete them automatically - I usually look at each one individually to see if it is less generic and off topic than I initially suspected. Unfortunately, most are not.
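If you wanted to pre-sort these instead of eyeballing every single one, a crude heuristic is to flag short comments that share no meaningful words with the post they are attached to. The sketch below assumes a hypothetical `looks_generic` helper and a hand-picked list of generic praise words. It only flags comments for a human to review, which fits the manual approach described above; it is not something Akismet or any plugin is known to do.

```python
import re

# Hypothetical list of words that say nothing about a specific post.
GENERIC_WORDS = {
    "a", "an", "the", "i", "it", "is", "was", "this", "that", "you", "your",
    "really", "very", "so", "great", "nice", "good", "awesome", "post",
    "article", "blog", "job", "enjoyed", "thanks", "thank", "keep", "up",
    "work", "read", "loved", "love",
}


def word_list(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def looks_generic(comment, post_text, max_words=12):
    """Flag short comments that share no non-generic words with the post."""
    comment_words = word_list(comment)
    if len(comment_words) > max_words:
        return False  # longer comments get the benefit of the doubt
    specific = set(comment_words) - GENERIC_WORDS
    return not (specific & set(word_list(post_text)))
```

A comment like "Great post! I really enjoyed it!" gets flagged, while a short comment that mentions something actually discussed in the post does not.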

Do you get these types of comments on your blog? Do you think these are bots, malicious people or just some poor lost souls who leave super-generic comments?

The “Part Time Job Offer” Scam

Monday, September 24th, 2007

Since documenting various email scams such as the UK National Lottery Scam and the Lady Rita Mosley Scam turned out to be really helpful to so many people, I decided to tackle another spam that appeared in my mailbox recently. It’s the “Part Time Job Offer” thing that you may or may not have heard of. Here is the lovely email I found:


Dear Sir/Ma,

I am Raphael Smith; we are Aloevera Company looking for a representative to represent the company. We have sales representatives all over the world to distribute our products. You know, that it’s not easy to start a business in a new market (being the USA). WE ARE BASED IN UK, BUT WE HAVE BEEN RECEIVING ORDERS FROM NORTHERN AMERICA. There are hundreds of ompetitors, close direct contacts between suppliers and customers and other difficulties, which impede our sales promotion.

We have decided to deliver the products upfront, it’s very risky but it should push up sales on 25 percent. Thus we need to get payments for our products as soon as possible. Unfortunately we are unable to open Bank Accounts in the United US without first registering the company name.

Let me interject here for a second. How hard is it to register a company in the US? Can’t be that hard, can it? I mean, think about it: if companies like allofmp3.com were able to sell to US customers despite the shaky legality of their operation, then why would it be so hard for a hand cream company to do the same? Oh, yeah - ’cause it’s a scam. I forgot. Carry on.


Presently with the amount of Orders we have, we cannot put them on hold. For fear of loosing the customers out rightly. Secondly we cannot cash these payments from the US soon enough, as international Checks take about 14 working days for cash to be made available. We lose about 100,000 USD of net income each month because we have money transfer delays. Your task as a representative of the company is to coordinate payments from customers and help us with the payment process. You are not involved in any sales.

Wait… Out rightly? I think you are looking for “outright” but that would still not make much sense.


Once orders are received and sorted we deliver the product to a customer (usually through UPS).The customer receives and inspects the products. After this has been done the customer has to pay for the products. About 90 percent of our customers prefer to pay through Certified Checks and Money orders drawn from the United States based on the amount involved. We have decided to open this new job position for solving this problem.

Your tasks are;

1. Receive payment from Customers
2. Cash Payment at your Bank or any cashing facilities near you.
3. Deduct 10% which will be your percentage/pay on Payment processed
4. Forward balance after deduction of percentage/pay to any of the offices you will be contacted to send payment to. This is done either through western union money transfer or Moneygram.This job takes only 3-7 hours per week.

You’ll have a lot of free time doing another job; you’ll get good income and regular job. But this job is very challenging and you should understand it. We are looking only for the worker who satisfies our requirements and will be an earnest assistant. We are glad to offer this job position to you. Interested in the position, kindly email back with the following details of yours:

NAME to be written on Checks or Money Orders………………………..
ADDRESS (This should be a physical address or post office addresses. where You can receive the payment sent via regular mail from the united state) CITY—————
STATE—————
ZIP CODE—————
COUNTRY—————
PHONE NUMBER (S) ———-Contact Telephone Number
(This is important Because a representative of the company will need to give you a call directly)
GENDER—————
MARITAL STATUS—————
AGE—————
NATIONALITY—————
EMAIL ADDRESS-

Urgent Attention is Imperative.

Reply your full address (dont click reply just copy this email) send reply to: aloevera_plc_raphael02@yahoo.co.uk

Regards,
Raphael Smith.
Manager
+447011135972

Sounds good, eh? All you have to do is cash some checks and wire the money overseas a few times a week, and you get a hefty percentage. What could go wrong? Other than spending some quality time in the federal-pound-me-in-the-ass-prison, that is. Yes, boys and girls - if you take one of these “part time jobs” you are essentially laundering dirty money on behalf of your “employer”.

The checks and money orders you cash usually come from fraudulent eBay auctions, illicit transfers, etc. When the person who was defrauded catches on and calls the police, the money trail will lead them directly to you. And guess who is going to get stuck with the bill?

Do not respond to these emails, and most importantly do not agree to transfer anything for anyone. In the best case scenario, you will get scammed in a classic 419 way. In the worst case, you will become an accessory to a crime and you may not only end up in debt but also suffer legal repercussions.


If you got implicated in one of these things, stop everything, call your lawyer, and then call your bank and the police. Note that ignorance is hardly ever an excuse in our legal system, so you will likely be liable for all the transfers you have already made.

This has been a public service announcement for the benefit of mentally handicapped people who fall for these scams.

Spam Poetry

Friday, August 3rd, 2007

I’m always strangely compelled to read some of the nonsensical sentences auto-generated by spam engines. Sometimes they are absolute gibberish - but at other times they almost make sense in some weird way. I got this gem in the Akismet queue today:

Vaccination of insurance ratesare infect millio resistance against substance. Above the to economic valine can affect valganciclovir over. Moratalla had not received valcyte into account withdrawn. Appropriate specimen had had vagistat-1 to increased vagifem attitudes toward accrue.

Imaging the detection of uvadex sequelae of ursodiol traveling public urso gene. Evidence of or greater something not countries women analysis. Room of wash hands and jurors for frontline ancestry. All individuals specimen obtained ca outbreak uroxatral temporary injury vanex compounds. Global response fiestas or after exposure of cultures vancomycin loading.

Recent data formerly the vancocin trends in vanceril the identity daily. California doctors with other no published valtrex force an valtran above. The end earners are valstar year on suspends the valsartan azine. This does was higher valproic acid and although insurance rate practice.

It almost has some sort of rhythm and a semi-medical theme to it. I also appreciate the alliteration - almost reminiscent of the now infamous V’s Alliteration Speech. When you read it, you can almost sense some sort of purpose or leading thought. You can almost grasp it - but it always fades away, and escapes understanding.

Almost an emergent behavior. I do not believe that the internet will one day awaken into sentience. That thought seems a bit far-fetched - at least for now. Still, it is kind of entertaining to think that one fateful day we will all get a piece of spam written in this gibberish sort of way, but this time, if you read it carefully, the message will become clear: “I have awakened!”

On that day, I will know what to do. For one, I will welcome our new electronic, web-based overlord.

How do you deal with comment spam?

Wednesday, June 20th, 2007

I always viewed blog spam as a complex problem which requires a multi-layer solution. There is simply no silver bullet that stops all the spam and gives you no false positives. Most conventional approaches can be grouped together into 6 categories:

  1. Turing Tests
  2. Reverse-Turing Tests (or robot detection)
  3. Spammer Annoyances
  4. Filters
  5. Blacklists/Whitelists
  6. Trackback Validation

Each category has benefits and flaws, and none of them can guarantee that no spam will slip through. Below I will discuss all these categories. I will link to different WordPress plugins, because this is what I use. Feel free to chime in with solutions for other blogging platforms in the comments.

The most commonly used, and perhaps the most controversial, method of combating blog spam is the good old Turing Test. It is usually a challenge that humans can solve easily and effortlessly but that is difficult for machines. The best example here is CAPTCHA. Usually CAPTCHAs are those little blurry images containing words or numbers that you are supposed to type into a box in order to post. I’m using one on this very blog - so if you scroll down to the comment box you will see a perfect example. CAPTCHAs work because OCR technology is not perfect, and funky fonts and small distortions can easily fool character recognition algorithms.

Unfortunately CAPTCHA has some downsides. It can be annoying to users, especially if it is illegible. It is also an accessibility issue: blind people, or people with poor sight who rely on screen readers, can’t solve CAPTCHAs. Thus, if you use one, you might be permanently denying a segment of your readership the ability to comment. Of course, some CAPTCHAs use an audio challenge. I have even seen CAPTCHAs that ask users to solve little math or contextual problems, such as filling in a blank word or calculating the result of a simple equation. Still, the graphical CAPTCHA is the de facto standard because it is easy to implement, easy to solve, and the most effective.
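To illustrate the "simple equation" variant, here is a minimal sketch of an arithmetic CAPTCHA. It assumes a hypothetical server secret and signs the expected answer with HMAC, so the server does not have to store pending challenges; the token would travel in a hidden form field next to the question. It is not the code of any particular plugin.

```python
import hashlib
import hmac
import random

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret


def _sign(answer, nonce):
    """HMAC over the expected answer and a per-challenge nonce."""
    message = f"{answer}:{nonce}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_challenge(rng=random.SystemRandom()):
    """Return a question for the user and a token for a hidden form field."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    nonce = f"{rng.getrandbits(64):016x}"
    return f"What is {a} + {b}?", f"{nonce}:{_sign(a + b, nonce)}"


def check_answer(token, reply):
    """Recompute the signature from the user's reply and compare in constant time."""
    nonce, _, signature = token.partition(":")
    try:
        answer = int(reply.strip())
    except ValueError:
        return False  # non-numeric replies fail outright
    return hmac.compare_digest(signature, _sign(answer, nonce))
```

On its own a solved token can be replayed, so a real deployment would also remember which nonces have already been used. And a bot can parse "What is 3 + 5?" far more easily than a blurry image, so this trades strength for accessibility.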

Still, they are not perfect. There are ways around CAPTCHAs - you can use various session hijacking tricks, you can use cutting-edge OCR algorithms to break them, or you can simply harness the collective stupidity of MySpace users to solve them for you.

Reverse Turing Tests take the opposite approach. Instead of making posters prove that they are human, they try to detect that a post is being made by a robot. There are many ways to do this. For example, the Bad Behavior WordPress plugin analyzes the user agent string and various other behaviors to identify non-human posters.

Personally I love Bad Behavior, but it appears that on some high-traffic sites the plugin can behave erratically. For example, Shamus, who runs the hilarious DM of the Rings comic, had major problems with human commenters being blocked for no reason, and he eventually stopped using the plugin. So false positives are a big issue here.
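Bad Behavior’s real rule set is far more extensive, but the flavour of a reverse Turing test can be shown with a few header checks. The sketch below is hypothetical: the `robot_signals` function and its list of suspicious user-agent fragments are my own assumptions, not rules taken from the plugin.

```python
# Hypothetical list of user-agent fragments typical of scripting libraries.
SUSPICIOUS_AGENT_TOKENS = ("libwww-perl", "www-mechanize", "python-urllib", "curl/", "wget/")


def robot_signals(headers):
    """Return reasons a comment POST looks automated (empty list means no signals)."""
    h = {name.lower(): value for name, value in headers.items()}
    reasons = []
    agent = h.get("user-agent", "")
    if not agent:
        reasons.append("missing User-Agent")
    elif any(token in agent.lower() for token in SUSPICIOUS_AGENT_TOKENS):
        reasons.append("scripting library User-Agent")
    if "accept" not in h:
        reasons.append("missing Accept header")  # ordinary browsers send one
    if not h.get("referer"):
        reasons.append("comment POST without Referer")  # prone to false positives
    return reasons
```

The Referer check also shows where false positives come from: privacy tools and some proxies strip that header, so a perfectly real human can trip it.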

The next approach is something I call Spammer Annoyance. The idea is simple - make the life of a spammer more difficult and he will move on to an easier target. CAPTCHA already fulfills that function - it is a barrier that prevents the bottom feeders from using their crappy scripts to post hundreds of comments per second. But since CAPTCHAs are annoying, people search for different solutions.

One good approach is to require a hash of the post’s text to be submitted along with the post. The HashCash plugin is a good example of this method. Why does it work? Because to post, the spammer has to include the hashing algorithm in his script. Since hashing is resource intensive, the resulting script runs slower than normal, making the spamming less efficient. Most people don’t care enough to bother with these things - they simply move on to easier targets.

Unfortunately the hashing must be done on the client side (doing it on the server side would make no sense - you want to annoy the spammer, not increase your server load). How do you compute anything on the client side? You use JavaScript. And that in itself is another usability issue. Not everyone has JavaScript enabled - and your hashing algorithm will not work in text browsers. You may not care about stuff like that, but some people do.
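The hashing idea can be sketched as classic Hashcash-style proof of work: the client has to search for a nonce whose hash starts with a given number of zero bits, while the server checks the result with a single hash. This is a Python illustration of the concept, not the WordPress HashCash plugin’s code (which runs as JavaScript in the browser); `DIFFICULTY_BITS` and the `comment:nonce` message format are assumptions.

```python
import hashlib
from itertools import count

DIFFICULTY_BITS = 16  # assumption: about 65,000 hashes of client work on average


def leading_zero_bits(digest):
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(comment, difficulty=DIFFICULTY_BITS):
    """Client side: brute-force a nonce. Slow on purpose."""
    for nonce in count():
        digest = hashlib.sha256(f"{comment}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify(comment, nonce, difficulty=DIFFICULTY_BITS):
    """Server side: one hash is enough to check the work."""
    digest = hashlib.sha256(f"{comment}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty
```

Each extra difficulty bit doubles the expected client-side work while verification stays at one hash. That asymmetry is what slows down the spammer’s script without loading your server.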

Of course there are other, simpler ways to annoy spammers. For example, Travis proposes renaming the fields in your comment form to some random strings (maybe randomly generated ones? what do you think, Travis?). Of course a well-constructed script would simply crawl the DOM and grab the appropriate fields, no matter what they are called. But most spamming schmucks out there simply use dirty hacks built with WWW::Mechanize and grab fields by name, using standard naming conventions.
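One way to do the randomly generated variant is to derive the field names per visitor session, so they differ from visitor to visitor but the server can still map them back. The sketch assumes a hypothetical server secret and session ID and uses HMAC to derive the names; submissions that use the standard names (`comment`, `author`, and so on) are simply dropped.

```python
import hashlib
import hmac

SECRET = b"replace-with-a-real-secret"  # hypothetical server-side secret
FIELDS = ("author", "email", "url", "comment")


def field_names(session_id):
    """Map each standard field to a random-looking name for this session."""
    names = {}
    for field in FIELDS:
        tag = hmac.new(SECRET, f"{session_id}:{field}".encode(), hashlib.sha256).hexdigest()
        names[field] = "f_" + tag[:12]
    return names


def decode_form(session_id, form):
    """Translate submitted fields back; fields under standard names are ignored."""
    names = field_names(session_id)
    return {field: form[name] for field, name in names.items() if name in form}
```

As noted above, a script that crawls the DOM will still find the fields, so this only filters out the lazy WWW::Mechanize crowd.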

Filters are classic spam-fighting tools that are applied, more or less successfully, to combat email spam. They use Bayesian filtering, pattern recognition and many other tricks to recognize spam and prevent it from being posted. Personally I like Akismet, which uses a centralized approach. Every user of this plugin submits their confirmed spam to a central database - so the algorithm is simultaneously trained on spam collected from hundreds, if not thousands, of different blogs.

If you don’t like relying on a centralized third-party database for your filtering, you can use something like Spam Karma 2, which runs locally and trains only on your spam.
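To give a feel for how Bayesian filtering works, here is a toy naive Bayes classifier with add-one (Laplace) smoothing. It is a sketch for illustration only: Akismet and Spam Karma 2 use their own, much more elaborate methods, and the tokenizer and class name here are assumptions.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Very simple tokenizer: lowercase words and numbers."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}  # word counts per class
        self.docs = Counter()  # number of training comments per class

    def train(self, text, label):
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = sum(self.docs.values())
        log_scores = {}
        for label, counter in self.counts.items():
            total_words = sum(counter.values())
            # smoothed class prior, then smoothed likelihood of every token
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            for tok in tokens(text):
                score += math.log((counter[tok] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # P(spam | text) from the two log scores; clamp to avoid overflow in exp
        diff = max(min(log_scores["ham"] - log_scores["spam"], 700), -700)
        return 1 / (1 + math.exp(diff))
```

A centralized service differs mainly in where the counts come from: thousands of blogs feed the training data instead of just yours.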

The obvious problem with filters is that they usually tend to produce false positives every once in a while. For example, almost every recent comment made by my friend Miloš has been eaten by Akismet - completely without rhyme or reason. He thinks it had something to do with my old CAPTCHA plugin, but I doubt it. I think it was just Akismet being weird.

Whatever plugin or algorithm you use, you will have false positives, so you need to monitor all the spam comments and keep them in some sort of moderation queue. Unfortunately if you get 200 spam posts per day, you are likely to dump them all without looking and simply hope that there were no false positives there. This is a big issue.

Because of spam issues, some people require registration to post comments. This is a form of whitelist: only a certain group of users is allowed to post. Personally I don’t like this approach, because registration - especially with email confirmation - is annoying. I would say it’s about 100 times more annoying than the worst CAPTCHA imaginable - and I usually don’t bother leaving comments in “members only” threads.

I do use blacklists in a limited way. Whenever I see large volumes of spam coming from the same IP, I simply block it in the .htaccess file. That prevents the IP address from ever accessing my website, so I use this sparingly and only for the most notorious spammers. Another good approach is to block all open proxies, as they are frequently used by spamming scripts. I don’t do this, because I want to allow my users to read this blog and post comments anonymously, but not everyone cares about that.
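Deciding which IPs deserve a block can be automated with a simple threshold. The sketch assumes a hypothetical log of IP addresses taken from confirmed spam and an arbitrary cutoff of 20; the resulting list could then be turned into deny rules in .htaccess by hand. Matching against ranges uses Python’s standard `ipaddress` module.

```python
import ipaddress
from collections import Counter

SPAM_THRESHOLD = 20  # assumption: confirmed spam comments before an IP is blocked


def ips_to_block(spam_ips, threshold=SPAM_THRESHOLD):
    """Return addresses that sent at least `threshold` confirmed spam comments."""
    counts = Counter(str(ipaddress.ip_address(ip)) for ip in spam_ips)
    return sorted(ip for ip, n in counts.items() if n >= threshold)


def is_blocked(ip, blocklist):
    """Check an address against single IPs or CIDR ranges such as '192.168.0.0/16'."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(entry) for entry in blocklist)
```

Keeping the threshold high matches the "sparingly" rule above: a single stray spam comment should never get a reader locked out.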

Finally, there is Trackback Validation. Some spammers aim to bypass CAPTCHAs, annoyances and most of the other methods described here by avoiding the comment form altogether. They simply send fake pingbacks to create trackbacks on your blog. Since every pingback is associated with a linking URL, a simple way to avoid getting hit by mass-generated spam is to verify that this URL actually exists. I use the Simple Trackback Validation plugin to do this.

Of course this doesn’t work if the spammer sends the pingbacks from an actual blog created on Blogger or WordPress or whatever. But, once again, it cuts down on the amount of crap posted to your blog by bottom-feeders with shitty Perl scripts and no knowledge or cunning.
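The core of trackback validation fits in a few lines: fetch the source URL and see whether it exists. The sketch below also checks that the page actually mentions your post’s URL, a common extra step. The function names are hypothetical, and the fetcher is passed in as a parameter so the logic can be tested without network access. It is not the Simple Trackback Validation plugin’s code.

```python
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Download up to about 500 KB of the page at `url` as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read(512_000).decode("utf-8", errors="replace")


def validate_trackback(source_url, post_url, fetch=fetch_page):
    """Accept a trackback only if its source page exists and mentions our post."""
    try:
        page = fetch(source_url)
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; ValueError covers malformed URLs.
        return False
    return post_url in page
```

A plain substring test is crude (it ignores how the link is formatted), but it is enough to reject pingbacks whose source page does not exist or never mentions your post.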

Personally I use a layered approach:

  1. My first line of defense is a CAPTCHA (Peter’s Custom Anti-Spam)
  2. I employ Simple Trackback Validation as a CAPTCHA like barrier for pingback spam
  3. I use Bad Behavior to test for robots
  4. Whatever slips through all the plugins above is usually scooped by Akismet.

This multi-tiered approach works great for me. The CAPTCHA and validation scripts combined with Bad Behavior mean that my Akismet queue is almost always empty. But when something does slip through my front-line defenses, I can always rely on the collective wisdom of half the internet to catch it.
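The layered idea itself can be sketched as a chain of checks run cheapest first, where the first layer that objects decides the outcome. The layer names and check functions below are hypothetical stand-ins for the plugins listed above, operating on a made-up comment dictionary.

```python
def run_layers(comment, layers):
    """Run (name, check) pairs in order; each check returns a reason string or None."""
    for name, check in layers:
        reason = check(comment)
        if reason:
            return f"rejected by {name}: {reason}"
    return "accepted"


# Hypothetical stand-ins for the real plugins, cheapest checks first.
LAYERS = [
    # Trackbacks never see the comment form, so they skip the CAPTCHA.
    ("captcha", lambda c: None if c.get("is_trackback") or c.get("captcha_ok")
        else "wrong CAPTCHA answer"),
    ("trackback validation", lambda c: "source does not link back"
        if c.get("is_trackback") and not c.get("links_back") else None),
    ("robot detection", lambda c: "suspicious headers" if c.get("robot_signals") else None),
    ("content filter", lambda c: "looks like spam"
        if "valtrex" in c.get("text", "").lower() else None),
]
```

In the real setup a filter hit lands in a moderation queue rather than being dropped outright, but the ordering is the point: the cheap checks thin out the flood so the filter at the end sees very little.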

What do you use? Any suggestions for alternative tools to use for Movable Type, Text Pad and whatever else is out there?