Archive for the 'blogging' Category

Taking Weekends Off

Saturday, March 29th, 2008

Just wanted to drop a quick note here - I won’t be doing Saturday posts anymore. I already gave up on Sunday posting back in January and I mentioned I might go down to a much healthier 5 day schedule. So this is what I’m doing right now.

This doesn’t really mean anything negative. Rather, I noticed that over time I’ve been putting more and more thought, research and time into every post here. More often than not I’m actually trying to write something that’s either meaningful, interesting or worth discussing rather than just posting 3-paragraph rants apropos the newest hot issue found via digg or slashdot. More often than not I’m trying to post original material here rather than reiterating stuff that you can easily find elsewhere. I don’t know if this is a gradual improvement in quality, but it does tend to be more time consuming. So the drop in posting frequency is good for my sanity, and will hopefully let me queue up posts, and perhaps even proofread all of them before they go live.

I don’t think I will drop below 5 days, but perhaps that will turn out to be necessary in the future. For now though, you can expect to see posts appearing here Mon-Fri, usually between 11am and noon (although Google Reader doesn’t pick them up until a few hours later).

Blogs Without Comments

Tuesday, August 28th, 2007

How do you feel about blogs without comments? It seems that lately it is fashionable to say that “comments don’t scale”. I guess Joel Spolsky is probably the most quoted individual who argued against having open comments on your blog. His position is a bit extreme:

When a blog allows comments right below the writer’s post, what you get is a bunch of interesting ideas, carefully constructed, followed by a long spew of noise, filth, and anonymous rubbish that nobody … nobody … would say out loud if they had to take ownership of their words.
(…)
I don’t know how many times I’ve read a brilliant article someone wrote on a blog. By the end of the article, I’m excited, I’m impressed, it was a great article. And then you get the dribble of morbid, meaningless, thoughtless comments.

This is a little harsh. I do not think this applies to this blog at all. Most of the comments I get here are insightful, funny and worth reading. Many of them actually complement the post, adding new content that I simply missed. I would argue that comments here generally add value to the post. But then again, my average traffic load here is relatively low and the signal to noise ratio is very good.

I totally agree with Jeff Atwood when he says:

I firmly maintain that a blog without comments enabled is not a blog. It’s more like a church pulpit. You preach the word, and the audience passively receives your evangelical message. Straight from God’s lips to their ears. When the sermon is over, the audience shuffles out of the church, inspired for another week. And there’s definitely no question and answer period afterward.

I don’t want to be preaching to my readers. I want a conversation. I want to get to know my readers, and find out what they think on a given subject. We all blog for different reasons, but ultimately we all want people to read our stuff. So having a mechanism that lets your readers give you feedback is really important.

I view comments as a community building tool. Right now I have a small group of regulars around here who frequently read and comment on posts, and comment on each other’s comments. And I think it’s great. I love that we have our small community growing here. And as such we still need to work on some inside jokes, and memes btw.

I do not believe that you can have an insightful conversation using the “everyone posts on their own blog and links to each other” methodology. What you get then is a bunch of people preaching from their respective soap-boxes and cherry-picking arguments they want to discuss. Furthermore these commentaries are now spread over many websites, with no organized way of jumping from one to the other. If you disabled comments then you probably also do not allow trackbacks. So the only way I can know that someone commented on your article is to randomly visit their blog. Does that facilitate good discussion? No.

With comments on the other hand, you get a chronologically sorted, organized conversation right below the original post. In such a setup it is easy to have actual debates with arguments, counter-arguments, ripostes, etc. So while comments can be mindless random drivel, they can also be an insightful discussion.

Not to mention that comments provide me with an instant gratification/validation mechanism. When I get 0 comments on a post, I kinda know that no one was particularly interested in that one. And even if they were, they just didn’t have much to say about it. But when a topic sparks a conversation I instantly get that “Oh, people are actually reading this stuff!” feeling. And no amount of looking at the server logs or website stats can compare with actually reading what people thought about your post.

Of course if you get a few hundred comments per post, the nice benefits I outlined above are greatly diminished. It’s easy for discussions to turn into bickering and flame wars, and with a high volume of posters it is usually difficult for the blog author to moderate effectively.

Still, we are not without tools to combat crappy comments. Take Slashdot for example - if you browse it with a filter that only shows you the posts moderated above a certain threshold you can cut out most of that “noise, filth, and anonymous rubbish” that Spolsky seems to despise so much. Same goes for Digg - crappy and unpopular comments get buried and hidden, increasing the readability of the thread.

Community moderation combined with regular anti-spam measures does work - and it works well enough. All you need to do is to slap something like the Digg-inspired Comment Karma plugin onto your blog, and the signal to noise ratio increases instantly.

A blog without comments is like a public panel without a Q/A session. I personally find that comments add value to the original content more often than not. What do you think?

What would you rather have - a high traffic blog with comments, even if they tend to be a bit chaotic, or a pristine church pulpit blog that allows no comments? Given a choice, I’ll always pick the former over the latter.

How do you deal with comment spam?

Wednesday, June 20th, 2007

I always viewed blog spam as a complex problem which requires a multi-layer solution. There is simply no silver bullet that stops all the spam and gives you no false positives. Most conventional approaches can be grouped together into 6 categories:

  1. Turing Tests
  2. Reverse-Turing Tests (or robot detection)
  3. Spammer Annoyances
  4. Filters
  5. Blacklists/Whitelists
  6. Trackback Validation

Each category has benefits and flaws and none of them can guarantee you that no spam will slip through. Below I will discuss all these categories. I will link to different WordPress plugins, because this is what I use. Feel free to chime in with solutions for different blogging platforms in the comments.

The most commonly used, and perhaps the most controversial method of combating blog spam is the good old Turing Test. It is usually a challenge which can be easily and effortlessly solved by humans, but is difficult for machines. The best example here is CAPTCHA. Usually CAPTCHAs are those little blurry images which contain words or numbers that you are supposed to type into a box to post. I’m using one on this very blog - so if you scroll down to the comment box you will see a perfect example. CAPTCHAs work because OCR technology is not perfect, and funky fonts and small distortions can easily fool most character recognition algorithms.

Unfortunately CAPTCHA has some downsides. It can be annoying to the users, especially if it is illegible. It is also a usability issue. Blind people, or people with poor sight who rely on screen readers, can’t solve CAPTCHAs. Thus, if you use one, you might be permanently denying a segment of your readership the ability to comment. Of course some CAPTCHAs use an audio challenge. I have even seen CAPTCHAs which ask users to solve little math or contextual problems by filling in the blank word, or calculating the result of a simple equation. Unfortunately the graphical CAPTCHA is a de facto standard because it is easy to implement, easy to solve, and most effective.

Still, they are not perfect. There are ways around CAPTCHAs - you can use different session hijacking tricks, you can use cutting edge OCR algorithms to break them, or you can simply harness the collective stupidity of MySpace users to solve them for you.
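For the text-based math challenges mentioned above, here is a minimal sketch of how one could work without storing anything on the server. Everything in it is an assumption made for illustration: the names `make_challenge` and `check_answer`, the `SECRET_KEY` value, and the idea of signing the expected answer with an HMAC and shipping it in a hidden form field. It is not taken from any particular plugin.

```python
import hashlib
import hmac
import random

# Hypothetical server-side secret; a real site would load this from config.
SECRET_KEY = b"change-me"


def _sign(answer: int) -> str:
    """Return an HMAC of the expected answer, safe to embed in a hidden field."""
    return hmac.new(SECRET_KEY, str(answer).encode("utf-8"), hashlib.sha256).hexdigest()


def make_challenge(rng=random):
    """Build a question for the commenter plus a signed token for the form."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} plus {b}?", _sign(a + b)


def check_answer(user_input: str, token: str) -> bool:
    """Accept the comment only if the typed answer matches the signed token."""
    try:
        answer = int(user_input.strip())
    except ValueError:
        return False
    return hmac.compare_digest(_sign(answer), token)
```

A screen reader can read the question aloud, which is the whole point. The sketch is deliberately naive though: there are only 17 possible answers and the token never expires, so a determined script could brute force or replay it. It only keeps out the laziest bots.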

Reverse Turing Tests take the opposite approach. Instead of making the poster prove that they are human, they try to detect that the post is being made by a robot. There are many approaches to do this. For example, the Bad Behavior WordPress plugin analyzes the user agent string and different behaviors to identify non-human posters.

Personally I love Bad Behavior but it appears that on some high traffic sites the plugin can behave erratically. For example, Shamus who runs the hilarious DM of the Rings comic, had major problems with human commenters being blocked for no reason. He eventually stopped using the plugin for that reason. So false positives are a big issue here.

The next approach is something that I call Spammer Annoyance. The idea is simple - make the life of a spammer more difficult and he will move on to an easier target. CAPTCHA already fulfills that function - it is a barrier that prevents the bottom feeders from using their crappy scripts to post hundreds of comments per second. But since CAPTCHAs are annoying, people search for different solutions.

One good approach is to require a hash of the post’s text to be submitted along with the post. The HashCash plugin is a good example of this method. Why does it work? Because to post, the spammer will have to include the hashing algorithm in his script. Since hashing is resource intensive, the resulting script will work slower than normal, making the spamming less efficient. Most people don’t care enough to bother with these things - they simply move on to easier targets.

Unfortunately hashing must be done on the client side (doing it on the server side would make no sense - you want to annoy the spammer, not increase your server load). How do you compute anything on the client side? You use JavaScript. And that in itself is another usability issue. Not everyone uses JavaScript - and your hashing algorithm will not work with text browsers. You may not care about stuff like that, but some people do.
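To make the hashing idea concrete, here is a minimal proof-of-work sketch. It is written in Python for readability, although in practice the solving half would run as JavaScript in the browser. This is not how the HashCash plugin is implemented. The function names `solve` and `verify`, the choice of SHA-256 and the `difficulty_bits` knob are all assumptions made for illustration.

```python
import hashlib
from itertools import count


def _digest_value(comment_text: str, nonce: int) -> int:
    """Hash the comment together with a nonce and return the digest as an integer."""
    data = f"{comment_text}:{nonce}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def solve(comment_text: str, difficulty_bits: int = 12) -> int:
    """Client side: try nonces until the hash starts with `difficulty_bits` zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        if _digest_value(comment_text, nonce) < target:
            return nonce


def verify(comment_text: str, nonce: int, difficulty_bits: int = 12) -> bool:
    """Server side: a single hash is enough to check the submitted nonce."""
    return _digest_value(comment_text, nonce) < (1 << (256 - difficulty_bits))
```

The asymmetry is what matters here: the poster burns on the order of 2^difficulty_bits hashes per comment, while the server spends one hash to check it. Because the nonce is tied to the comment text, it can’t simply be reused for a different message.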

Of course there are other, simpler ways to annoy spammers. For example, Travis proposes to rename the fields in your comment form with some random strings (maybe randomly generated ones? what do you think Travis?). Of course a well-constructed script would simply crawl the DOM and grab appropriate fields, no matter what they are called. But most spamming schmucks out there simply use dirty hacks done with WWW::Mechanize and grab fields by name, using standard naming conventions.
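One way the randomly generated field names could be done is sketched below. This is only a guess at how such a scheme might look, not Travis’s actual proposal: the names `field_name`, `render_field_names` and `decode_submission`, the `SECRET_KEY` value and the per-form `form_token` are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real site would load this from config.
SECRET_KEY = b"change-me"

# The names a naive spam script expects to find in a comment form.
LOGICAL_FIELDS = ("author", "email", "url", "comment")


def field_name(logical: str, form_token: str) -> str:
    """Derive an unguessable field name from the secret and a per-form token."""
    mac = hmac.new(SECRET_KEY, f"{form_token}:{logical}".encode("utf-8"), hashlib.sha256)
    return "f_" + mac.hexdigest()[:12]


def render_field_names(form_token: str) -> dict:
    """Map each logical field to the obfuscated name used in the HTML form."""
    return {logical: field_name(logical, form_token) for logical in LOGICAL_FIELDS}


def decode_submission(form_token: str, submitted: dict) -> dict:
    """Recover the logical fields; anything posted under a standard name is ignored."""
    names = render_field_names(form_token)
    return {
        logical: submitted[obfuscated]
        for logical, obfuscated in names.items()
        if obfuscated in submitted
    }
```

A script that blindly posts to a field called `comment` ends up submitting nothing the server recognizes. A script that parses the form will still get through, which is exactly the limitation described above.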

Filters are classic spam fighting tools that are more or less successfully applied to combat email spam out there. They utilize Bayesian filtering, pattern recognition and many other tricks to recognize spam, and prevent it from being posted. Personally I like Akismet which uses a centralized approach. Every user of this plugin submits their confirmed spams to the central database - so the algorithm is simultaneously trained by spam collected over hundreds if not thousands of different blogs.

If you don’t like to rely on some centralized 3rd party database for your filtering you can use something like Spam Karma 2 which is localized, and trains only on your spam.
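To give a feel for what “trains only on your spam” means, here is a toy naive Bayes filter. It is a bare-bones sketch of the general Bayesian technique and says nothing about how Akismet or Spam Karma 2 work internally. The class name `TinyBayesFilter` and its methods are made up for this example.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Word-frequency naive Bayes with add-one smoothing, trained locally."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text: str) -> list:
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text: str, label: str) -> None:
        """Record one confirmed comment; label is 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, words: list, label: str) -> float:
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocabulary = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        # Smoothed prior: how common this label has been so far.
        score = math.log((self.doc_counts[label] + 1) / (sum(self.doc_counts.values()) + 2))
        for word in words:
            # Add-one smoothing keeps unseen words from zeroing out the product.
            score += math.log((counts[word] + 1) / (total_words + vocabulary + 1))
        return score

    def is_spam(self, text: str) -> bool:
        words = self.tokenize(text)
        return self._log_score(words, "spam") > self._log_score(words, "ham")
```

Usage would look something like this:

```python
spam_filter = TinyBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("great post thanks for sharing", "ham")
spam_filter.is_spam("buy cheap pills")
```

With so little training data the verdicts are shaky, which is the appeal of a centralized database that has seen far more spam than any single blog.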

The obvious problem with filters is that they usually tend to produce false positives every once in a while. For example, almost every recent comment made by my friend Miloš has been eaten by Akismet - completely without rhyme or reason. He thinks it had something to do with my old CAPTCHA plugin, but I doubt it. I think it was just Akismet being weird.

Whatever plugin or algorithm you use, you will have false positives, so you need to monitor all the spam comments and keep them in some sort of moderation queue. Unfortunately if you get 200 spam posts per day, you are likely to dump them all without looking and simply hope that there were no false positives there. This is a big issue.

Because of the spam issues some people require registration to post comments. This is a form of a whitelist. Only a certain group of users is allowed to post. Personally I don’t like this approach because registration - especially one with email confirmation - is annoying. I would say it’s about 100 times more annoying than the worst CAPTCHA imaginable - and I usually don’t bother leaving comments in “members only” threads.

I do use blacklists in a limited way. Whenever I see large volumes of spam coming from the same IP I simply block it in the .htaccess file. It prevents that IP address from ever accessing my website, so I use this sparingly and only for the most notorious spammers. Another good approach is to block all open proxies as they are frequently used by spamming scripts. I don’t do this, because I want to allow my users to read this blog and post comments anonymously, but not everyone cares about that.
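I do this by hand, but the same bookkeeping could be scripted. The sketch below counts spam submissions per IP and prints deny rules for the worst offenders. The function name `htaccess_deny_rules` and the threshold are hypothetical, and the output assumes the older Apache `Order`/`Allow`/`Deny` access control directives; newer Apache versions use `Require` rules instead, so check what your server expects.

```python
import ipaddress
from collections import Counter


def htaccess_deny_rules(spam_ips, threshold=50):
    """Return .htaccess lines that block every IP seen at least `threshold` times."""
    counts = Counter(spam_ips)

    def sort_key(ip):
        # ip_address() also validates the input and raises ValueError on junk.
        parsed = ipaddress.ip_address(ip)
        return (parsed.version, int(parsed))

    offenders = sorted((ip for ip, n in counts.items() if n >= threshold), key=sort_key)
    lines = ["Order Allow,Deny", "Allow from all"]
    lines += [f"Deny from {ip}" for ip in offenders]
    return "\n".join(lines)
```

For example, with addresses from the reserved documentation ranges:

```python
log = ["203.0.113.7"] * 3 + ["198.51.100.2"]
print(htaccess_deny_rules(log, threshold=3))
```

Only the address that crossed the threshold gets a `Deny from` line, which keeps the blacklist short and limits the damage if an IP is later reassigned to an innocent reader.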

Finally, there is Trackback Validation. Some spammers aim to bypass CAPTCHAs, annoyances and most other methods described here by avoiding the comment form altogether. They simply generate false pingbacks to create trackbacks to your blog. Since all the pingbacks are associated with a linking URL, a simple way to prevent getting hit by mass-generated spam is to verify that this URL actually exists. I use the Simple Trackback Validation Plugin to do this.

Of course this doesn’t work if the spammer is sending the pingbacks using an actual blog created on blogger or wordpress or whatever. But, once again, it cuts down on the amount of crap posted to your blog by the bottom-feeders with shitty Perl scripts and no knowledge or cunning.
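The validation step can be sketched roughly as follows. This version goes one step beyond checking that the URL exists and also checks that the page links back to the post, which is an assumption on my part about what a reasonable check looks like and not a description of the plugin’s code. Fetching the page is left out so the example runs offline; the function `links_back` and its parameters are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class _LinkCollector(HTMLParser):
    """Collect every href found in <a> tags, minus fragments and trailing slashes."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.add(urldefrag(value).url.rstrip("/"))


def links_back(source_html: str, post_url: str) -> bool:
    """Return True if the claimed source page really contains a link to our post."""
    collector = _LinkCollector()
    collector.feed(source_html)
    return urldefrag(post_url).url.rstrip("/") in collector.links
```

In a real setup `source_html` would be whatever comes back from requesting the URL named in the pingback, and a failed request would count as a failed validation.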

Personally I use a layered approach:

  1. My first line of defense is a CAPTCHA (Peter’s Custom Anti-Spam)
  2. I employ Simple Trackback Validation as a CAPTCHA-like barrier for pingback spam
  3. I use Bad Behavior to test for robots
  4. Whatever slips through all the plugins above is usually scooped by Akismet.

This multi-tiered approach works great for me. The CAPTCHA and validation scripts combined with Bad Behavior mean that my Akismet queue is almost always empty. But when something does slip through my front line defenses, I can always rely on the collective wisdom of half the internet to capture it.
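The layering itself is easy to picture as a chain of checks where the first one to object wins. The sketch below uses toy stand-ins for each layer; none of them reflects how Peter’s Custom Anti-Spam, Bad Behavior or Akismet actually decide anything, and the names `Comment`, `LAYERS` and `first_rejection` are made up. Trackback validation is left out because it applies to pingbacks and not to the comment form.

```python
from typing import Callable, NamedTuple, Optional


class Comment(NamedTuple):
    body: str
    user_agent: str
    captcha_ok: bool


# Each layer returns a rejection reason, or None to pass the comment along.
Layer = Callable[[Comment], Optional[str]]


def captcha_layer(comment: Comment) -> Optional[str]:
    return None if comment.captcha_ok else "failed CAPTCHA"


def robot_layer(comment: Comment) -> Optional[str]:
    # Toy heuristic: a missing or Perl-library user agent looks like a script.
    agent = comment.user_agent.strip().lower()
    if not agent or "libwww-perl" in agent:
        return "looks like a robot"
    return None


def filter_layer(comment: Comment) -> Optional[str]:
    # Toy content filter: more than three links is treated as spam.
    if comment.body.lower().count("http://") > 3:
        return "too many links"
    return None


LAYERS = (captcha_layer, robot_layer, filter_layer)


def first_rejection(comment: Comment, layers=LAYERS) -> Optional[str]:
    """Run the cheap checks first; stop at the first layer that rejects."""
    for layer in layers:
        reason = layer(comment)
        if reason is not None:
            return reason
    return None
```

Ordering the layers from cheapest to most expensive is the design choice that keeps the last queue nearly empty: most junk never reaches the filter at all.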

What do you use? Any suggestions for alternative tools to use for Movable Type, Text Pad and whatever else is out there?

Running without CAPTCHA Experiment

Tuesday, June 19th, 2007

I decided to disable the CAPTCHA in the comments section - at least for a few days. I haven’t seen much spam here lately, and I feel that Akismet and Bad Behavior have been doing an excellent job scooping up all the unwanted crap so far, so I decided to do a little experiment. I will keep the CAPTCHA off for a few days and see if the level of spam changes.

If I get hammered with spam, I will simply re-enable it. If I don’t, I might just keep it off for good.

Also, if you ever got weird error messages when posting here, I apologize. I think these errors might have been triggered by the code in the CAPTCHA/comment preview plugin. Or not. This might help me track down the source of these issues.

What do you think? Is this a good idea, or am I in for a world of pain?

Bloglines Marked All Live Journal Feeds as Broken

Thursday, May 3rd, 2007

I noticed today that all the Live Journal feeds I subscribe to via Bloglines have a red exclamation mark next to them. This essentially means that Bloglines can’t read the feed, and thus no updates will show up for it until the feed issues are resolved.

I have been noticing more and more of those marks lately but never really thought much of them. But when all the LJ feeds went wonky at the same time I figured there might be two possibilities here:

  1. LJ fucked up its feeds
  2. Bloglines is not reading the feeds correctly

I exported my OPML and imported it into Google Reader just to see if I get similar behavior. Nope. Reader didn’t show any errors, and I was able to pick up the recent updates. I like Bloglines. I’m used to it. But Google Reader has some nice features too. Is it time to switch?

Update 05/03/2007 12:39:53 PM

Two quick updates:

  1. The LJ feeds seem to be working properly today
  2. The issue with updating them yesterday was apparently caused by LJ banning the Bloglines feed crawler

See the comment from a Bloglines representative for more details.

Update 05/08/2007 03:54:44 PM

The LJ feeds are broken again. WTF people? Bloglines must really be itching to lose users to Google Reader.