Archive for June, 2007

Sometimes Users do know what they want

Monday, June 25th, 2007

When gathering specs, the first rule to keep in mind is:

Users don’t know what the hell they want

Or rather, what they think they want is usually not what they need. This is why it is important to study the process you are trying to automate, rather than simply ask the users what the software needs to do for them. Chances are that they will concentrate on superficial features and omit big chunks of crucial business logic. Also, most users think in what I call Flat Table or Excel mode.

When the users describe data they collect and manage, most of us will immediately try to categorize it into entities with relationships. Some people do it in their head, others (like me) break out a notepad and start drawing E/R or UML diagrams. This way of thinking is actually the natural, correct way we should think about data. Unfortunately, most human beings do not see data like that because they were never taught how to store information in databases. For most people the primary tool for storing and tabulating large quantities of data is Excel - which is essentially one large flat table. Users simply don’t know that there is a different way, they don’t know about normalization, and they do not realize that data redundancy is bad.

Database design is just a small example - but there are more pitfalls like this at almost every stage of development. This is why you sit down with the users, and watch what they do instead of just asking them what they want.

Of course when you are implementing something new - something that is not part of their current work process - you are back to asking questions. And in those cases, sometimes users do know what they want.

Recently I fell into the trap of over-engineering a simple solution. My users wanted to track information about their clients. It was about 30 fields that could be easily divided into two entities - or three. Part of the data was purely informative - addresses, phone numbers, and names of contacts and directors of client companies - both lenders, who would request audits, and borrowers, who would be audited. Then there was a bunch of numbers collected during the actual audit at the borrower site.

My boss exhibited the classic Excel approach here - he wanted to dump everything into a single table. But I knew better. I could easily see that the audits are done on a recurring basis, so for each lender and borrower pair you could have multiple audits performed at different times. So I designed my database with 3 entities - lender, borrower and audit, where each audit is associated with exactly one lender and one borrower, and both lenders and borrowers can have multiple audits. It made perfect sense, and as an added benefit, it allowed you to track the progression of our numbers across several audits.
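For the curious, the three-entity design described above can be sketched roughly like this. It is only an illustration using Python's built-in sqlite3 module: the table names, column names and sample rows are all made up for the example, and `earnings` stands in for the whole bunch of audit numbers.

```python
import sqlite3

# Hypothetical schema: the post only names the three entities,
# so every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE lender   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE borrower (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE audit (
    id          INTEGER PRIMARY KEY,
    lender_id   INTEGER NOT NULL REFERENCES lender(id),
    borrower_id INTEGER NOT NULL REFERENCES borrower(id),
    audit_date  TEXT NOT NULL,  -- ISO dates, so text order is date order
    earnings    REAL            -- stand-in for the numbers collected on site
);
"""


def build_demo_db():
    """Create an in-memory database with one lender, one borrower, two audits."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO lender VALUES (1, 'Example Bank')")
    conn.execute("INSERT INTO borrower VALUES (1, 'Example Corp')")
    conn.executemany(
        "INSERT INTO audit (lender_id, borrower_id, audit_date, earnings) "
        "VALUES (?, ?, ?, ?)",
        [(1, 1, "2006-06-30", 100.0), (1, 1, "2007-06-30", 120.0)],
    )
    return conn


def earnings_history(conn, borrower_id):
    """Return (audit_date, earnings) for one borrower, oldest audit first."""
    rows = conn.execute(
        "SELECT audit_date, earnings FROM audit "
        "WHERE borrower_id = ? ORDER BY audit_date",
        (borrower_id,),
    )
    return rows.fetchall()


conn = build_demo_db()
history = earnings_history(conn, 1)
```

Each audit row points at exactly one lender and one borrower, and a borrower can accumulate any number of audit rows, which is what makes the history query possible.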

But of course this progression is tracked in the actual audit reports - in much greater detail and scope. What my boss really wanted was to have a database of clients, periodically updated with the latest data about their earnings and loan details. I found this out during a demo of the almost-completed project. I also found out that the numeric data should be extracted directly from Excel work-papers for a given audit, and that it should overwrite the previous data so that we store only the latest info.

So now I have a choice - I can either keep my current schema, and just deal with the fact that borrower and audit will always have a pointless 1-1 relationship that will require me to do unnecessary joins, and update data in multiple steps, or just merge the two tables and re-write much of the existing code where necessary.

Either way, I would have been better off listening to the users in the first place. Because, as it turns out, they sometimes do know exactly what they want. So I suggest that you make the second rule of gathering specifications:

You don’t know what the hell users want either.

You may think that you know what users want or need - but chances are you will be wrong. The specifications must come from the users, and reflect their process and their needs - not our educated guess about what the process should look like. Because ultimately, they are the ones who will be using this software - and despite the fact they are usually unable to communicate it well, they do have some sort of end-product in mind when they ask you to develop something.

Running Remote Desktop is Faster than Running Application Locally

Sunday, June 24th, 2007

Recently I found out that it is sometimes faster to use Remote Desktop than to run an application locally and simply have it access files on a remote network share. Let me give you some background.

My company uses Quickbooks for some accounting stuff. I don’t really know what they use it for, and I don’t really care. All I know is that there is a Quickbooks file sitting on a network share, which is accessed by 2-3 people in the office. One of our employees who usually works from home was assigned some responsibilities that require using Quickbooks. She would not be coming into the office often however, so we had to figure out a way to set her up so that she could work from the remote site.

My initial idea was simple: make her VPN into the office and mount the network share as a remote drive. Then you can use Quickbooks to access the file the same way all the people in the office do - only the data will flow over the internet via the VPN tunnel, instead of just the local network.

It turned out that this was a horrible idea. The way Quickbooks works with its files is beyond retarded - apparently this application was never designed to be used this way. The slow file access meant that the application’s interface would seize up for seconds at a time, and remain unresponsive until all the data was downloaded. Clicking anywhere in the interface as it was pulling in data would lead to the dreaded “Not Responding” message which usually prompts users to kill the application. Of course if you waited 10-20 seconds the UI would unfreeze itself and become responsive again. But that kind of behavior makes for a piss-poor user experience.

So I decided on an alternative approach. We had a spare workstation in the office that no one really used. It was sitting in a cube usually occupied by summer interns if we get any. It didn’t look like we were getting any interns this summer so I grabbed that machine and set it up to allow Remote Desktop access.

So now our telecommuter has to VPN in, then connect to the spare workstation via Remote Desktop, and launch Quickbooks on the remote machine. Amazingly enough this method provides a much smoother user experience. Rdesktop is really good at refreshing the screen in a seamless and extremely responsive manner even on slow over-the-internet connections. Quickbooks remains responsive because working with files on the local network has never been a problem.

So here you go - sometimes it’s better to give your telecommuter access to a local box at the office, rather than have them run some application locally.

Casino Royale

Sunday, June 24th, 2007

I finally got a chance to see Casino Royale the other day. I’m not a big Bond fan per se, but I essentially grew up watching these movies so, for better or for worse, I have a certain fondness for flicks featuring agent 007. That said, I could hardly stomach any of the recent Bond garbage.

I can say I like what they were trying to do with this movie. Instead of having some assface failing miserably at trying to be Sean Connery they went and created a brand new Bond with 80% less cheese and a hint of awesome. They essentially rebuilt him from scratch (apparently they had the technology - or so I’m told). Daniel Craig makes Pierce Brosnan look like a batnipple - must have seemed like a good idea at the time, and someone green-lighted it, but seriously.

So we have this brand new Bond, who can kick more ass than the last 3 or 4 Bonds ever even saw kicked in their lives. He can do realistic fight scenes, and even some parkour. Yes, he is actually chasing some dude through the construction site - he doesn’t whip out an invisible jetpack from his wristwatch. And since this is a prequel of sorts, this Bond is still fresh, ambitious and hungry. He has a big ego, and a lot to prove. He also has a problem with authority, and regularly pisses off his boss, and gets in trouble for being hot-headed.

With this kind of setup, there are about a billion exciting things you could make him do. Playing poker is not one of them. I have nothing against having a game of poker in a Bond movie. It kinda fits right in. But this movie is all about a poker game. Bond’s mission is essentially “go play poker against the bad guy and win”. Apparently during the production of this film, someone decided that it would really be a good idea to have this awesome new Bond playing cards throughout the whole movie. I don’t know about you, but last time I checked it is really hard to kick ass when you are sitting at a poker table. And I don’t care who you are, but kicking someone in the face delivers more pwn per pound than calling a bluff at a high stakes poker game. Sorry but if I really wanted to watch people playing cards I would just flip to ESPN and then go fucking kill myself in the face.

So instead of kicking ass and taking names, agent 007 is tossing around poker chips through half of the movie. I would probably fall asleep if he didn’t take short breaks from playing cards to kill some people in the stairway, get poisoned and almost die, and do some other assorted Bond like stuff.

Then to add insult to injury, we have that retarded scene where Craig tells Eva Green that “whatever is left of him belongs to her” or something along those lines. I can’t recall the exact lines, but I remember thinking that his manhood was seriously damaged in the torture scene just before, because that crap was about the gayest shit that a straight man could say to a woman. Then, instead of sleeping with her and leaving, he tells her he loves her, and quits his job. WTF? This is not James Bond! This is fucking Lifetime Channel shit! It just didn’t make any sense to me. Bond either really suffered some massive testicular damage, or he just got whipped like a schoolboy by a chick that is not even all that hot, and was kinda annoying me throughout the movie.

And then it clicks. Since the movie is not ending, you just know that the girl will either get brutally killed, prompting Bond to go back to work seeking revenge, or that she will betray him and break his heart just after she is done completely emasculating him. They were just setting it up so that he goes back to work either totally pissed off, or heart broken - having learned a lesson that getting too attached to someone when you’re a 00 agent is probably not a good idea. Bond girlfriends tend to die easily, and get kidnapped even easier, and rescuing them is a pain in the ass.

All in all, it was a decent Bond movie. It would have been better if it wasn’t for all the poker playing. It also had surprisingly little gratuitous sex and naked boobs - these things are kind of the Bond gold standard by now. Strangely they decided to skimp on that part, and add gratuitous poker pr0n instead. To each his own I guess.

Oh, and btw - I know this movie is not new. This is why I’m not even gonna bother with the hReview rating. This is why you’re not seeing any stars underneath this review. If you really want to see some stars, go outside and look at the sky. :P

Friends Don’t Let Friends Develop Software Alone

Thursday, June 21st, 2007

Given a chance, would you rather work alone, or as part of a team? Right now I’m the sole software developer employed by my company - and I can tell you there is nothing I would like more than to have someone here to kick my ass, scrutinize my code, scold me when I cut corners, and advise me when I’m in a bind. Developing software alone is not a good idea - always try to work as part of a team, or open source your project and seek outside input.

Listen to this man because he speaks the truth:

In the future, if a company offers me a job and tells me I’ll be the sole developer, I’ll walk away from the offer. I’ll never do this again.

Some folks have claimed that it presents the great opportunity to establish your own process. In my experience, there is no process in a team of one. There’s nothing in place to hold off the torrents of work that come your way. There’s no one to correct you when the urge to gold-plate the code comes along. There’s no one to review your code. There’s no one to ensure that your code is checked in on time, labeled properly, unit tested regularly. There’s no one to ensure that you’re following a coding standard. There’s no one to monitor your timeliness on defect correction. There’s no one to verify that you’re not just marking defects as “not reproducible” when, in fact, they are. There’s no one to double-check your estimates, and call you on it when you’re just yanking something out of your ass.

There’s no one to pick up the slack when you’re sick, or away on a business trip. There’s no one to help out when you’re overworked, sidetracked with phone calls, pointless meetings, and menial tasks that someone springs on you at the last minute and absolutely must be done right now. There’s no one to bounce ideas off of, no one to help you figure your way out of a bind, no one to collaborate with on designs, architectures or technologies. You’re working in a vacuum. And in a vacuum, no one can hear you scream.

(…)

If anyone’s reading this, let this be a lesson to you. Think hard before you accept a job as the sole developer at a company. It’s a whole new kind of hell. If given the chance, take the job working with other developers, where you can at least work with others who can mentor you and help you develop your skill set, and keep you abreast of current technology.

I agree with everything above 100% - I can feel this man’s pain, I can totally understand his position - because it is the same one as mine. Best advice I can give to anyone who is looking for a job right now, is to stay away from solo projects. They might seem great, but slogging it alone is a masturbatory experience which doesn’t help you grow as a programmer.

Jeff Atwood seems to agree as well:

Working alone means complete control over a software project, wielding ultimate power over every decision. But working on a software project all by yourself, instead of being empowering, is paradoxically debilitating. It’s a shifting mirage that offers the tantalizing promise of relief, while somehow leaving you thirstier and weaker than you started.

Not only does it make you weaker, but it also may mean that your product will have some weird birth defects - a sure sign of mental inbreeding. Every single one of us gets those oddball ideas which initially seem awesome and brilliant but turn out to be embarrassingly stupid and difficult to maintain. If you are working in a team, someone is bound to look at your code and call you on your bullshit sooner or later. But if you work alone, these eccentric little nuggets of pure crazy may slip through to production code before you realize how bad they really are.

Of course it’s not all bad. You get full creative control over the project, you are free to use the technology you want without the need to justify it to anyone, you get to set your own schedules, you have the job security angle covered, and no one can short-change the value of your work.

And working in a team can have big downsides too - for example, the project lead might be an idiot, and you may end up working with the “Brillant Paula Bean”.

Still, I think that having someone else around to bounce your ideas off, and call you on your bullshit is extremely valuable.

What do you think?

How do you deal with comment spam?

Wednesday, June 20th, 2007

I always viewed blog spam as a complex problem which requires a multi-layer solution. There is simply no silver bullet that stops all the spam and gives you no false positives. Most conventional approaches can be grouped together into 6 categories:

  1. Turing Tests
  2. Reverse-Turing Tests (or robot detection)
  3. Spammer Annoyances
  4. Filters
  5. Blacklists/Whitelists
  6. Trackback Validation

Each category has benefits and flaws, and none of them can guarantee you that no spam will slip through. Below I will discuss all these categories. I will link to different WordPress plugins, because this is what I use. Feel free to chime in with solutions for different blogging platforms in the comments.

The most commonly used, and perhaps the most controversial method of combating blog spam is the good old Turing Test. It is usually a challenge which can be easily and effortlessly solved by humans, but is difficult for machines. The best example here is CAPTCHA. Usually CAPTCHAs are those little blurry images which contain words or numbers that you are supposed to type into a box to post. I’m using one on this very blog - so if you scroll down to the comment box you will see a perfect example. CAPTCHAs work because OCR technology is not perfect, and funky fonts and small distortions can easily fool most character recognition algorithms.

Unfortunately CAPTCHA has some downsides. It can be annoying to the users, especially if it is illegible. It is also an accessibility issue. Blind people, or people with bad sight who rely on screen readers, can’t solve image CAPTCHAs. Thus, if you use one, you might be permanently denying a segment of your readership the ability to comment. Of course some CAPTCHAs use an audio challenge. I have even seen CAPTCHAs which ask users to solve little math or contextual problems by filling in a blank word, or calculating the result of a simple equation. Unfortunately the graphical CAPTCHA is the de-facto standard because it is easy to implement, easy to solve, and the most effective.

Still, they are not perfect. There are ways around CAPTCHAs - you can use different session hijacking tricks, you can use cutting edge OCR algorithms to break them, or you can simply harness the collective stupidity of MySpace users to solve them for you.

Reverse Turing Tests take the opposite approach. Instead of making the poster prove that they are human, they try to detect that the post is being made by a robot. There are many ways to do this. For example, the Bad Behavior WordPress plugin analyzes the user agent string and different behaviors to identify non-human posters.

Personally I love Bad Behavior but it appears that on some high traffic sites the plugin can behave erratically. For example, Shamus, who runs the hilarious DM of the Rings comic, had major problems with human commenters being blocked for no reason. He eventually stopped using the plugin for that reason. So false positives are a big issue here.

The next approach is something that I call Spammer Annoyance. The idea is simple - make the life of a spammer more difficult and he will move on to an easier target. CAPTCHA already fulfills that function - it is a barrier that prevents the bottom feeders from using their crappy scripts to post hundreds of comments per second. But since CAPTCHAs are annoying, people search for different solutions.

One good approach is to require a hash of the post’s text to be submitted along with the post. The HashCash plugin is a good example of this method. Why does it work? Because to post, the spammer will have to include the hashing algorithm in his script. Since hashing is resource intensive, the resulting script will work slower than normal, making the spamming less efficient. Most people don’t care enough to bother with these things - they simply move on to easier targets.
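To make the idea more concrete, here is a rough proof-of-work sketch in the general hashcash style. I am not claiming this is how the HashCash plugin works internally: the difficulty setting, the function names and the way the nonce is mixed into the hash are all assumptions made for the example. Both sides are shown in Python for brevity, even though on a real blog the expensive part would run in the commenter's browser.

```python
import hashlib

DIFFICULTY = 3  # assumed setting: required number of leading zero hex digits


def _digest(text, nonce):
    """Hash the comment text together with a candidate nonce."""
    return hashlib.sha256(f"{nonce}:{text}".encode("utf-8")).hexdigest()


def solve(text, difficulty=DIFFICULTY):
    """Poster's side: brute-force the smallest nonce that meets the target.

    This loop is the expensive part that slows a spam script down.
    """
    nonce = 0
    while not _digest(text, nonce).startswith("0" * difficulty):
        nonce += 1
    return nonce


def verify(text, nonce, difficulty=DIFFICULTY):
    """Blog's side: a single hash is enough to check the submitted work."""
    return _digest(text, nonce).startswith("0" * difficulty)


comment = "Nice post!"
nonce = solve(comment)
accepted = verify(comment, nonce)
```

The asymmetry is the whole point: the poster needs thousands of hashes on average per comment at this difficulty, while the blog needs one to check it.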

Unfortunately hashing must be done on the client side (doing it on the server side would make no sense - you want to annoy the spammer, not increase your server load). How do you compute anything on the client side? You use JavaScript. And that in itself is another usability issue. Not everyone uses JavaScript - and your hashing algorithm will not work with text browsers. You may not care about stuff like that, but some people do.

Of course there are other, simpler ways to annoy spammers. For example, Travis proposes to rename the fields in your comment form with some random strings (maybe randomly generated ones? what do you think, Travis?). Of course a well constructed script would simply crawl the DOM and grab the appropriate fields, no matter what they are called. But most spamming schmucks out there simply use dirty hacks done with WWW::Mechanize and grab fields by name, using standard naming conventions.
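Here is one way the randomly generated variant could look. This is just a sketch, not Travis's actual proposal: the secret key, the logical field names and the per-form token are all hypothetical, and the names are derived with an HMAC so that the server can recompute them without storing anything.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = b"change-me"  # hypothetical server-side secret
LOGICAL_FIELDS = ("author", "email", "comment")


def field_names(form_token):
    """Map each logical field to an unguessable name tied to one form token."""
    names = {}
    for logical in LOGICAL_FIELDS:
        mac = hmac.new(
            SECRET_KEY, f"{form_token}:{logical}".encode("utf-8"), hashlib.sha256
        )
        names[logical] = "f" + mac.hexdigest()[:12]
    return names


def decode_submission(form_token, posted):
    """Translate posted data back to logical names; unknown names are ignored."""
    names = field_names(form_token)
    return {logical: posted.get(name, "") for logical, name in names.items()}


# A fresh token would be issued with every rendered comment form.
token = secrets.token_hex(8)
names = field_names(token)
posted = {
    names["author"]: "Ann",
    names["email"]: "ann@example.com",
    names["comment"]: "Hi",
}
decoded = decode_submission(token, posted)

# A script that posts the conventional field names gets nothing through.
naive = decode_submission(token, {"author": "bot", "comment": "spam"})
```

As noted above, this does nothing against a script that reads the form before posting; it only trips up the ones that hard-code field names.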

Filters are classic spam fighting tools that are more or less successfully applied to combat email spam out there. They utilize Bayesian filtering, pattern recognition and many other tricks to recognize spam, and prevent it from being posted. Personally I like Akismet which uses a centralized approach. Every user of this plugin submits their confirmed spams to the central database - so the algorithm is simultaneously trained by spam collected over hundreds if not thousands of different blogs.
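If you have never seen Bayesian filtering up close, the toy classifier below shows the basic mechanics. It is a bare-bones naive Bayes sketch with add-one smoothing, trained on four made-up comments. It says nothing about how Akismet or any other real filter is implemented, and a real filter would need far more training data and smarter tokenization.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes comment classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Count the words of one comment under the given label."""
        self.words[label].update(text.lower().split())
        self.docs[label] += 1

    def _log_score(self, tokens, label):
        """Log of prior times the smoothed likelihood of every token."""
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total = sum(self.words[label].values())
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for tok in tokens:
            score += math.log((self.words[label][tok] + 1) / (total + vocab))
        return score

    def is_spam(self, text):
        tokens = text.lower().split()
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")


bayes = TinyBayes()
bayes.train("cheap pills online", "spam")
bayes.train("buy cheap watches online", "spam")
bayes.train("great post thanks", "ham")
bayes.train("i disagree with this post", "ham")

spam_verdict = bayes.is_spam("cheap pills")
ham_verdict = bayes.is_spam("great post")
```

A centralized service has the advantage that its word counts come from everybody's spam, while a local filter only ever sees what lands on your own blog.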

If you don’t like to rely on some centralized 3rd party database for your filtering you can use something like Spam Karma 2 which is localized, and trains only on your spam.

The obvious problem with filters is that they usually tend to produce false positives every once in a while. For example, almost every recent comment made by my friend Miloš has been eaten by Akismet - completely without rhyme or reason. He thinks it had something to do with my old CAPTCHA plugin, but I doubt it. I think it was just Akismet being weird.

Whatever plugin or algorithm you use, you will have false positives, so you need to monitor all the spam comments and keep them in some sort of moderation queue. Unfortunately if you get 200 spam posts per day, you are likely to dump them all without looking and simply hope that there were no false positives there. This is a big issue.

Because of the spam issues some people require registration to post comments. This is a form of a whitelist. Only a certain group of users is allowed to post. Personally I don’t like this approach because registration - especially one with email confirmation - is annoying. I would say it’s about 100 times more annoying than the worst CAPTCHA imaginable - and I usually don’t bother leaving comments in “members only” threads.

I do use blacklists in a limited way. Whenever I see large volumes of spam coming from the same IP I simply block it in the .htaccess file. It prevents that IP address from ever accessing my website, so I use this sparingly and only for the most notorious spammers. Another good approach is to block all open proxies as they are frequently used by spamming scripts. I don’t do this, because I want to allow my users to read this blog and post comments anonymously, but not everyone cares about that.
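Picking the IPs by hand gets old quickly, so here is a small sketch of how the "large volumes from the same IP" rule could be automated. The threshold of 20 is an arbitrary assumption, the addresses are documentation-only examples, and the generated lines use the older Apache `Deny from` syntax, which newer Apache versions replace with `Require` directives.

```python
from collections import Counter


def ips_to_block(spam_ips, threshold=20):
    """Return the IPs that sent at least `threshold` spam comments, sorted."""
    counts = Counter(spam_ips)
    return sorted(ip for ip, hits in counts.items() if hits >= threshold)


def htaccess_lines(ips):
    """Render one old-style Apache deny directive per blocked IP."""
    return [f"Deny from {ip}" for ip in ips]


# Made-up spam log: one heavy offender and one occasional one.
spam_log = ["203.0.113.7"] * 25 + ["198.51.100.2"] * 3
blocked = ips_to_block(spam_log)
lines = htaccess_lines(blocked)
```

Keeping the threshold high matches the "use sparingly" rule: the occasional offender stays under it and is left for the other layers to deal with.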

Finally, there is Trackback Validation. Some spammers aim to bypass CAPTCHAs, annoyances and most other methods described here by avoiding the comment form altogether. They simply generate false pingbacks to create trackbacks to your blog. Since all the pingbacks are associated with a linking URL, a simple way to prevent getting hit by mass-generated spam is to verify that this URL actually exists. I use the Simple Trackback Validation Plugin to do this.
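A slightly stricter variant of the same idea is to check not only that the linking page exists, but that it really contains a link back to your blog. The sketch below assumes you have already fetched the page's HTML somehow (the fetching is left out on purpose), and the host names are made up. I am not claiming this is what the plugin does internally.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back(page_html, my_host):
    """True if the page contains at least one link pointing at my_host."""
    parser = _LinkCollector()
    parser.feed(page_html)
    return any(urlparse(href).hostname == my_host for href in parser.hrefs)


genuine = '<p>See <a href="http://blog.example.com/post/1">this post</a>.</p>'
fake = '<p><a href="http://spam.example.net/">cheap pills</a></p>'

genuine_ok = links_back(genuine, "blog.example.com")
fake_ok = links_back(fake, "blog.example.com")
```

A pingback whose source page fails this check can be dropped before it ever reaches the moderation queue.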

Of course this doesn’t work if the spammer is sending the pingbacks using an actual blog created on Blogger or WordPress or whatever. But, once again, it cuts down on the amount of crap posted to your blog by the bottom-feeders with shitty Perl scripts and no knowledge or cunning.

Personally I use a layered approach:

  1. My first line of defense is a CAPTCHA (Peter’s Custom Anti-Spam)
  2. I employ Simple Trackback Validation as a CAPTCHA-like barrier for pingback spam
  3. I use Bad Behavior to test for robots
  4. Whatever slips through all the plugins above is usually scooped by Akismet.

This multi-tiered approach works great for me. The CAPTCHA and validation scripts combined with Bad Behavior mean that my Akismet queue is almost always empty. But when something does slip through my front line defenses, I can always rely on the collective wisdom of half the internet to capture it.

What do you use? Any suggestions for alternative tools to use for Movable Type, Text Pad and whatever else is out there?