Archive for the ‘software engineering’ Category

On Optimization

Tuesday, July 21st, 2009

Here is an interesting story that I got from one of the old-timers in our industry. The guy who told it to me used to be a COBOL developer back in the day when Cobol was the “bleeding edge” technology. He is no longer working in the field nowadays, and he sort of lost track of the technology train.

He told me that he recently was working on some deal with the first company that hired him out of college. They gave him a brief tour and talked about the upcoming upgrade of their billing/accounting/everything else system. Apparently they were finally moving from their old cryptic, COBOL application to a brand new one written in ASP.NET. Few more prodding questions confirmed his suspicion. The COBOL app was the same exact system he helped to design 20 something years ago. This was some of the worst, buggiest and most unreadable code he has ever wrote in his life (being green and fresh out of school) and yet it was still in operation.

That’s not all though. He asked them how come they have never replaced the system with something more modern up until now. It turns out that they had. This was actually the third attempt at migration to a new technology. Previous 2 have failed miserably. Their development teams did produce viable code which fared pretty well in small scale tests. But when they actually tried to run full scale operations, the ASP app would just grind to a halt.

I chuckled. I was not surprised. “Back in the day, people knew how to write code. We are so spoiled by the Moores Law that we forgot how to do it these days”, I mused. He nodded in agreement.

You see, the COBOL system processed millions of records every day. Even though it was old, and running on an ancient hardware each batch would only take seconds to crank out. It was stable, reliable and the COBOL old-timers optimized the shit out of it over the last 20 years using every known trick in the book. The app was maintained by a shrinking cadre of wise bearded fellows who scoffed at new fangled concepts such as objects, polymorphism or encapsulation. They, however knew exactly how to shave few seconds off an operation by writing intricate spaghetti code.

The ASP and then ASP.NET code on the other hand was written by groups of greenhorns fresh out of college. They were idealistic, and excited about their project. They wrote great object oriented code, split into clearly defined modules. They leveraged open source libraries. More importantly their code was run on top of the line servers – best the money could buy.

And yet, each time they put it to real life stress test the ancient COBOL kludge would run laps around them. It would process 10 thousand records before their code even finished initializing. What took the old app 10 minutes would take 2 days on the .NET platform despite running on a hardware that was at least 10 times as fast.

They could not take such a major hit in speed as it would hurt their productivity. So twice they have shelved the project waiting until hardware finally catches up. Yep, they were waiting for hardware to catch up so that they could even hope to match the performance of a 20 year old application. Every once in a while they would brush of the code install it on new, juiced up servers and have another crack at it. They didn’t bother rewriting it, because so much money was sunk into it in the first place. So each consecutive team assigned to it would just do some minor re-factoring. This time however they were sure of success. The initial tests revealed that the newest incarnation of the ASP app was only 30% slower than the COBOL app which was considered an overwhelming success.

Non only that, they explained, but in 2-3 years the hardware will become twice as fast, which means that they might actually be able to match or even exceed the COBOL performance. Imagine that.

I’m not taking a crack at .NET or modern programming paradigms here. There is nothing wrong with either. Someone could argue that choosing to write this code in C would be a better idea from optimization point of view. Then again, modern JIT compilers can often optimize the executed code much better at run time than a C guru could ever do it by hand.

In fact, there is no for why COBOL code running on old hardware would really outperform .NET running on a modern rig. None besides crappy coding on the .NET platform. Back in the day when memory and disk space was scarce and each CPU cycle was important people knew how to optimize code. They knew how to write programs that will scale well, under limited resources. They had to learn these tricks because there was simply no way to shove hundreds of megabytes of data items in and out of memory like it is today. When they wrote code, they had to think about how it will work with large data sets.

Over the years we have sort of become lazy and complacent. I’m as guilty of it as everyone else. When I write code I hardly ever consider large data sets. I just make sure the important columns in the database are indexed, and that my query is not retarded. I hardly ever look at the actual logic within my program. I write deeply nested loops without thinking about scalability. It became sort of a pathology.

I became painfully aware of this while working on my thesis. When I was forced to do operations on big data matrices over and over again, I had to go down to basics – get rid of fancy objects, iterators and all that jazz, and just use simple loops and arrays. And even then I was struggling because no one ever taught us how to really approach practical optimization. I mean we talked about it in theory – and we were taught about algorithms. But no one really bothered to teach us practical things such as good ways to identify bottlenecks in your code, or practical optimization tricks.

I guess everyone assumed you will pick up stuff like that on your own. Or it will get drummed into you at your first job. Or people simply forget about it. After all, parallelism is the sexy thing to talk about these days. So instead of finding bottlenecks and eliminating them let’s parallelize the code and make it run on a cluster. Which is a valid approach, but not in every situation. Every once in a while you run into situation like the one I described above. There is an ancient COBOL app running on an ancient hardware – and it cranks out results faster than your code written in a modern language, running on a modern computer.

Does it means that we lost an edge? Does it mean we forgot how to write efficient code that will run fast even with very limited resources. Not really. There are still people out there who can do this sort of thing well. And of course over optimization can be harmful too.

This is just something to think about. Situations like this one happen in real life, and are quite ironic. I wonder how did the .NET development team justify their poor performance to the management.

Small Programming Projects

Thursday, April 30th, 2009

Let me know if this has ever happened to you. Your boss walks up to you and tells you he has a tiny little, itsy, bitsy project for you. He wants you to build a small online application. Nay, a small online form – just a single form with some database back end. Nothing more. A single page that would allow the employees could submit their TPS reports (or whatever) online. It’s only going to run on the intranet, it will never face the web and there is only 4 people who will really be using it so there is no point worrying about authentication, user management and all that stuff – just do the bare bones minimum necessary to tell the users apart, and keep them from overwriting each others data.

You implement it, and everyone loves it. The boss pats himself on the back for clever use of technology and tells you to add a tiny little feature to make the thing accept quarterly evaluations as well, and extend the user base to like 20 people. That’s about it. And you shouldn’t really worry about expanding it. It’s not going to get any bigger than this. It’s just a tiny change. No need to add anything else. Don’t spend any time improving the design. It’s not necessary, and you will be wasting time.

Next week you get another request. Then another. And another. Each time it is a tiny little change – and you are explicitly told the application is not going to grow, and it does not need a redesign. Six months down the road your application is the main online hub of the company. It’s facing the web, it’s tracking just about every little bit of data your boss could think off, stores few gigs of data and every employee, client, applicant and visitor must interact with it at one point or another. It is a monster and the code is a labyrinthine maze of hacks, patches workarounds and hastily added modifications aiming at adding stability and security to something that was not designed for it in the first place. And better yet – no one, including you can believe how huge it got and how quickly it happened. No one has seen it coming. No one could predict it would actually be taken this far.

I’ve seen this happen multiple times, and heard about similar scenarios from others. Very small, simple projects have a tendency to blow up and grow exponentially into huge enterprise systems. You never know which project is going to bloat this way. In fact, you probably won’t know that one of your projects is on this destructive path until it’s to late. It happens in small incremental steps, spread over long amount of time. But it happens.

And since we know it happens, we can be prepared for it. The easiest way to avoid exponential growth of this type from becoming an issue is to always code as if you were designing something 5 times as big. Always modularize your code, build your applications in a classic 3 tier system and always use MVC or similar paradigm. If your app bloats into some sort of a monster, you will have the infrastructure in place to support it, and build it up. At least for the most part.

Then again, this approach flies in the face of the KISS principle. Sometimes coding this way is indeed an overkill. Sometimes a single form will just be a single form and building a 3 tier architecture and creating/deploying some sort of a framework to support it may be a huge waste of time and resources. Sometimes a quick, dirty and direct approach can be vastly superior to the roundabout enterprise way.

So I guess the trick here is to do something in between these two extremes:

  1. Always assume your code is going to be facing the real internet, even if the initial spec says it wont. Build with security in mind.
  2. Always assume you will need multiple access levels for your users and a flexible access controls.
  3. Never assume your data won’t bloat out of proportion. Never use sqlite or SQL Server express when you could be using something more scalable
  4. Design your database for extensibility – normalize your design and be aware that you might be adding more tables and more complex relationships into the mix in the future. This shouldn’t be much effort since your tiny project will probably need only a handful of tables.
  5. Try to do some data modeling and object-relational mapping as this will help scaling the code later. Since you are starting small, this should be easy to do and it will keep your code clean and organized
  6. Design or steal a robust log in / user authentication module. One day your application may become the main login to the intranet or myriad of tacked on services. Don’t half-ass it. Also think how you can handle authentication from robots – since you may need to set up complex web services one day.
  7. Use an off the shelf module or solution if possible. Why? Chances are the author already spent a considerable amount of effort to ensure extensibility and scalability of their code. So when that inevitable feature request comes in, you have that much less work to do. And even if it is not very extensible, you may still be ahead. It is much easier to justify the need to do some re-writing or re-designing when you can blame it on inferior 3rd party code.

Feel free to add your own tips to this list. Did this sort of thing ever happen to you? I speak from experience here – I still maintain a system that started like that. I was young, naive and green undergrad and I got a tinsy little web project. And now here I am, many, many moons later – still maintaining the damn thing. It grew into a behemoth – an overwhelming mountain of code some of which is so crappy that I am the only brave soul that dares to touch it.

I also inherited a system like that once. I rewrote most of the UI (that was actually my assignment) and left most of the back end intact – trying to avoid the hairy code on that side of the application. I then happily passed it on to the next brave soul.

How about you?

Software Calcuators and UI Design

Monday, February 2nd, 2009

I noticed a very peculiar UI design pattern lately. Have you ever used a calculator program? I’m sure you did at least once in your life. Most operating systems ship with some sort of a calculator tool. Strangely enough, they all look the same:

croppercapture8.Png

Can someone explain to me one thing though? Why the keypad? I mean, why is it there on every single fucking calculator – whether you are using calc.exe on windows or xcalc on a real operating – there is always the keypad. Why?

Your computer already has a perfectly well designed calculator input interface in the form of a numeric pad on the keyboard. If you haven’t noticed it was actually designed to look like a calculator with the tight clustering of numbers, and dedicated buttons the basic math operators placed in strategic easy to reach places:

croppercapture12.Png

Does anyone actually use the mouse input when using these software tools? Virtual keyboards are ok if you are paranoid and afraid of key loggers but they are not great for data input unless you are using a touch screen or a tablet with a stylus. I personally find the numpad faster, easier and more intuitive to use than an on-screen input.

This design tries to mimic a real physical, low end calculator forgetting that these things generally suck. If I know I will need to do some number crunching on paper, I use my trusty TI-86 – not one of those simplistic pieces of crap with a one line LCD display.

It seems that most of the simple calculator programs out there approach the problem wrong. They forget that the chepo calculators are built the way they are simply to cut costs. The LCD is so tiny to keep the production costs down. But there is no reason why a software calculator should not display recent calculation history the way high end graphing calculators do. Instead of using a big output screen for recent results, they instead build a huge and unnecessary number pad, and restrict the results to a tiny little box that can only hold one number at a time. WTF?

Believe it or not, this pathology runs deep. I was surprised that someone actually built a front-end for the unix bc tool using precisely this philosophy – a big virtual keypad and a one-number display box. There are however few tools designed the right way. Most notable one of course is the Windows XP PowerCalc:

croppercapture9.Png

I believe that I mentioned that to me this is one of the better calculator UI’s. It has a nice big box where it displays results, a graphing area, a list of constants and variables and helpful hints just above the input box. Did you notice the lack of the virtual numpad? Yeah, when you get rid of it, you suddenly get more space for all this other cool stuff.

SpeedCrunch is not bad either. The default setup does include the stupid keypad, but it is a linux app. That means you can easily configure it and make it into something very minimalistic:

croppercapture10.Png

I guess there is a lesson in UI design, and usability here. Designing a software calculator to work like calc.exe or xcalc is like building a word processor that acts the same way as a real typewriter. Imagine if MS word only allowed you to use mono spaced font, and would not let you delete or backspace erroneously entered characters forcing you to retype the whole page every time you made a mistake. Fortunately at some point someone figured out that you can actually design something much better than a virtual keyboard. We still haven’t really made that leap with calculators though. When you tell someone to build a calculator tool, they immediately start thinking about building that virtual key pad. Why? Because that’s what calculator is, right? A numeric keypad with an LCD.

What they should really be thinking about is what a calculator really does, and how can it’s functionality be improved and extended in virtual space.