On Optimization

Posted on July 21, 2009 by Luke Maciak tein.co/3448

Here is an interesting story that I got from one of the old-timers in our industry. The guy who told it to me used to be a COBOL developer back in the day when COBOL was the “bleeding edge” technology. He no longer works in the field, and he has sort of lost track of the technology train.

He told me that he was recently working on some deal with the first company that hired him out of college. They gave him a brief tour and talked about the upcoming upgrade of their billing/accounting/everything else system. Apparently they were finally moving from their old, cryptic COBOL application to a brand new one written in ASP.NET. A few more probing questions confirmed his suspicion. The COBOL app was the exact same system he had helped to design 20-something years ago. This was some of the worst, buggiest and most unreadable code he had ever written in his life (being green and fresh out of school) and yet it was still in operation.

That’s not all though. He asked them why they had never replaced the system with something more modern. It turns out that they had tried. This was actually the third attempt at migrating to a new technology. The previous two had failed miserably. Their development teams did produce viable code which fared pretty well in small scale tests. But when they actually tried to run full scale operations, the ASP app would just grind to a halt.

I chuckled. I was not surprised. “Back in the day, people knew how to write code. We are so spoiled by Moore’s Law that we forgot how to do it these days”, I mused. He nodded in agreement.

You see, the COBOL system processed millions of records every day. Even though it was old and running on ancient hardware, each batch would only take seconds to crank out. It was stable and reliable, and the COBOL old-timers had optimized the shit out of it over the last 20 years using every known trick in the book. The app was maintained by a shrinking cadre of wise bearded fellows who scoffed at newfangled concepts such as objects, polymorphism or encapsulation. They did, however, know exactly how to shave a few seconds off an operation by writing intricate spaghetti code.

The ASP and then ASP.NET code, on the other hand, was written by groups of greenhorns fresh out of college. They were idealistic, and excited about their project. They wrote great object oriented code, split into clearly defined modules. They leveraged open source libraries. More importantly, their code was run on top of the line servers – the best money could buy.

And yet, each time they put it through a real-life stress test, the ancient COBOL kludge would run laps around them. It would process 10 thousand records before their code even finished initializing. What took the old app 10 minutes would take 2 days on the .NET platform, despite running on hardware that was at least 10 times as fast.

They could not take such a major hit in speed as it would hurt their productivity. So twice they shelved the project, waiting for hardware to finally catch up. Yep, they were waiting for hardware to catch up so that they could even hope to match the performance of a 20 year old application. Every once in a while they would brush off the code, install it on new, juiced up servers and have another crack at it. They didn’t bother rewriting it, because so much money had been sunk into it in the first place. So each consecutive team assigned to it would just do some minor refactoring. This time, however, they were sure of success. The initial tests revealed that the newest incarnation of the ASP app was only 30% slower than the COBOL app, which was considered an overwhelming success.

Not only that, they explained, but in 2-3 years the hardware will become twice as fast, which means that they might actually be able to match or even exceed the COBOL performance. Imagine that.

I’m not taking a crack at .NET or modern programming paradigms here. There is nothing wrong with either. Someone could argue that choosing to write this code in C would be a better idea from an optimization point of view. Then again, modern JIT compilers can often optimize the executed code much better at run time than a C guru could ever do by hand.

In fact, there is no reason why COBOL code running on old hardware should outperform .NET running on a modern rig. None besides crappy coding on the .NET platform. Back in the day when memory and disk space were scarce and each CPU cycle was important, people knew how to optimize code. They knew how to write programs that would scale well under limited resources. They had to learn these tricks because there was simply no way to shove hundreds of megabytes of data in and out of memory like there is today. When they wrote code, they had to think about how it would work with large data sets.

Over the years we have sort of become lazy and complacent. I’m as guilty of it as anyone else. When I write code I hardly ever consider large data sets. I just make sure the important columns in the database are indexed, and that my query is not doing something stupid. I hardly ever look at the actual logic within my program. I write deeply nested loops without thinking about scalability. It became sort of a pathology.

I became painfully aware of this while working on my thesis. When I was forced to do operations on big data matrices over and over again, I had to go down to basics – get rid of fancy objects, iterators and all that jazz, and just use simple loops and arrays. And even then I was struggling because no one ever taught us how to really approach practical optimization. I mean we talked about it in theory – and we were taught about algorithms. But no one really bothered to teach us practical things such as good ways to identify bottlenecks in your code, or practical optimization tricks.
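As a small illustration of what identifying a bottleneck can look like in practice, here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. It is not taken from the thesis work described above; the function names (`slow_common`, `fast_common`, `profile_call`) and the data are made up for the example. It profiles a hidden nested loop and a rewrite that replaces the inner scan with a set lookup.

```python
import cProfile
import io
import pstats


def slow_common(a, b):
    """Hidden nested loop: `x in b` scans the whole list b for every x."""
    return [x for x in a if x in b]


def fast_common(a, b):
    """Same result, but membership is checked against a set."""
    lookup = set(b)
    return [x for x in a if x in lookup]


def profile_call(func, *args):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


if __name__ == "__main__":
    # Illustrative data only: even numbers vs. multiples of three.
    a = list(range(0, 4000, 2))
    b = list(range(0, 4000, 3))
    for func in (slow_common, fast_common):
        result, report = profile_call(func, a, b)
        print(func.__name__, "found", len(result), "common items")
        print(report)
```

The point of the report is the cumulative time column: it shows where the time actually goes before you start rewriting anything.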

I guess everyone assumed you will pick up stuff like that on your own. Or it will get drummed into you at your first job. Or people simply forget about it. After all, parallelism is the sexy thing to talk about these days. So instead of finding bottlenecks and eliminating them let’s parallelize the code and make it run on a cluster. Which is a valid approach, but not in every situation. Every once in a while you run into a situation like the one I described above. There is an ancient COBOL app running on ancient hardware – and it cranks out results faster than your code written in a modern language, running on a modern computer.

Does it mean that we lost our edge? Does it mean we forgot how to write efficient code that will run fast even with very limited resources? Not really. There are still people out there who can do this sort of thing well. And of course over-optimization can be harmful too.

This is just something to think about. Situations like this one happen in real life, and are quite ironic. I wonder how the .NET development team justified their poor performance to the management.


11 Responses to On Optimization

Steve says:

July 21, 2009 at 11:54 am

My friend works on a contract with the federal gov’t here in Canada. He works on the MAINFRAME app that records and prints out all the government cheques (for Unemployment Insurance, pensions, salaries, etc). It is still in COBOL and I don’t think they have any desire or need to move to some other kind of system…especially one that has so many possible security holes.

Naum says:

July 21, 2009 at 1:00 pm

Oh, on COBOL and legacy mainframe systems, I have some stories… …don’t have time to dive into ATM, but I could spin many a yarn resembling, but even more comical, in the theme of your post…

stan geller says:

July 21, 2009 at 2:09 pm

here in Canada as well…
We run a COBOL platform on a $500 Celeron home-built 5 y/o Linux Mandrake!!!
machine that serves 10 branches all across US and Canada.
Switching to SAP/.NET/whatever would easily cost us $3-4 million.
I would rather add some COBOL – Python/php/MySQL wrappers for our web services such as stock checks and so on and spend another $20 or so on a memory upgrade….cheers

Zel says:

July 21, 2009 at 2:48 pm

Some large banks also still use COBOL, probably for the same reasons you mention. I’ve heard quite a few stories about attempts at migration, none were successful that I know of.

There are some tools to detect bottlenecks in .NET though. Back when I was toying with C# and XNA, I used Microsoft’s CLR profiler to detect major memory users and lengthy functions, and it worked pretty well.

Still, it’s strange that current hardware can’t match a 20-year-old system. The op/s count has been multiplied about a thousandfold or more, and you could probably fit the whole HD in RAM disks, so even poorly written code using a similar algorithm (I don’t know COBOL, can it do things others -like C- can’t? ) should be faster.

Luke Maciak says:

July 21, 2009 at 4:26 pm

@ Steve: Yeah, back in the times of COBOL security was likely not a primary concern. :)

@ Naum: Hey, you can’t say you have many stories and then not share even a single one! Tell the stories!

@ stan geller: 4 mil – ouch! I imagine most of that is hardware upgrade (and if you want .NET, Windows licenses). Sometimes it just doesn’t make sense to upgrade while the old system is working.

Zel wrote:

Still, it’s strange that current hardware can’t match a 20-year-old system. The op/s count has been multiplied about a thousandfold or more, and you could probably fit the whole HD in RAM disks, so even poorly written code using a similar algorithm (I don’t know COBOL, can it do things others -like C- can’t? ) should be faster.

I suspect this was due to the Daily-WTF quality code the ASP.NET team produced. I don’t really know the exact details, but that’s what it sounded like. There is really no reason why the new code on new hardware shouldn’t be faster – unless of course you are using bubble sort for everything, or your whole database is a single table with no keys, constraints or indexes. Or if you use MSAccess as your db backend. :P
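To make the "no indexes" point concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `bills` table, its columns and the index name are hypothetical and chosen only for the example; nothing here is known about the actual ASP.NET system. It asks SQLite for its query plan before and after an index is created on the lookup column.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each row.
    return "; ".join(row[-1] for row in rows)


def build_demo_db():
    """Create an in-memory table with no keys or indexes (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bills (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO bills VALUES (?, ?)",
        ((i % 1000, float(i)) for i in range(10_000)),
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    sql = "SELECT amount FROM bills WHERE customer_id = ?"
    print("no index:  ", query_plan(conn, sql, (42,)))
    conn.execute("CREATE INDEX idx_bills_customer ON bills (customer_id)")
    print("with index:", query_plan(conn, sql, (42,)))
```

Without the index the plan is a scan of the whole table for every lookup; with it, the plan names the index. The exact wording of the plan text differs between SQLite versions.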

astine says:

July 21, 2009 at 6:32 pm

Luke Maciak wrote:

There is really no reason why the new code on new hardware shouldn’t be faster …

Or if you had absolutely no understanding of the problem with which you were dealing. Or, more importantly, if you tried to overgeneralize the code. It sounds to me like a combination of two things happened. One, the greenhorns either didn’t or couldn’t read the code of the older application and had no idea how it worked or was supposed to do its job. And, two, being straight out of college, they probably over-engineered their new application ridiculously.

The old COBOL program was likely tailored very exactly to the problem at hand, taking hardware and business rules into account in its optimizations. The ASP.NET program, on the other hand, was probably engineered as a meta-solution, with business logic and hardware cleanly abstracted out of the core for the sake of flexibility. Too many layers can cause problems and it’s likely that a lot of the overhead of the .NET app had to do with dispatching on objects that shouldn’t be objects and parsing XML that shouldn’t be XML. (I once worked on a project that was slow for a very similar reason.)

Lesson being: make sure your code is modular and flexible so that it can adapt, but don’t go too far or it will turn into a different kind of spaghetti code. If your app can be described as a ‘framework’ of any kind, it is too general.

stan geller says:

July 21, 2009 at 7:07 pm

Luke Maciak wrote:

4 mil – ouch! I imagine most of that is hardware upgrade (and if you want .NET windows licenses). Sometimes it just doesn’t make sense to upgrade while the old system is working.

we don’t have anything .NET or MS server related here at all… Linux servers …that is why our rack is worth about only $5k in hardware..the only upgrades/updates would be COBOL – web/MYSQL connections and services

Luke Maciak says:

July 21, 2009 at 8:07 pm

@ astine: Yep, if you are writing a framework you are likely “doin it rong”… Unless of course you actually set out to write a framework.

“Let’s build a framework” seems to be the descendant of the much older “let’s create a domain specific language” problem solving strategy.

@ stan geller: In any case, .NET is probably the last thing you would want to migrate to, being a Linux shop and all. You’d probably want Java instead – but Java would probably keel over and die on your hardware and software stack. So what you are doing right now is probably the most cost efficient thing that can be done.

Do you guys have any plans to replace COBOL core with something else eventually? Cause, let’s face it – the number of people who are actually good COBOL hackers (is there such a thing?) is probably slowly approaching zero. I mean, you could probably train new guys on the job but you know how that is. I’m curious if your company is at all worried about stuff like that.

I know that many are not, and I’m sure we will see COBOL code in production environments for the next 20-30 years if not more.

Ivan Voras says:

July 24, 2009 at 5:12 pm

I have also heard stories about mainframes and COBOL from old graybeards, to the tune of millions of transactions processed by some ancient code in some $timeframe, but surprisingly, I’ve never ever heard anyone analyzing why is it so, how is it possible and why can’t newer systems do that.

Let’s consider databases – for example a database of utility bills to be sent out. In a database you go through these records, calculate the amount to be charged and mark records as “sent”. For consistency’s sake (ACID rules) you need to update the database synchronously to avoid duplicate processing, for resilience to power failures, etc.

For example, a 10-drive modern disk array can be relied upon to perform about 2000 random IO transactions / s (this is actually a grossly oversimplified number as there’s actually a lot of specifics here I’m leaving out for the sake of brevity – like RAID type, etc. – Google will enlighten curious readers). Think about it – 2000 transactions per second. That’s modern mechanical drives (SSDs are just coming to the data centers). What those mainframes had to work with was certainly not near this kind of performance. The *only* ways you can increase this performance, and ones that are routinely used, are either by making everything sequential or by various caching methods.
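A quick back-of-the-envelope calculation shows what that number means for a batch job. The sketch below assumes one synchronous random IO per record, a batch of one million records and 200-byte records; all three are assumptions for illustration, not figures from the original system. The 2000 IO/s figure is the one quoted above, and the 100 MB/s sequential rate is the benchmark number this comment mentions further down.

```python
def random_io_seconds(records, iops, ios_per_record=1):
    """Time to process records when each one costs synchronous random IOs."""
    return records * ios_per_record / iops


def sequential_seconds(records, record_bytes, mb_per_second):
    """Time to stream the same records off the disk in one sequential pass."""
    return records * record_bytes / (mb_per_second * 1_000_000)


if __name__ == "__main__":
    records = 1_000_000   # assumed batch size
    record_bytes = 200    # assumed record size

    random_s = random_io_seconds(records, iops=2000)
    sequential_s = sequential_seconds(records, record_bytes, mb_per_second=100)

    print(f"random IO:  {random_s:.0f} s")      # 500 s, a bit over 8 minutes
    print(f"sequential: {sequential_s:.0f} s")  # 2 s
```

Under these assumptions the sequential pass is about 250 times faster, which is the gap the rest of this comment tries to explain.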

If those old applications are to approach the performance it’s claimed of them, they either need to do processing on the drive sequentially or they need to rely on caching. But any kind of memory caching is dangerous due to power loss, etc. risks. On the other hand, those mainframes are often built in underground bunkers with their own diesel generators so the power doesn’t ever need to go down.

Hard coding everything could make it possible to do most of the work sequentially (and thus gain those 100+ MB/s raw data IO numbers everyone’s familiar with in benchmarks) in a way that’s simply not possible with SQL (but it could be “emulated” in SQL – like not doing “select *” but grouping data by some criteria – street name for example: “select * … where street=x” and processing it thusly – on error, entire street batch is redone).
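The "process one street at a time, redo the whole street on error" idea can be sketched roughly as follows. This is only an illustration in Python, with plain dictionaries standing in for database rows; `process_in_batches`, `handle_batch` and the retry limit are hypothetical names and choices, not anything from a real billing system.

```python
from itertools import groupby


def process_in_batches(records, key, handle_batch, max_attempts=3):
    """Sort records by key and hand each group to handle_batch as one unit.

    If handle_batch raises, the whole group is redone, up to max_attempts
    times. Returns the list of group keys that were processed.
    """
    done = []
    for group_key, group in groupby(sorted(records, key=key), key=key):
        batch = list(group)
        for attempt in range(1, max_attempts + 1):
            try:
                handle_batch(group_key, batch)
                done.append(group_key)
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up on the job; earlier groups stay done
    return done


if __name__ == "__main__":
    bills = [
        {"street": "Elm", "amount": 10},
        {"street": "Oak", "amount": 5},
        {"street": "Elm", "amount": 7},
    ]

    def send_bills(street, batch):
        total = sum(row["amount"] for row in batch)
        print(f"{street}: {len(batch)} bills, total {total}")

    process_in_batches(bills, key=lambda row: row["street"], handle_batch=send_bills)
```

Sorting first is what makes each group contiguous, so the work for one street touches neighbouring data instead of jumping around.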

I suspect it’s a combination: what the modern programmers forgot is how to not rely on SQL but simply scrape the data off the drive platters and process them as it comes, and on the other hand, management refuses to build monumental system rooms to house data centers that would be resilient to bomb strikes (thus making caching available) ever again.

Animesh Sarkar says:

July 29, 2009 at 7:16 am

Believe it or not I am a COBOL programmer, programming on a mainframe for a major financial institution. Just to clear up a few misconceptions, here are a few facts:
1. The latest mainframe OS by IBM (z/OS) is actually the same age as WinXP (released in 2000) and its latest stable release is younger than Vista (released in 2008), if Wikipedia is to be believed http://en.wikipedia.org/wiki/Z/OS
2. IBM (and most other manufacturers) still manufacture mainframes and hardware upgrades for existing ones.

So just because it has been around for a long long time it doesn’t mean that it is old.

3. COBOL is extremely and I repeat extremely good at processing fixed length records in fixed formats. Give it data in a predefined format and it is parsed as it is read –automatically no extra coding required.
4. Almost all COBOL programs will have to access a database of some kind and DB2 uses SQL.
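For readers who have never seen fixed-length record processing (point 3 above), here is a rough analogue in Python using the standard `struct` module. In COBOL the layout is declared once in the record description and the fields are available as soon as the record is read; in the sketch below the layout (an 8-byte account, a 20-byte name, a 10-digit amount in cents) is entirely made up for illustration.

```python
import struct

# Hypothetical layout: 8-byte account id, 20-byte name, 10-digit amount in cents.
RECORD = struct.Struct("8s20s10s")


def parse_record(raw):
    """Split one fixed-length record into named fields."""
    account, name, amount = RECORD.unpack(raw)
    return {
        "account": account.decode("ascii").strip(),
        "name": name.decode("ascii").strip(),
        "amount_cents": int(amount.decode("ascii")),
    }


def read_records(data):
    """Yield parsed records from a buffer of back-to-back fixed-length records."""
    for offset in range(0, len(data), RECORD.size):
        yield parse_record(data[offset:offset + RECORD.size])


if __name__ == "__main__":
    raw = b"00000042" + b"ALICE".ljust(20) + b"0000012345"
    for record in read_records(raw * 2):
        print(record)
```

Because every record has the same size, finding record number n is just a multiplication; there is no delimiter scanning and no tokenizing.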

In general the data that needs to be processed has largely repetitive portions which will most likely be used to query the db.

So the only optimization that a COBOL programmer has to do is make sure that a. the data being fed to the program is sorted on the most likely criteria for db query ( done in JCL just before the program itself) and
b. query the db once every time the values have changed.

I learnt the second lesson the hard way when I wrote a program which processed around eight million records in ~200 minutes, and just adding an ‘if’ statement immediately before the db query reduced the execution time to 3 minutes.
Yes COBOL is fast! Blindingly so!
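The pattern described in (a) and (b) can be sketched as follows, in Python rather than COBOL and with a plain function standing in for the DB2 query. The names `process_sorted` and `lookup` are hypothetical; the only thing taken from the comment is the idea of sorting the input on the query criterion and guarding the query with an `if`.

```python
def process_sorted(records, key, lookup):
    """Pair each record with its lookup result, querying only on key changes.

    records must already be sorted by key. Returns (results, query_count).
    """
    results = []
    queries = 0
    last_key = object()  # sentinel that never equals a real key
    cached = None
    for record in records:
        current = key(record)
        if current != last_key:  # the 'if' in front of the expensive query
            cached = lookup(current)
            queries += 1
            last_key = current
        results.append((record, cached))
    return results, queries


if __name__ == "__main__":
    # Illustrative data: account type codes and a stand-in for the db query.
    rates = {"A": 1.5, "B": 2.0}
    records = sorted(["A", "B", "A", "A", "B"])
    results, queries = process_sorted(records, key=lambda r: r, lookup=rates.get)
    print(len(results), "records processed with", queries, "queries")
```

With sorted input the number of queries equals the number of distinct keys instead of the number of records, which is where a drop from hundreds of minutes to a few comes from.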

On the other hand, make COBOL do much of anything else, for example parsing a variable length web address, and you would be looking at more than 1000 (yes, that’s three zeros) lines of code.

Then again there are other tools better suited for the job in mainframes too :)

Luke Maciak says:

July 29, 2009 at 10:03 am

@ Animesh Sarkar: Hey, thanks for the perspective. Most of us automatically assume that COBOL runs on dusty old steam-powered behemoths that were built in the bronze age, require souls of orphans to boot up and can only be found in the deepest, darkest corner of the company basement in a room with a “BEWARE OF THE LEOPARD” sign on the door. But, as we can see, it can also be found on a server rack in a modern data center. :)
