220 Billion Lines of COBOL? BS

Update: The first time I read the post, my take was that Jeff Atwood took at face value the claim that COBOL is by far the most common programming language in the world. Subsequently, comments have pointed out he was skeptical. But I still read the post as ambivalent about the claim. (FWIW: I’ve known Jeff for the better part of a decade and he and I are both judges for the Jolt Awards. I’m hardly ‘hating on him.’) The “statistics” say that there are 220 billion (b for bill-yun) lines of COBOL in production out there.

Bull.

The COBOL vendors have been pumping that number up for two decades (at least). It was “30 billion lines of COBOL can’t be wrong,” when I was a magazine editor and, for all its verbosity, COBOL is not a language that is prone to cut-and-paste expansion of its codebase. (The only conceivable way that 200BLoC of COBOL have been written in the past two decades.)

Jeff “digs in” and finds a “big” COBOL application: “Read says Columbia Insurance’s policy management and claims processing software is 20 years old and has 1 million lines of COBOL code with some 3,000 modifications layered on over the years.” That’s supposed to be impressive? An insurance company (the classic mainframe industry) has a significant codebase in COBOL? Wow. Well, just 219,999 to go! (And by the way, the specifics of the codebase are curious: not a lot of COBOL codebases started in 1989.)

The great reality check on the prevalence of COBOL was January 1, 2000. A day utterly hyped (never mind the crazy end-of-the-world nuts, the “statistic” was that Y2K software disasters were going to cost more than half a billion dollars in catastrophic damages) and utterly uneventful (the “reality” was … what was it? Some bus ticket vending machines didn’t work).

Is there a lot of COBOL in the world? Sure, but not nearly as much as you probably think. Many legacy systems have been ported to (primarily) Java and run on modern hardware; it’s kind of shocking to encounter a “green screen” mainframe system running on blades, but such systems are probably every bit as common as COBOL on Big Iron.

You know what programming language is much, much more popular than visible?

C

12 thoughts on “220 Billion Lines of COBOL? BS”

  1. Uh…

    “I have a hard time reconciling this data point with the fact that I have never, in my entire so-called “professional” programming career, met anyone who was actively writing COBOL code.” – says Jeff.

  2. IMHO, there is surely lots of COBOL code running around but 220 billion LOC is a sheer exaggeration. And what you said about C sums it all up.

    /* I guess it would be utter stupidity even to estimate LOC for all the C code written till now. :) (But I guess this would make a good trick Q asked while recruiting (like in Joel’s Fog Creek). ;)) */

  3. “Jeff Atwood takes at face value the claim that COBOL is by far the most common programming language in the world.”

    Yeah.

    Oh wait, except for the part where most of the article is about his clear skepticism about that claim.

  4. Cool story bro, looking forward to the next episode of “Let’s hate on Jeff Atwood for no reason”.

  5. Have you ever programmed in COBOL? Each line of code cannot exceed 75 characters. Not to mention that you have to literally write out code like you’re writing a book. It takes roughly three to five .COB files to do anything useful in a visual application. I’d say about 7 files are generated when working with a database view. I can see how there would be a billion lines of code because it takes freaking pages of code to do anything simple. Oh, yes, we have two full-time COBOL guys on our team. They pray one day to work in .NET with IntelliSense instead of NotePad or TextPad to write code, which I think at this point is copy & paste most of the time.

  6. @Rob: No, I’ve never actually programmed in COBOL. I’ve done extensive work in the travel industry though and am very familiar with the continuing role of green screen applications in production.

    According to Capers Jones, COBOL takes (IIRC) 90-100 lines of code to express a function point, making it about 3 times as verbose as Java. I’m not questioning that it’s verbose or whether there are hundreds of millions to billions of lines of COBOL code still in production.

    But there aren’t 200BLoC of COBOL in production. That flies in the face of both the industry’s own claims (back in the 90s) and common sense:

    * 45 years in service, absolutely peaking in use in the first two decades
    * A famously unwieldy language
    * Two decades of declining jobs minus the…
    * Y2K bump, which as mentioned, is a data point arguing AGAINST the “vast shadow inventory” hypothesis

  7. You stepped right onto one of my pet-peeves, so as calmly as I can, I’d like to explain why the whole Y2K bug turned out to be a non-event: because a lot of people — myself included — spent months poring over countless lines of, yes, COBOL (among others), and writing in year-windowing functions and the like.

    In 1999, I was working for a large, affluent public school district in the East Bay in California. We had a Unisys mainframe — and to the best of my knowledge, they’re still running most of their stuff on it — that was responsible for their accounting, their finance, their payroll, and *all* of their school operations.

    If it only weighed in at a million lines of COBOL, I would be surprised.

    And layered on top of that were, probably, tens of thousands of lines of WFL.

    We spent months auditing (and patching) all of the functions in that system.

    When all was said and done, only a few unanticipated systems broke in the months following January 1, 2000.

    However, had we not done all that work, I can assure you that nobody would have received a paycheck in January.

    The people that always point to the end results of the Y2K bug as an example of unfounded hype bug me to no end. Have you even ever talked to someone that did Y2K work? I did, and it was not a picnic, and yes, a lot of things would have been broken if we hadn’t done it. (Hell, there was even a critical patch for Sun DNS servers that had to be applied; I know this, because I applied it, and fought with it, as it didn’t work out quite right.)

    As for 200 billion SLOC for COBOL … yes, that seems a bit high, but I don’t honestly think it’s off by an order of magnitude. We were a little tiny system by lots of standards, and we were running easily a million lines of COBOL, and a huge chunk of it was custom code and in-house patches.

  8. @Rob Sheldon: Did I ever speak to people doing Y2K work? Yes, regularly, starting in 1996. Which was when I first heard the estimate of (I believe) ~$300M in catastrophic costs, which escalated to >$600M by 99 (again, not to mention the wing-nut estimates).

    Of course there were Y2K bugs that were corrected. But, let’s take a look at your own anecdote: you and others spent months auditing a codebase on the order of 1MLoC. Let’s say that it took 1 person-year of effort to audit 1MLoC. That’s a very impressive number. And I’m sure that you have a good perspective on the natural question it raises:

    How many other teams were doing similar work? What’s the likely order of magnitude? 2,000 such effort-years (2BLoC)? 20,000 (20BLoC)? 200,000 (200BLoC, the amount claimed in the statistic)?

    I submit that the totality of evidence (the relative lack of catastrophe, the relative lack of overwhelming salary bumps, and the failure of this overwhelming success to re-establish COBOL as a preeminently powerful language) argues for the lower end of the spectrum.
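
As a rough sketch of the arithmetic behind that question: the snippet below assumes the audit rate posited above (1 person-year per 1MLoC), which is a guess rather than a measured figure, and the function and constant names are purely illustrative.

```python
# Assumed audit rate from the comment above: 1 person-year per 1 million lines.
# This is a guessed figure for illustration, not a measured one.
LINES_PER_PERSON_YEAR = 1_000_000


def lines_audited(person_years: int, rate: int = LINES_PER_PERSON_YEAR) -> int:
    """Lines of code that a given number of person-years could audit."""
    return person_years * rate


def person_years_needed(total_lines: int, rate: int = LINES_PER_PERSON_YEAR) -> int:
    """Person-years needed to audit total_lines, rounded up."""
    return -(-total_lines // rate)  # integer ceiling division


if __name__ == "__main__":
    # The three orders of magnitude floated in the comment.
    for effort in (2_000, 20_000, 200_000):
        billions = lines_audited(effort) / 1e9
        print(f"{effort:>7,} person-years -> {billions:g} BLoC")

    # Effort implied by the claimed 220 billion lines at the same rate.
    print(person_years_needed(220_000_000_000), "person-years for 220 BLoC")
```

At that assumed rate, auditing the claimed 220 billion lines would have needed roughly 220,000 person-years; a slower audit rate makes the figure larger, and a faster one makes it smaller.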

  9. “Many legacy systems have been ported to (primarily) Java and run on modern hardware”

    “You know what programming language is much, much more popular than visible? C”

    Without real figures to back it up, your assertions are every bit as much of a wild guess as anyone else’s that there are “X” number of lines of COBOL running out in the world.

  10. @Bea Ess: Did I quote or fabricate some statistic about “X billion lines of C” in the world in order to support my point that “220 Billion” is specious?

    It’s not a “wild guess” that C is used in great amounts, JUST AS it wouldn’t be a wild guess to say COBOL is. The whole POINT of the post is that there’s a difference between making a qualitative statement and a bogus quantitative one.

  11. Larry:

    We didn’t review every single line of code in there — there was just me and one other guy, and I was a young kid at the time with the attention span of a distracted gnat. So, I shouldn’t have said “every” function — that was an exaggeration; I should have said, “every function that handled dates”. IIRC, we started by looking at the various database schemas, and in parallel starting from the top of the most common day-to-day programs that were run, and then working our way on down. That’s part of why there were still some functions that caught us by surprise and broke.

    For my single data point though, I can relate that we were both salaried, and that there was no such thing as overtime. The other guy spent some late nights screaming and — maybe literally, once or twice — tearing his hair out. We both put in a ton of extra hours over the course of most of a year. There was no salary bump for that.

    I think you’re hand-picking the evidence that supports your case, which is even more unfounded than the statistic you’re railing against.

    Why in the heck you’d think that the Y2K problem would make COBOL a _more_ attractive language for software developers is beyond me. I would not have expected this to “re-establish COBOL as a preeminently powerful language”; that’s an incredibly silly statement on all of its points. I would not argue that it was re-established, or that it was “preeminent”, or “powerful”.

    My experience was simply that it was there, and it continued to grow in size just because that took less effort — even factoring in the costs of maintenance and Y2K — than magically moving the whole mess over to something else. Which, by the way, is a process that this particular school district started to attempt shortly before I left for another state; they started trying to migrate to SASI, a more modern system intended to replace our Unisys mainframe. They shelled out some serious money for the system, and some additional serious money for training, and then continued to shell out some serious money as efforts to move operations over to the new system failed miserably. For the money they paid out, we could’ve overhauled a bunch of the most commonly used programs on our Unisys system — which, by the way, had the additional handy feature of having the very best recovery systems I have ever, to date, seen.

    That’s why COBOL is still in use, and is still getting lots of development.

    A very lazy search for COBOL on dice.com returns 531 results, most of them for the U.S.; that’s not too shabby considering our country’s current unemployment rates.

    Who knows how much COBOL is still in active use in other countries?

    Like I said: I agree that it’s probably an exaggerated figure, but I doubt that it’s off by an order of magnitude. I think if you’re gonna call “bullshit” on it, you’re gonna have to find some actual supporting evidence, not a bunch of guesses based around what did and didn’t happen on Y2K.

  12. For the Y2K problem, we had to measure the number of LOC in COBOL for a telephone company and it was 2 billion+.
    There are at least 100 companies of that size, like banks and insurance companies (all companies that were part of the bailout, by the way).
    Of course, 1000 lines of COBOL is equivalent to 140 lines of C/C++.
    A call to DB2 request was many lines of code.
