Why Johnny Can’t Code

The most-excellent SF author David Brin has an article on Salon called “Why Johnny Can’t Code” (you have to sit through a commercial to gain access). In it, he laments the passing of dear old “line-oriented” BASIC, by which, I think, he means that era of BASIC when every line began with a numeric label:

10 PRINT "HELP, I AM STUCK IN A PROGRAM LOOP"
20 GOTO 10

Apparently, to this day, elementary-school textbooks have sidebars exhorting kids to “Try it in BASIC!”, and Brin tells an amusing story about having to buy a Commodore 64 on eBay for $25 in order to give his kid that chance.

He conflates a couple of different things: the increasing abstraction levels between the user and the underlying hardware, the resulting mental model, and the “learnability” of block-oriented imperative code. With BASIC, he says, you could write PONG and see the block move “at the command of math, and not magic.” A nice turn of phrase, but not fair: the PDP-8 on which I learned to program didn’t have a raster display, and I seem to recall quite a bit of PEEKing and POKEing “magic” when I did eventually gain access to such things.

There are two things about BASIC:

  • Many people can “get it”, and
  • What they “get” is a decent mental model of imperative code. (I’ll always remember the very first time I tried to tutor someone in BASIC: “So then it goes to the next line: ‘X = X + 1’,” I said. “False!” she said. I was flustered.) Imperative code maps well to the way computer registers and memory hold volatile state (there’s a sketch of the distinction just after this list). But BASIC doesn’t provide a mental model of all sorts of other stuff (like the plasticity of memory) that is equally part of any kind of even rough understanding of what’s “really going on.”
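For anyone who shares my tutee’s confusion, here’s the distinction in miniature, as a Java sketch rather than BASIC (the class name is mine): “=” is a command to the machine, not a claim about the world.

public class Assignment {
    public static void main(String[] args) {
        int x = 3;
        x = x + 1;  // not an equation to solve: read x (3), add 1, store 4 back into x
        System.out.println(x);  // prints 4
    }
}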

(A related point: the only other “programming-like thing” that many people “get” is spreadsheets.)

So, while I agree with the premise that it was much easier in those days to develop a mental model of how computers worked, I disagree with the notion that BASIC was all there was to it.

If, though, you want a BASIC-like programming model in a modern wrapper, there’s a surprising equivalent: Flash. Flash works exactly like “line-oriented” BASIC: execution “falls through” from one block to the next but can be redirected to an arbitrary label/block/frame with a GOTO. It has the added benefit of built-in graphics. You could totally write PONG in Flash. The only difficulty would be constraining yourself to imperative constructs, since Flash has all sorts of additional constructs that you’d have to willfully ignore.
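In case the frame model sounds exotic, here’s a toy version of it in Java rather than ActionScript (the frame contents are invented; in Flash the redirect is literally a call like gotoAndPlay): each frame does its work, then execution either falls through to the next frame or jumps to an arbitrary one.

import java.util.function.IntUnaryOperator;

// Toy model of a Flash-style timeline: an array of "frames," each of
// which does its work and then returns the index of the next frame.
// Returning f + 1 is fall-through; returning anything else is a GOTO.
public class FrameLoop {
    public static void main(String[] args) {
        IntUnaryOperator[] frames = {
            f -> { System.out.println("frame 0: set up the paddles"); return f + 1; },
            f -> { System.out.println("frame 1: move the ball");      return f + 1; },
            f -> { System.out.println("frame 2: gotoAndPlay(1)");     return 1;     }
        };
        int frame = 0;
        for (int step = 0; step < 7; step++) {  // bounded so the demo halts
            frame = frames[frame].applyAsInt(frame);
        }
    }
}

Run it and frames 1 and 2 chase each other forever (bounded here so the demo halts), which is precisely the shape of 10 PRINT / 20 GOTO 10.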

Review: Dragon Book, 2nd Edition

Compilers: Principles, Techniques, & Tools, 2nd Ed., by Aho, Lam, Sethi, & Ullman is the perfect book for two niches: people writing C compilers for embedded processors and CS students running the gauntlet of compiler courses. Those niches certainly exist, and for people in them The Dragon Book, as it’s universally known, is essential reading. For the rest of the world (that is, that broader-than-professional-compiler-writers-but-still-a-marked-subset-of-professional-programming niche), the book is not compelling.

The Dragon Book is widely considered a “hard read,” and, like Shakespeare or Joyce, even a smart person reading it without guidance or prior exposure will have difficulty gauging how much attention to give a particular point. Its near-universal reach probably does make it “the one book to have if you’re having only one,” and its depth has earned it a spot on many a shelf for many a year, but in all honesty I found myself disappointed while reading it.

The entire second half of the book is devoted to optimization and OS- and machine-level code generation. It remains the definitive text on those topics, but they lie outside the realm of interest of the large majority of people who might want to write a compiler. Today, most people interested in compiler construction are far more interested in language-design issues than in chip-level instruction sets and graph-based optimizations. Whether they’re a minority or a majority I can’t say, but many (potential) compiler writers are interested in targeting the CLR and/or the JVM, and thereby embrace a simplified runtime environment.
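To make “simplified runtime environment” concrete: targeting a stack machine like the JVM or CLR means that expression code generation reduces to a post-order tree walk emitting pushes and operations, with register allocation off the table entirely. A minimal sketch (mine, not the book’s; the instruction names are invented):

import java.util.ArrayDeque;
import java.util.Deque;

// What "2 + 3 * 4" compiles to on a stack-machine target: a flat list
// of instructions that a real compiler would emit by walking the
// expression tree post-order. No registers to allocate.
public class StackTarget {
    public static void main(String[] args) {
        String[] code = { "push 2", "push 3", "push 4", "mul", "add" };

        Deque<Integer> stack = new ArrayDeque<>();
        for (String insn : code) {
            if (insn.startsWith("push")) {
                stack.push(Integer.parseInt(insn.substring(5)));  // operand after "push "
            } else {
                int b = stack.pop(), a = stack.pop();
                stack.push(insn.equals("mul") ? a * b : a + b);
            }
        }
        System.out.println(stack.pop());  // prints 14
    }
}

Much of the second half of The Dragon Book is, in effect, the machinery you get to skip by punting to someone else’s virtual machine.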

It seems sadly out of touch with current debates: the index lacks references to, for instance, “duck typing” or “packrat parsing,” two techniques that are unquestionably, on the one hand, “all the rage” and, on the other, theoretically intriguing. To not cover either, in a book that spends 200 pages on lexical and syntactical analysis, seems almost willfully pig-headed. The book covers its chosen principles in depth (bordering on “is this a book on compilers or automata theory?”) but does not venture outside those lines.

Another instance: on page 966, in the appendix, there are two paragraphs discussing “object-oriented vs. phase-oriented” compiler design. To call that perfunctory is to be kind; perfunctory coverage would at least mention the Visitor design pattern.
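Here’s the sort of thing even two paragraphs could have gestured at: a minimal Visitor sketch in Java (the node and phase names are mine). The AST classes stay fixed while each compiler phase (type checking, evaluation, code generation) is written as a separate Visitor.

// Toy AST: the node classes stay fixed; each phase is a new Visitor.
interface Expr { <T> T accept(Visitor<T> v); }

class Num implements Expr {
    final int value;
    Num(int value) { this.value = value; }
    public <T> T accept(Visitor<T> v) { return v.visitNum(this); }
}

class Add implements Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    public <T> T accept(Visitor<T> v) { return v.visitAdd(this); }
}

interface Visitor<T> {
    T visitNum(Num n);
    T visitAdd(Add a);
}

// One phase: evaluation. A code-generation phase would be just another
// Visitor over the very same node classes.
class Eval implements Visitor<Integer> {
    public Integer visitNum(Num n) { return n.value; }
    public Integer visitAdd(Add a) { return a.left.accept(this) + a.right.accept(this); }
}

public class VisitorDemo {
    public static void main(String[] args) {
        Expr e = new Add(new Num(1), new Add(new Num(2), new Num(3)));
        System.out.println(e.accept(new Eval()));  // prints 6
    }
}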

Again: just in the past few days, there’s been talk about how one could/should generate method dispatch in a dynamic language. Joel Spolsky says that Ruby is slow because of method dispatch. Avi Bryant says that’s true, but only because of naive code generation (Ruby today does not follow The Hugunin Directive: “Use fast paths for common cases, using native constructs if possible.”). Ted Leung refers to Avi’s discussion as “the 20-year-old method of in-line method caching.” The Dragon Book, in contrast, explains how to construct a stack frame.
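Since the book doesn’t cover it, here’s roughly what in-line method caching amounts to, as a toy Java sketch (the class and method names are mine, and a real VM patches the fast path into generated machine code rather than interpreting a HashMap): each call site remembers the last receiver class it saw and what the lookup resolved to, so the common monomorphic case never touches the method dictionary.

import java.util.HashMap;
import java.util.Map;

// Toy dynamic dispatch with a monomorphic inline cache per call site.
interface Method { Object invoke(Object receiver); }

class RubyishClass {
    final Map<String, Method> methods = new HashMap<>();
}

class Obj {
    final RubyishClass klass;
    Obj(RubyishClass klass) { this.klass = klass; }
}

class CallSite {
    final String name;
    RubyishClass cachedClass;  // last receiver class seen at this site
    Method cachedMethod;       // what the full lookup resolved to

    CallSite(String name) { this.name = name; }

    Object call(Obj receiver) {
        if (receiver.klass == cachedClass) {
            return cachedMethod.invoke(receiver);     // fast path: cache hit
        }
        Method m = receiver.klass.methods.get(name);  // slow path: dictionary lookup
        cachedClass = receiver.klass;                 // remember for next time
        cachedMethod = m;
        return m.invoke(receiver);
    }
}

public class InlineCacheDemo {
    public static void main(String[] args) {
        RubyishClass duck = new RubyishClass();
        duck.methods.put("speak", r -> "quack");

        CallSite site = new CallSite("speak");
        Obj donald = new Obj(duck);
        System.out.println(site.call(donald));  // slow path, fills the cache
        System.out.println(site.call(donald));  // fast path, no lookup
    }
}

That’s the Hugunin Directive in miniature: the slow, fully general lookup stays available, but the common case takes the fast path.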

Similarly, The Dragon Book spends a tremendous amount of time on “lexing” efficiently (lexical analysis is the first phase of a compiler: transforming a stream of characters into a stream of tokens), and such is the influence of theory that I would be shocked to ever see a lexer not based on deterministic finite automata (DFAs). Anyone who ever wrote a lexer using, say, String.Split() and a hashtable would be banished from the He-Man Compiler Writers Club. This despite the fact that such a lexer would be highly transparent in purpose and design and would almost certainly be able to handle any reasonably sized source-code input (I was going to qualify this, but for heaven’s sake, we’ve got gigabytes and gigahertz today).
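For the record, the banishable lexer is about fifteen lines. Here it is as a Java sketch (String.split() and HashMap standing in for their .NET cousins; the token names are invented), with the obvious limitation that tokens must be whitespace-separated:

import java.util.*;

// A deliberately naive lexer: split on whitespace, classify each chunk
// with a keyword table and a character test. No DFA in sight.
public class NaiveLexer {
    static final Map<String, String> KEYWORDS = new HashMap<>();
    static {
        KEYWORDS.put("if", "IF");
        KEYWORDS.put("then", "THEN");
        KEYWORDS.put("goto", "GOTO");
    }

    public static List<String> lex(String source) {
        List<String> tokens = new ArrayList<>();
        for (String word : source.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            if (KEYWORDS.containsKey(word)) tokens.add(KEYWORDS.get(word));
            else if (Character.isDigit(word.charAt(0))) tokens.add("NUM(" + word + ")");
            else tokens.add("ID(" + word + ")");
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("if x then goto 10"));
        // prints [IF, ID(x), THEN, GOTO, NUM(10)]
    }
}

Transparent in purpose and design, as promised, and plenty fast for any source file a human is likely to write.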

As for “tools,” the discussion is focused solely on lex & yacc. A quick example of how creaky the discussion is: the one source-code snippet showing input to lex contains a function that returns a pointer but whose signature declares it as returning an int. So…there’s a piece of C code that hasn’t been updated in twenty years. True, lex & yacc are the ancestors of virtually every compiler-construction tool. More’s the pity, as the general design of these tools could be made recognizable to anyone who’s used tagging languages like PHP or ASP. For that matter, a generation that’s familiar with DOMs and XML parsers brings more than a blank slate to the tasks of tree-walking and rewriting.

So The Dragon Book sadly exemplifies the barriers to entry in the world of language design and compiler construction. This is a great book for the 500 teams worldwide building C compilers (yeah, seriously, there are hundreds of C compilers). For better or worse, it’s undoubtedly the textbook of choice for many CS curricula. It is not a good text for those interested in today’s active debates and discussions about language design and implementation.