Writing Your Own Language: Choose a VM or Native?

Andrew Binstock has a good post on “Writing your own language — How to choose a VM.” In the post, he says that “Most of these VMs encourage your compiler to output not bytecodes but source code using their native language.” But code generation is low-level output, and at that level it’s much harder to take advantage of the abstractions of a high-level language. In other words, I don’t think that, for code generation, it’s that much harder to output C than it is to output Ruby (or Lua or Java or C# or Haskell or whatever). And, if so, I have to wonder whether targeting a VM (all of which, as far as I can tell, are more constrained than the ‘portable assembly language’ of C) is more trouble than it’s worth.
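To make that concrete, here is a minimal sketch of what I mean, not anyone's real compiler. It assumes a toy expression tree (the names `Num`, `Var`, `BinOp`, `emit_c`, and `emit_ruby` are all made up for illustration) and emits the same function as C source and as Ruby source. The expression emitter is shared; only the function wrapper differs.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy AST: integer literals, variables, and binary operators.
@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]


def emit_expr(e: Expr) -> str:
    """Emit a fully parenthesized infix expression (valid in both C and Ruby)."""
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    if isinstance(e, BinOp):
        return f"({emit_expr(e.left)} {e.op} {emit_expr(e.right)})"
    raise TypeError(f"unknown node: {e!r}")


def emit_c(name: str, params: list, body: Expr) -> str:
    """Wrap the expression in a C function; every value is assumed to be a long."""
    args = ", ".join(f"long {p}" for p in params)
    return f"long {name}({args}) {{\n    return {emit_expr(body)};\n}}\n"


def emit_ruby(name: str, params: list, body: Expr) -> str:
    """Wrap the same expression in a Ruby method definition."""
    args = ", ".join(params)
    return f"def {name}({args})\n  {emit_expr(body)}\nend\n"


# f(x) = x * x + 1
body = BinOp("+", BinOp("*", Var("x"), Var("x")), Num(1))
c_source = emit_c("f", ["x"], body)
ruby_source = emit_ruby("f", ["x"], body)
```

For something this small, the C emitter costs one extra type annotation per parameter. A real language would of course also need memory management, strings, and so on, which is where the two targets would start to diverge.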

Of course, my argument only holds for writing a new general-purpose language. A domain-specific language that is primarily additive (for instance, a language that specifies the structure of a particular type of game, cough cough) can, when targeting a VM, inherit huge swaths of functionality from it, such as the type system, control-flow structures, and so forth.
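As a rough illustration of the additive case, here is a sketch that assumes a made-up rule syntax (`when <condition> then <event>`) and uses Python as a stand-in for the host VM. The helper names `compile_rules` and `load` are hypothetical. The DSL never parses or evaluates conditions itself; it passes them through verbatim and lets the host supply expressions, comparisons, and control flow.

```python
def compile_rules(dsl_text: str) -> str:
    """Translate 'when <condition> then <event>' lines into Python source."""
    lines = ["def step(state):", "    events = []"]
    for raw in dsl_text.strip().splitlines():
        raw = raw.strip()
        if not raw:
            continue
        if not raw.startswith("when ") or " then " not in raw:
            raise ValueError(f"bad rule: {raw!r}")
        # The condition is copied through untouched: it is host-language code.
        cond, event = raw[len("when "):].split(" then ", 1)
        lines.append(f"    if {cond}:")
        lines.append(f"        events.append({event.strip()!r})")
    lines.append("    return events")
    return "\n".join(lines) + "\n"


def load(dsl_text: str):
    """Compile the generated source on the host VM and return the step function."""
    namespace = {}
    # exec runs arbitrary code, so this is only acceptable for trusted rule files.
    exec(compile_rules(dsl_text), namespace)
    return namespace["step"]


# Hypothetical game rules written in the toy DSL.
RULES = """
when state["hp"] <= 0 then game_over
when state["score"] >= 10 and state["hp"] > 0 then level_up
"""

step = load(RULES)
dead = step({"hp": 0, "score": 3})
winning = step({"hp": 5, "score": 12})
```

The whole "compiler" is a dozen lines because everything hard (boolean logic, dictionaries, comparison semantics) comes for free from the host. That is exactly the bargain a general-purpose language cannot make.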

And also, of course, whether you’re building a DSL or a general-purpose language, you will most likely end up using a parser generator. I personally like ANTLR, which, like most generators, lets you switch the generation target between C and higher-level languages (right now, the significant upgrade to ANTLR 3 is being tracked by contributors writing code generators for Java, C++, C#, C, Objective-C, Python, Ruby, LISP, Perl6, PHP, and Oberon).