Archive for the ‘Languages’ Category.

Clever

image

(follow image link to Zazzle store…)

ToyScript and the DLR : 3 Different Compilers

While I talked about being blown away by certain talks at Lang.NET, from a pragmatic standpoint I very much enjoyed the practical talks, such as those given by Harry Pierson and, especially, Martin Maly. Martin is one of the IronPython / DLR developers and hand-wrote a compiler for a language called “ToyScript.” This compiler is now part of the IronPython distribution. Harry wrote an F# PEG parser (is that redundant?) of ToyScript and I wrote an ANTLR-based parser. The hope is to show 3 different approaches to building the compiler front-end, but all using the same backend (“Hand off the AST to the DLR”).

Now, on the way to that, I started yesterday writing a series of examples that do things exactly backward: start with the handoff to the DLR (“GenerateNopFunction()”), add nodes (using XML to represent the AST), and then say “Oh, and you create that AST using compiler front-end techniques.” The sad thing is that all of this will have to be in my copious spare time, since my language stuff isn’t supported by paying articles (I did the Lang.NET conference on my own dime).
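To give a flavor of the “backward” approach, here is a minimal sketch in Python of an AST written as XML and handed to a back end. Everything in it is an assumption made for illustration: the element names (`binary`, `const`), the attributes, and the tree-walking `evaluate` function are invented, and the real examples hand the tree to the DLR rather than to an interpreter like this one.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of the AST for (1 + 2) * 3.
# Element and attribute names are invented for this sketch.
AST_XML = """
<binary op="*">
  <binary op="+">
    <const value="1"/>
    <const value="2"/>
  </binary>
  <const value="3"/>
</binary>
"""

OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def evaluate(node):
    """Stand-in back end: walk the tree and compute a value."""
    if node.tag == "const":
        return int(node.get("value"))
    if node.tag == "binary":
        left, right = list(node)  # the two child elements, in document order
        return OPS[node.get("op")](evaluate(left), evaluate(right))
    raise ValueError(f"unknown node type: {node.tag}")


def run(xml_text):
    """Parse the XML text into a tree and hand the root to the back end."""
    return evaluate(ET.fromstring(xml_text.strip()))
```

The point of the exercise is that the back end only ever sees the tree, so any front end that can produce the same tree (hand-written, PEG, or ANTLR) can reuse it unchanged.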

600 Lines of Code

Like Charles Petzold, my first reaction to Jeff Atwood’s question “What Can You Build in 600 Lines of Code?” was along the lines of “5 articles!”

But actually, I think 600 lines is just about the right benchmark size for a language, because it’s:

  • Small enough to develop in a weekend
  • Large enough so that “finger typing” is neither dominant nor drowned-out
  • Large enough to exploit a language’s particular idioms and strengths

A caveat though: the use of libraries and frameworks can grossly distort this discussion. Frankly, the quote “commercial project written in less than 600 lines of Ruby code” (ibid.) is wrong: it ought to be “of Rails code.” It’s akin to saying “In DOS batch I can create a spreadsheet in a line of code — all I have to do is type ‘excel’!” (I know it’s not exactly the same, but there’s a similarity.)

This is one of the reasons why writing a parser has always been a measure of a programming language: it involves complex pattern matching, the creation of a complex data structure, transforms of that structure, and a fair amount of IO.

Harry Pierson’s F# PEG parsers (is that redundant?) are a good example: I don’t doubt he’ll complete a parsing front-end to the “ToyScript” language in less than 600 lines of code. The first night at Lang.NET, I wrote an ANTLR parser for ToyScript (# lines in ANTLR, expands to # lines of C#!). From the impression I got of Newspeak, I think it would take significantly less than F#.
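To make the parser-as-benchmark point concrete, here is a rough sketch in Python of the three ingredients mentioned above: pattern matching (a tokenizer), building a data structure (a recursive-descent parser that produces nested tuples), and transforming that structure (constant folding). The grammar is a toy arithmetic language I made up for the illustration. It is not ToyScript, and it is not how any of the three ToyScript front ends work.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    """Pattern matching: split source text into number, name and operator tokens."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Build the data structure: nested tuples such as ('+', ('num', 1), ('name', 'x'))."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():  # expr := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():  # term := factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():  # factor := number | name | '(' expr ')'
        if peek() is None:
            raise SyntaxError("unexpected end of input")
        tok = advance()
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok.isdigit():
            return ("num", int(tok))
        if tok[0].isalpha() or tok[0] == "_":
            return ("name", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def fold(node):
    """Transform the structure: collapse subtrees whose operands are all constants."""
    if node[0] in ("num", "name"):
        return node
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if left[0] == "num" and right[0] == "num" and op != "/":
        values = {"+": left[1] + right[1], "-": left[1] - right[1], "*": left[1] * right[1]}
        return ("num", values[op])
    return (op, left, right)
```

For example, `fold(parse(tokenize("2 * (3 + 4) + x")))` collapses the constant subtree and leaves the variable alone. Even this toy leans on regular expressions, closures, tuples, and recursion, which is exactly why a parser tells you so much about a language.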

Concurrency Not Emphasized at Lang.NET

Although concurrency was laid out by Jason Zander as one of the overarching themes of language work moving forward, it was not at all emphasized as a primary concern in any of the talks I saw. There was some talk about language features that a “sufficiently smart compiler” could handle but no one took off their shoe and banged the table and said “We must focus on this!”

I button-holed Erik Meijer and Brian Goetz on the topic: Erik is a great believer in Software Transactional Memory and Brian is “cautiously optimistic” about it. Both were very quick to acknowledge that we don’t really know how people will react to the complexities that emerge when the behavior of memory transactions starts to necessarily diverge from the familiar world of database transactions. When I mentioned the Actor model as being intuitive, Brian astutely diagnosed me as having a Smalltalk background and said that message-passing would be confusing to a population that viewed objects as — I think his phrase was — “glorified structs.”

Off to Lang.NET


Yesterday I went swimming on a coral reef in 100′ visibility with whales singing so loud that I almost expected to see them underwater. For the next 3 days, I’ll be in Seattle, where I assume they have this thing I hear about called “heating.”

Actually, after canceling the trip because I had such a non-productive, unlucky, and generally lousy January, I decided at the last minute that the best thing in the world for me was to be in a room filled with people who totally out-class me.

My plan had been to spend, like, 6 weeks developing something and then saying “Well, I threw this together on the flight over…” and then have people say “well, it sucks of course, but for a plane flight, it’s not bad.” Now I’ll just sit quietly in the corner. I’ll be the guy wearing three layers of sweater over an aloha shirt….

"Real World Haskell" Book In Public Beta

Haskell is a language that is pretty hard to “just pick up” (especially if you are mostly familiar with mainstream, C-derived languages). Perhaps “Real World Haskell” by Bryan O’Sullivan, Don Stewart, and John Goerzen will help the language (much beloved in academia) increase in popularity.

Harry Pierson’s Awesome "Practical Parsing in F#" Series of Posts

When I can shake some time free to actually learn F#, this awesome series of blog posts on “Practical Parsing in F#” is definitely something I’ll revisit. Parsing is one of the better tasks for shaking free a large number of concepts about a programming language, since it invariably involves large and dynamic data structures, abstraction strategies, IO, etc.

Tagging Languages And Monolithic Code

The biggest problem with tag-based languages (<h1><% someCode %></h1> : the ASPs, the ColdFusions, the PHPs…) is that they facilitate monolithic code. This is related to the big criticism of XML and DOMs for data structures, too: they facilitate the creation of hierarchies, not graphs. (As always with programming, the issue is “facilitates” not “possible”…)

My dear friend “Bob” creates horrific pages that are hundreds and even thousands of lines long, with <cfif> at line 100, and then a <cfelse> at line 837 and then a … and ColdFusion isn’t valid XML and there’s a combination of HTML indentation and ColdFusion code indenting.

Just absolutely impenetrable stuff, and while I’m more than willing to blame many problems on Bob, I think this is a problem that the tool he uses (ColdFusion) is facilitating.

Lang.NET Symposium: Jan 28-30 Redmond

Sounds like a great opportunity to hang out with compiler geeks. Since someone’s already beaten me to an LOLCode compiler for the DLR, I’ll have to put in some work on my other projects: Excel# and a more serious language I’ve been noodling around with called Rinq, a REST-Oriented Language that supports LINQ.

Shiny Buckshot Rather Than Silver Bullets

Wes Moise’s musings on Supercompilation led me to this discussion of the myth of the sufficiently smart compiler.

The “sufficiently smart compiler” is still trotted out regularly, even though the market has moved away from demanding even moderate attention to performance at the compiler level. Have you timed your rectangular arrays in C# lately? Or, to be inclusive, have you looked at what’s (not) hoisted out of loops in Java?
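For anyone who hasn’t run into the term, “hoisting” means moving a loop-invariant computation out of the loop so that it runs once instead of on every pass. Here is a minimal sketch in Python of the before and after. The function names are made up for the illustration, and it shows the transformation done by hand; it makes no claim about what any particular Java or C# compiler does or doesn’t do.

```python
import math


def scale_naive(values, angle):
    """Recomputes the same cosine on every iteration."""
    out = []
    for v in values:
        # math.cos(angle) does not depend on v, yet it is evaluated each pass.
        out.append(v * math.cos(angle))
    return out


def scale_hoisted(values, angle):
    """Same result, with the loop-invariant expression computed once."""
    factor = math.cos(angle)  # hoisted out of the loop by hand
    out = []
    for v in values:
        out.append(v * factor)
    return out
```

Both versions return the same list. The only difference is how many times the invariant expression is evaluated, which is the kind of rewrite one would hope a compiler performs without being asked.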

The existence of the Iron* languages from Microsoft stems from Jim Hugunin’s discovery that adding moderate smarts allowed dynamic languages to run fast on the CLR:

  1. Use native CLR constructs whenever possible. These constructs are all heavily optimized by the underlying runtime engine. Sometimes these constructs will have to be used creatively to properly match Python’s dynamic semantics.
  2. Use code generation to produce fast-paths for common cases. This can either be development-time code generation with Python scripts creating C# code or run-time code generation using System.Reflection.Emit to produce IL on the fly.
  3. Include fall-backs to dynamic implementations for less common cases or for cases where Python’s semantics requires a fully dynamic implementation.
  4. Finally, test the performance of every CLR construct used and find alternatives when performance is unacceptable. Type.InvokeMember is a great example of a function that must never be used in code that cares about performance.

That’s hardly the stuff of PhD theses (don’t misunderstand me: Hugunin’s paper, which actually said something important, is more valuable than 99% of CS theses).
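Points 2 and 3 in the list above (fast paths for common cases, dynamic fall-backs for the rest) have a very simple shape. Here is a minimal sketch in Python. It is my own illustration, not IronPython’s code: the `add` function and the `stats` counter are invented names, and in plain Python the “fast path” is not actually faster. It only shows the structure that a code generator would emit.

```python
import operator

# Counts how often each path is taken, purely so the sketch is observable.
stats = {"fast": 0, "fallback": 0}


def add(a, b):
    """Addition with a type-specialized fast path and a fully dynamic fallback."""
    # Fast path for the common case: both operands are exactly int.
    # (bool is a subclass of int, so type(...) is int deliberately excludes it.)
    if type(a) is int and type(b) is int:
        stats["fast"] += 1
        return int.__add__(a, b)
    # Fallback for everything else: floats, strings, and user-defined
    # __add__/__radd__ all go through the general '+' protocol.
    stats["fallback"] += 1
    return operator.add(a, b)
```

The type check is cheap, the common case skips the general dispatch machinery, and the uncommon cases still get the full dynamic semantics.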

The point, though, is that we are in a time of high tension between what is possible and what is practiced. This gives me hope that we might see true breakthroughs in programming languages. Fred Brooks spoke of a silver bullet, defined as a “single development, in either technology or in management technique, that by itself promises even one order-of-magnitude improvement in productivity, in reliability, in simplicity” [my emphasis]. I don’t believe in silver bullets, but I think there’s a possibility of shiny buckshot.

On the discouraging side, I think there are great difficulties in building such a system: the development of a shiny shotgun is, I think, the work of double-digit person-years. It’s work that’s too far over the horizon for VC funding, too pragmatic for grants, and too dependent on brilliant execution by a small, high-performance team for Open Source.