Shiny Buckshot Rather Than Silver Bullets

Wes Moise’s musings on Supercompilation led me to this discussion of the myth of the sufficiently smart compiler.

The “sufficiently smart compiler” is still trotted out regularly, even though the market has moved away from demanding even moderate attention to performance at the compiler level. Have you timed your rectangular arrays in C# lately? Or, to be inclusive, have you looked at what’s (not) hoisted out of loops in Java?

The existence of the Iron* languages from Microsoft stems from Jim Hugunin’s discovery that adding moderate smarts allowed dynamic languages to run fast on the CLR:

  1. Use native CLR constructs whenever possible. These constructs are all heavily optimized by the underlying runtime engine. Sometimes these constructs will have to be used creatively to properly match Python’s dynamic semantics.
  2. Use code generation to produce fast-paths for common cases. This can either be development-time code generation with Python scripts creating C# code or run-time code generation using System.Reflection.Emit to produce IL on the fly.
  3. Include fall-backs to dynamic implementations for less common cases or for cases where Python’s semantics requires a fully dynamic implementation.
  4. Finally, test the performance of every CLR construct used and find alternatives when performance is unacceptable. Type.InvokeMember is a great example of a function that must never be used in code that cares about performance.

That’s hardly the stuff of PhD theses (don’t misunderstand me: Hugunin’s paper, which actually said something important, is more valuable than 99% of CS theses).
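Points 2 and 3 of that list (fast paths for the common cases, dynamic fall-backs for the rest) can be sketched in a few lines. This is only an illustration in Python, not IronPython's actual implementation: the `FAST_ADD` table and the `add` and `dynamic_add` names are hypothetical, and the fall-back is a simplified version of Python's real operator dispatch.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical table of specialized fast paths, keyed by the exact operand types.
FAST_ADD: Dict[Tuple[type, type], Callable[[Any, Any], Any]] = {
    (int, int): int.__add__,
    (float, float): float.__add__,
    (str, str): str.__add__,
}


def dynamic_add(left: Any, right: Any) -> Any:
    """Fall-back: ask the operands themselves (simplified __add__/__radd__ protocol)."""
    result = NotImplemented
    if hasattr(left, "__add__"):
        result = left.__add__(right)
    if result is NotImplemented and hasattr(right, "__radd__"):
        result = right.__radd__(left)
    if result is NotImplemented:
        raise TypeError(
            f"cannot add {type(left).__name__} and {type(right).__name__}"
        )
    return result


def add(left: Any, right: Any) -> Any:
    """Try the fast path for common type pairs, else use the dynamic path."""
    fast = FAST_ADD.get((type(left), type(right)))
    if fast is not None:
        return fast(left, right)
    return dynamic_add(left, right)
```

Here `add(2, 3)` takes the fast path, while `add(1, 2.5)` misses the table and is resolved by the dynamic fall-back through `float.__radd__`.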

The point, though, is that we are in a time of high tension between what is possible and what is practiced. This gives me hope that we might see true breakthroughs in programming languages. Fred Brooks spoke of a silver bullet, defined as a “single development, in either technology or in management technique, that by itself promises even one order-of-magnitude improvement in productivity, in reliability, in simplicity” [my emphasis]. I don’t believe in silver bullets, but I think there’s a possibility of shiny buckshot.

On the discouraging side, I think there are great difficulties in building such a system: the development of a shiny shotgun is, I think, the work of double-digit person-years. It’s work that’s too far over the horizon for VC funding, too pragmatic for grants, and too dependent on brilliant execution by a small, high-performance team for Open Source.

Bash Ups

With the release to public beta of Popfly, Microsoft’s mashup editor, I’ll reiterate my theory that mashups are the UNIX shell of the Internet. The corollary is that we need a suite of command equivalents:

Command | Mashup Alternative
cd, mkdir, rmdir | facilities for manipulating “current URI”; REST principles, etc.
mailx | messaging transformations and transports: mail, IM, SMS, twitter, etc.
man | ?
jobs, ps, kill, sleep, etc. | facilities for multiple mashup control
ls | spidering facilities / robust HTML parsing, etc. “Get-ChildItem” in all its polymorphic complexity.
who | FOAF
finger, chfn | blogging
cat, sed, sort, grep, wc, tail, etc. | All sorts of facilities for transformation of source to sink

Right now, everyone’s concentrating on what output the mashup editors can produce or what the component manipulation looks like. I think the winner of the mashup evolution will be the one that provides the most flexible suite of components.
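To make the shell analogy concrete, here is a rough sketch of what a "suite of components" for source-to-sink transformation might look like. It is plain Python over an in-memory list, not any mashup editor's API; the `grep`, `sed`, `sort`, `tail` and `pipeline` names simply mirror the UNIX commands and are hypothetical.

```python
import re
from collections import deque
from typing import Callable, Iterable, Iterator

# A stage consumes a stream of lines and produces a new stream of lines.
Stage = Callable[[Iterable[str]], Iterator[str]]


def grep(pattern: str) -> Stage:
    """Keep only the lines that match the regular expression."""
    regex = re.compile(pattern)
    return lambda lines: (line for line in lines if regex.search(line))


def sed(pattern: str, replacement: str) -> Stage:
    """Rewrite every line with a regular-expression substitution."""
    regex = re.compile(pattern)
    return lambda lines: (regex.sub(replacement, line) for line in lines)


def sort() -> Stage:
    """Sort the whole stream (this has to buffer it, as sort(1) does)."""
    return lambda lines: iter(sorted(lines))


def tail(count: int) -> Stage:
    """Keep only the last `count` lines."""
    return lambda lines: iter(deque(lines, maxlen=count))


def pipeline(source: Iterable[str], *stages: Stage) -> Iterator[str]:
    """Connect the stages left to right, like `a | b | c` in a shell."""
    stream: Iterator[str] = iter(source)
    for stage in stages:
        stream = stage(stream)
    return stream


# Made-up feed standing in for whatever a real source component would fetch.
feed = [
    "post: Shiny Buckshot",
    "comment: nice",
    "post: Bash Ups",
    "post: C++0x threading",
]
titles = list(
    pipeline(feed, grep(r"^post:"), sed(r"^post:\s*", ""), sort(), tail(2))
)
```

The point of the sketch is that each component knows nothing about its neighbors, which is what makes the suite flexible.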

Supply & demand? Outsourcing backlash? CS grads average $53K out of school

[C]omputer-science grads saw their average starting salary offers grow by 4.5 percent last year alone. The new average salary for a job right out of college is now $53,051. That’s the highest amount this decade.

Starting salaries surge for computer science grads [Ars Technica]

Interesting. I still think that the future is mixed-at-best for United States programmers (with our relatively high cost), but at least for now there’s some good news.

LINQ + Reflection: Querying the Object Graph

Yuriy Solodkyy demonstrates the combination of LINQ and Reflection APIs, a technique which could prove to be tremendously powerful and which strikes me as allowing LINQ-enabled languages to have a level of “dynamism” that puts duck-typing to shame.

Could this simply replace the Visitor pattern with an approach that needs no cooperation from the data structure?

Would this allow an Abstract Factory that allowed you to dynamically find all products of one Factory and replace them with those of another?

 [via Steve Pietrek]
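To illustrate the Visitor question, here is a minimal sketch of the idea in Python, using introspection in place of .NET Reflection and a generator expression in place of a LINQ query. It is not Solodkyy's code; `walk`, `Leaf` and `Folder` are hypothetical names, and the traversal only follows instance attributes and the built-in containers.

```python
from typing import Any, Iterator


def walk(root: Any) -> Iterator[Any]:
    """Yield every object reachable from root via attributes and containers."""
    seen = set()  # ids of objects already visited, so cycles terminate
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        yield obj
        if isinstance(obj, dict):
            stack.extend(obj.values())
        elif isinstance(obj, (list, tuple, set)):
            stack.extend(obj)
        elif hasattr(obj, "__dict__"):
            # Introspection: the class needs no accept() method or other hook.
            stack.extend(vars(obj).values())


class Leaf:
    def __init__(self, name: str, size: int) -> None:
        self.name = name
        self.size = size


class Folder:
    def __init__(self, name: str, children: list) -> None:
        self.name = name
        self.children = children


tree = Folder(
    "root",
    [Leaf("a.txt", 10), Folder("sub", [Leaf("b.txt", 300), Leaf("c.txt", 5)])],
)

# The "query": all leaves larger than 8, with no cooperation from the classes.
big = sorted(
    node.name for node in walk(tree) if isinstance(node, Leaf) and node.size > 8
)
```

Neither `Leaf` nor `Folder` knows it is being visited, which is the cooperation-free property the question is about.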

C++0x to Incorporate Standard Threading Model

The working groups of the C++0x committee are working hard to complete a major new standard for C++ (there’s a big meeting here in Kona in October). If you’re not intimate with C++, you may be surprised that such an important language has not had a standard threading model and that such a model is a major part of the C++0x version. This is actually part-and-parcel of the design philosophy that made C and C++ so important: the number of libraries dictated by the standard for C and C++ is much smaller than the BCL or Java’s SE libraries. This allows standard C and C++ to be available for hundreds of processors.

I recently read the public C++0x papers on threading (links below). The proposed threading model is non-radical and is based on Boost.Thread. The reasonable perspective is that this is a conservative decision thoroughly in keeping with C/C++’s long tradition of minimal hardware/OS assumptions.

The emotional perspective is that they’ve let slip by a golden opportunity to incorporate the best thinking about memory models. “Multi-threading and locking” is, I would argue, demonstrably broken for business programming. It just doesn’t work in a world of systems built from a combination of libraries and user-code; while you can create large applications based on this model, large multithreaded applications based on locking require not just care, but sophistication, at every level of coding. By standardizing an established systems-level model, C++0x forgoes an opportunity for leadership, albeit leadership of a radical kind.
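A small sketch of what "sophistication at every level" means in practice. It is written in Python rather than C++ and is not taken from the C++0x papers; `Account`, `transfer` and `run_transfers` are hypothetical names. The code is only safe because every caller follows the same lock-ordering convention, and nothing in the language enforces that convention.

```python
import threading


class Account:
    def __init__(self, ident: int, balance: int) -> None:
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()


def transfer(source: Account, target: Account, amount: int) -> None:
    """Move money between accounts while holding both of their locks."""
    if source is target:
        return  # taking the same non-reentrant lock twice would hang
    # Convention: always lock the lower ident first. If any code anywhere
    # locks in the opposite order, two opposite transfers can deadlock.
    first, second = sorted((source, target), key=lambda account: account.ident)
    with first.lock, second.lock:
        source.balance -= amount
        target.balance += amount


def run_transfers(rounds: int) -> tuple:
    """Run opposite transfers on two threads and return the final balances."""
    a, b = Account(1, 1000), Account(2, 1000)

    def worker(source: Account, target: Account) -> None:
        for _ in range(rounds):
            transfer(source, target, 1)

    threads = [
        threading.Thread(target=worker, args=(a, b)),
        threading.Thread(target=worker, args=(b, a)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return a.balance, b.balance
```

The ordering rule lives in a comment, which is exactly the problem: a library that locks its own objects in its own order cannot be composed with this code without someone understanding both.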

One of the real thought leaders when it comes to more sophisticated concurrency semantics is Herb Sutter. His Concur model (here’s a talk on Concur from PDC ’05) is, I think, a substantial step forward and I’ve really hoped to see it influence language design. Is Sutter, though, just an academic with flighty thoughts and little understanding of the difficulties of implementation? It seems unlikely, given that he’s the Chair of the ISO C++ standards committee. So you can see why there might have been an opportunity.

Multithreading proposals for C++0x:

Data Volumes Trumping Core Multiplication? Interesting Thought

Bill de hÓra makes an intriguing pitch that programming will be impacted by increasing data volumes more than by the transition to multi-/many-core. His basis is anecdotal (we don’t have the same metaphysical certainty that all of us will be dealing with much-larger datasets as we have the certainty that we will all be dealing with multiple and then many cores) but is logical. The speed of a single stream of in-cache instructions is blazing: short of chaotic functions, it’s hard to imagine perceptibly-slow scenarios that don’t involve large amounts of data.

What I find especially thought-provoking about this argument is that it stands in opposition to another post I was going to make about YAGNI infrastructure. Not long ago, Alan Zeichick ranked databases and Ian Griffiths questioned whether he took price-performance into account. Even allowing that there are costs for OSS (training, tools, administration, etc.), I’ve noticed that few real-world CEOs understand where their companies stand in relationship to scaling. In my experience, they often over-buy software and hardware capacity and under-buy contingency capacity.

It seems to me that nowadays we work more and more with data streams and not data sets. On a transaction-to-transaction basis, I think it’s an uncommon application that uses more data than can fit into several gigabytes of RAM (obvious exception: multimedia data). Never mind multi-node Map-Reduce; I’m saying that it seems to me that many “real” business systems could have a single-node non-relational data access layer.
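For a sense of how little machinery a "single-node non-relational data access layer" needs at its simplest, here is a rough sketch: a dictionary keyed by primary key plus one secondary index. `InMemoryStore` and its methods are hypothetical, and the sketch ignores persistence, concurrency and recovery, all of which a real system would have to address.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Optional


class InMemoryStore:
    """Rows held in RAM, keyed by primary key, with one secondary index."""

    def __init__(self, index_field: str) -> None:
        self._index_field = index_field
        self._rows: Dict[Hashable, Dict[str, Any]] = {}
        self._index = defaultdict(set)  # indexed value -> set of primary keys

    def put(self, key: Hashable, row: Dict[str, Any]) -> None:
        """Insert or overwrite a row and keep the index consistent."""
        old = self._rows.get(key)
        if old is not None:
            self._index[old[self._index_field]].discard(key)
        self._rows[key] = dict(row)
        self._index[row[self._index_field]].add(key)

    def get(self, key: Hashable) -> Optional[Dict[str, Any]]:
        """Look up a row by primary key."""
        return self._rows.get(key)

    def find(self, value: Any) -> List[Dict[str, Any]]:
        """Return all rows whose indexed field equals value, in key order."""
        keys = sorted(self._index.get(value, ()))
        return [self._rows[key] for key in keys]


# Made-up order data, purely for illustration.
orders = InMemoryStore(index_field="customer")
orders.put(1, {"customer": "acme", "total": 40})
orders.put(2, {"customer": "globex", "total": 15})
orders.put(3, {"customer": "acme", "total": 5})
acme_totals = [row["total"] for row in orders.find("acme")]
```

Whether this is enough depends entirely on the durability and query requirements, which is the part the sketch leaves out.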

It seems that what I’m saying is in direct contrast to what de hÓra is describing, and yet points to the same “maybe we ought not to start from the assumption of a relational DB” heresy. No conclusion… food for thought …

Reflection: I think I let my attention wander. The world de hÓra is describing is that of high-performance computing, and I wandered into general business computing. The two intersect, of course, but are not generally the same. So the thought then becomes that powerful relational databases are being squeezed from both the low end (“eh, just put it in memory”) and the high end (“ok, so this is our distributed tuple-space…”).

Moving Beyond The Typing Debate?

Maybe the readers of my blog are more astute (and better looking!) than average, but I was happy that several comments to my recent post on type inference were properly dismissive of what one called “the static vs. dynamic holy war.” As I said when writing about the myth of better programming languages last year, different programming languages engage your mind in different ways and that is what is worth pursuing. There was a time when I was programming professionally in two languages: C and Prolog. They engaged my mind in such profoundly different ways that shifting between them felt like the clutch on the ’77 Ford van I was driving at the time (three-on-the-tree, baby), but in terms of problem solving, I felt like Superman.

Now, first-class functions have entered the mainstream (primarily via C#) and that, in combination with an influential paper about Google’s MapReduce programming model, has led people to begin to see what functional programming advocates have been talking about lo these many years.
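The connection between first-class functions and MapReduce is easy to see in miniature. The following is a single-process Python sketch of the programming model only, not Google's distributed implementation; `map_reduce` and `word_pairs` are hypothetical names. The caller supplies the two functions and the framework supplies the grouping.

```python
from collections import defaultdict
from functools import reduce
from typing import Any, Callable, Dict, Iterable, Tuple


def map_reduce(
    records: Iterable[Any],
    mapper: Callable[[Any], Iterable[Tuple[Any, Any]]],
    reducer: Callable[[Any, Any], Any],
) -> Dict[Any, Any]:
    """Apply mapper to each record, group values by key, fold each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Each group holds at least one value, so reduce needs no initial value.
    return {key: reduce(reducer, values) for key, values in groups.items()}


def word_pairs(line: str) -> Iterable[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


counts = map_reduce(
    ["shiny buckshot", "silver bullets", "Shiny bullets"],
    word_pairs,
    lambda left, right: left + right,  # reducer: sum the counts
)
```

Because `mapper` and `reducer` are just values passed in, the same `map_reduce` function works unchanged for any other grouping problem.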

Similarly, people are beginning to realize that concurrency models just might be important in the coming years and are beginning to pay attention to languages like Erlang. (Incidentally, O’Reilly & Associates seems to be betting that “shared nothing” is the way to go, a conclusion that I think is certainly too sweeping and premature. ORA is the most influential publishing house in software development right now, so the biases of their editors in this area will have a noticeable impact on the debate in the years to come.)

Update: No sooner had I written this post than I saw in my inbox that Pragmatic Bookshelf has published Programming Erlang. Look for a review in the coming weeks…

IBM’s Telelogic Acquisition: Buying Marketshare, Not Expanding Market

I agree with Alan Zeichick’s analysis of IBM’s acquisition of modeling tool vendor Telelogic: the overlap with IBM’s Rational product line is high, the acquisition “is a bid to buy market share … we’ve taken a powerful innovator and strong IBM competitor out of the market.”

The software development industry typically pendulums on modeling tools: excess, backlash, abandonment, code is king, frustration, some modeling helps, we can model everything, excess …

Right now, modeling is not popular. But I think it’s actually passed its nadir and, if history holds, we should see modeling increase in popularity. The problem for IBM and Rational is that part of the pendulum is the embrace of new modeling graphics/languages.