Archive for the ‘SD Futures’ Category.

How Much of the Industry Will Go Parallel?

Michael Suess ponders one of my favorite questions: How much of the software industry will have to deal with the concurrent computing [opportunity]? He hits the vital points:

  • 2, 4, and maybe 8 cores may be usefully exploited by system services (anti-virus, disk indexing and searching, etc.), but when you get beyond that, any program for which performance is any kind of issue simply cannot ignore the capacity (this is why I distinguish between our current “multicore” transitional phase and the coming “manycore” era).
  • Media programming (games, A/V processing) has an essentially infinite appetite for processing.
  • The manycore era provides an opportunity for new types of functionality. He mentions concurrent semantic analysis of your input, both typing and spoken, and the accumulation of context documents. For instance, as I type this, my computer might be gathering all my blog posts, OneNote notes, source code, etc. relating to concurrency. (And then wouldn’t it be cool if it offered them for my perusal, maybe with, I dunno, a goggle-eyed paperclip?)

But I think the $64 question is whether such services will be provided in a service-oriented, cross-application manner, or whether it will be the case that we find broad opportunities for them within applications. For instance, mail programs and word processors have had search functionality for a long time, but if you were designing such a program from scratch, you would probably be better advised to say “Hey, I won’t implement a complete search subsystem, I’ll just make sure I can be indexed by Windows and Google Desktop Search. If I want to add value, I’ll layer on top of those systems if at all possible.”

Conversely, if you had some powerful new value proposition (semantic analysis, task recognition, visual input), wouldn’t it be vastly better for you and your customers if you could provide it to applications other than those that you happen to have written? In other words, of course value in the manycore era will derive from increased parallelism but maybe that parallelism will still be very coarse-grained. Maybe software organizations will face a choice: “Either develop client-oriented value with the best practices of “traditional” non-parallel development or develop broader, system-oriented value using whatever is the emerging set of best practices for system-level parallel development.” Maybe that choice will become increasingly orthogonal.

Now, the final part of the thought experiment is this: if that scenario is reasonable, what kind of platform services / APIs would one desire?

Microsoft’s Popfly: Getting Their Ducks In A Row

Popfly is the name (and URL) of Microsoft’s new non-professional developer community, a Windows Live site whose flashiest feature is a Silverlight-based “mashup editor” that facilitates pipes-and-filters development. Before reviewing the gratuitous 3-D spinning cubes, though, pay attention to the context:

  • Visual Studio Express has had 14,000,000 downloads (source: Dan Fernandez personal communication). Of course that translates to something far less than 14M users, but it ain’t hay;
  • The Popfly mashups run inside Silverlight, so anyone wishing to view their friends’ / child’s / grandkid’s project is going to have to install the Silverlight runtime;
  • Silverlight is going to rapidly evolve to incorporate the Dynamic Language Runtime. Silverlight + CoreCLR + DLR == Microsoft’s platform play for dynamic languages, which have crossed the chasm and, whatever their other strengths or weaknesses, are easier to learn than explicitly typed or Pascal-like highly-structured languages.

Microsoft is on the verge of restoring the bridge between power users and programmers.

The collapse of that bridge — the disappearance of macro-based automation during the DOS-Windows transition and the removal of Hypercard from the Mac — was the greatest setback the professional programming community has ever suffered (#insert COBOL or C++ joke here#).

Pipes-and-filters mashups are the UNIX shell-commands of the Web. The next step is automation — after you start figuring out how to pipe commands, you start writing shell scripts, at which point you’re programming the platform at a higher abstraction level. That’s a crucial point: we’re not talking about flow-control and manipulation within the pipes-and-filter components, but at the platform level. That’s why it’s huge that the Popfly mashups are executed on the client (within Silverlight) and not on the CPUs of the host. Mainframes->Minis->PCs: empowered users require and embrace personal resources. This is the salient distinction between Popfly and Yahoo Pipes (Popfly also works with more types of data, but Yahoo could address that). It’s not just that there’s a resource-consumption scaling problem that might be solvable by the host absorbing hardware costs, it’s that there’s a Big O scaling problem: to the extent that mashups are used to program the Web, as soon as people start looping/recursing, you’re talking about non-linear increases in resource consumption.

To be clear, I don’t think Popfly is the Bourne Shell of the Web — that hasn’t been written yet. But I think Popfly’s the | and Silverlight’s the $

IronPython, IronRuby Discussion with Jim Hugunin and John Lam

I’m dying because I’ve just had a long talk with two of Microsoft’s heavy hitters on the Dynamic Language Runtime (DLR) team and have much to discuss, yet I am in a frenzy preparing for a business trip and cannot yet take the time to do the discussion any kind of justice.

The single most important quote, I think, was the statement that “no one will take [our implementations] seriously until we can run– / We aren’t done until we can run–” [Django | Rails]. That was contrasted with important libraries that were themselves heavily dependent upon C-based libraries (Zope, in particular). It was also contrasted with libraries that rely on unusual language quirks or implementation details; the touchpoint on that was Ruby’s … shoot, I thought Lam said “objectspaces” but I don’t see that in the standard library … maybe he said “ObjectSpaces-like ability to traverse the entire in-memory object graph” (Anyone know what lib that would be?) … Anyway, the point was that this was an example of something that would be very difficult to implement within the constraints of the CLR.

I’ll update this entry when I can report in more detail…

Sun’s Fortress Language : Looks Very Well Designed

This is a rather daunting (124-slide) PDF on Sun’s “Fortress” programming language, which was designed in large part by Guy Steele and is aimed at scientific / mathematical programming. It looks really good, with lots of good decisions (take advantage of Unicode, traits and objects, implicit and explicit parallelism… well, actually, making parallelism the default for loops is a mistake…).

I do sometimes second-guess myself about whether concurrency is going to be a mainstream concern or whether taking advantage of 90% of your computer’s power (once you get to more than 10 cores) is going to be a niche problem. My gut tells me that mainstream programmers cannot ignore that much of a discrepancy in performance; performance is always an issue and, even though the majority of performance problems are not CPU-bound, I just feel that no one will want to say “Yeah, it’s single-threaded” when the pointy-haired boss is looking for someone to blame for performance woes on a 16-core machine.

Found by way of James Governor

Thread Creation Overhead Can Trip Up Pros

Michael Suess has a good blog piece on parallelizing code that contains loop-carried dependencies, which is to say, code such as the following, where the calculation in one pass depends on a previous pass’s calculation. The moral of the story, though, is that even when run to the point where the doubles start to overflow (i.e., at the upper limits of the code’s capability), the overhead of creating threads makes the parallel version an order of magnitude slower than the non-parallel version! (And this in code submitted by boffins on the OpenMP mailing list.)

This is a great example of why neither of the simplistic approaches to parallelization (“everything’s a future” or “let the programmer decide”) will ultimately prevail and how something akin to run-time optimization (a la HotSpot) will have to be used.

    const double up = 1.1;
    double Sn = 1000.0;
    double opt[N+1];    /* N is assumed to be defined elsewhere */
    int n;
    for (n = 0; n <= N; ++n) {
        opt[n] = Sn;
        Sn *= up;       /* loop-carried dependency: the next pass needs this Sn */
    }
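To make the dependency concrete: each pass needs the Sn left behind by the pass before it, so the iterations can't simply be handed to different threads. For this particular recurrence the dependency can be removed algebraically, since opt[n] is just the starting value times up to the n-th power. What follows is a minimal sketch of that idea and is not code from the linked piece; the function names and the hand-rolled power_of helper are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: base raised to a non-negative integer power,
   computed by repeated squaring so no math library is required. */
static double power_of(double base, size_t exponent)
{
    double result = 1.0;
    while (exponent > 0) {
        if (exponent & 1u) {
            result *= base;
        }
        base *= base;
        exponent >>= 1;
    }
    return result;
}

/* Sequential version: pass n needs the Sn left behind by pass n-1. */
void fill_sequential(double *opt, size_t count, double start, double up)
{
    assert(opt != NULL || count == 0);
    double Sn = start;
    for (size_t n = 0; n < count; ++n) {
        opt[n] = Sn;
        Sn *= up;
    }
}

/* Dependency-free version: each element is computed from n alone,
   so the iterations could run in any order, or on any core. */
void fill_independent(double *opt, size_t count, double start, double up)
{
    assert(opt != NULL || count == 0);
    for (size_t n = 0; n < count; ++n) {
        opt[n] = start * power_of(up, n);
    }
}
```

Removing the dependency only makes the loop eligible for parallelization. Each element now costs several multiplications instead of one, and that is before any thread-creation overhead is paid, so whether it actually runs faster is a separate question.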

Dynamic Language Runtime: PHP, Scheme, and “maybe one more” coming

The preliminary documentation for the DLR is included in the IronPython-2.0A1-Doc.zip download at http://www.codeplex.com/IronPython/Release/ProjectReleases.aspx?ReleaseId=438 :

We’re leveraging the learning we did on IronPython to extract elements that could be common amongst languages (dynamic type system, hosting APIs, cached method dispatch, symbol tables, ASTs, codegen, etc.), and we currently are working on IronPython 2.0, JScript, VBX, and Ruby to vet the common designs. We’ll eventually do (or recruit people to do) PHP, scheme, and maybe one more to believe we really can move language+1 with ease to the DLR.

…. Another juicy quote …

That runtime needs a great scripting story and UIFx story to compete with the virtuous cycle Flash/EcmaScript enjoy. We also want to work with a partner on an IDE to eventually make a play for the MS app programmability story so that you can have your choice of dynamic language on a little runtime for scripting, say, Office or VS.

Microsoft CLR Boffin: “immutability and isolation are two things that all modern type systems should support (and encourage use of!) in a 1st class way”

Michael Suess’s fantastic series of brief interviews with concurrency gurus concludes with Microsoft’s Joe Duffy (whose own blog is must-reading for performance-oriented Windows developers) and contains the intriguing quote in the title. I smell CLR support!

Silverlight on Rails

Jonathan Edwards has a great piece of speculation. Man, if John Lam has produced a native CLR Ruby (maybe based on the IronPython codebase) in 8 months, he’s the run-away winner of this year’s He-Man Programming Award.

Ageism in Software Development

Benoit Lavigne wonders if ageism is a problem in the software development profession. Oh, hell yeah. From the minute I began editing software development magazines (when I was 25) I began hearing from professionals in their 40s and higher who faced disproportionate difficulty getting work. There is not a question in my mind that this is a real problem. True, this is a field that is unforgiving to those who don’t keep their skills current, but I’ve heard far too many stories to believe that’s the only, or even dominant, factor.

Now that I have a touch of gray around the temples, I worry about this myself. I’m the oldest person on my programming team right now and I’m at least two decades away from retiring. I have no doubt that it will be harder and harder for me to get work as a developer, no matter how current my coding skills stay. If I’m on the phone with a potential client and they ask about my experience, I don’t say “Professional programmer for 27 years,” because I think that could very well trigger ageism; I say “I sold my first program when I was 16.”

I fear the day when I’m so old that the only work I’ll be able to get will be drawing lines between boxes and pretending I’m delivering value.

Map, Everything’s An Object, and Inline

 One of the reasons that functional programming is worth studying is that it abounds with opportunities for implicit parallelization. As Jon Harrop discusses in this post, the map function takes a function object f and an array [a, b, c, d] and returns [f(a), f(b), f(c), f(d)] (syntax varies from language to language, of course, but you get the point).

The optimist sees this and says “Ah hah! The compiler can simply distribute these calculations to a thread-pool and have a performance advantage on a manycore machine.” And this is true if (a) f is quite lengthy or (b) the array is quite large. Otherwise, the overhead of distributing the calculation across cores / processors can very well be greater than performing the map “in core.” In the worst case, when function and data are already inside the initial core’s cache, the performance hit for distributing it would be very substantial.
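Here is a back-of-the-envelope version of that trade-off: a plain sequential map, plus a toy cost model for deciding whether distributing it would pay off. This is a minimal sketch under stated assumptions. The function names are invented for illustration, the cost figures are hypothetical numbers the caller supplies (in nanoseconds, say), and the model ignores cache effects entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sample function to map over an array (illustrative only). */
double twice(double x)
{
    return 2.0 * x;
}

/* Sequential "map": apply f to each element of in[], writing to out[]. */
void map_double(double (*f)(double), const double *in, double *out, size_t n)
{
    assert(f != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = f(in[i]);
    }
}

/* Toy cost model, all figures hypothetical and in the same time unit:
     sequential time = n * cost_per_call
     parallel time   = dispatch_overhead + (n * cost_per_call) / workers
   Returns true only when distributing the map is predicted to win. */
bool worth_distributing(size_t n, double cost_per_call,
                        double dispatch_overhead, unsigned workers)
{
    assert(workers > 0);
    double sequential = (double)n * cost_per_call;
    double parallel = dispatch_overhead + sequential / (double)workers;
    return parallel < sequential;
}
```

With a made-up dispatch overhead of 50,000 units and 8 workers, a 4-element map of a 10-unit function is predicted to lose badly, while the same function over a million elements, or a 100,000-unit function over just 4 elements, is predicted to win. Those are cases (b) and (a) above, respectively.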

This is a familiar theme in programming languages: a theoretical capability runs afoul of implementation realities. The best design decision in Java was “(Almost) Everything’s an object”: numbers and strings — the most commonly used data types — have different semantics (what the .NET world calls “value semantics”) because they aren’t pure objects. And they aren’t pure objects for performance reasons (immutable strings are also good for a couple other reasons). To this day, you can feel an occasional performance hit with pure object-oriented languages (before you flame, keep in mind that I’m about to deploy a Ruby-based service right into the middle of a live multimillion-dollar application and Ruby’s not only pure objects, it’s interpreted. So I’m not one of those who doesn’t understand that performance, productivity, and responsiveness are different things).

Another situation is C/C++’s inline keyword. Structured programming theory tells us not to repeat ourselves: define functions rather than writing the same code in multiple places. But in the not-terribly-distant past, the cost of a function call was large relative to local operations (a situation that holds today with some embedded processors). So C and C++ have an inline keyword to say “don’t generate a function call for this, generate the code inline.” But unlike Java’s success with “(Almost) Everything’s an object,” the inline keyword turned out to be pretty much a disaster. Now, to this day some people doing embedded systems undoubtedly use inline to great effect. But most developers do a poor job of estimating the benefit of the inline keyword because, just as distributing map can be counter-productive, inlined code can decrease performance (the on-chip caches of modern processors make code size and data locality very important to performance). (And don’t even get me started on template metaprogramming.)
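For anyone who hasn’t used it, this is roughly what the keyword looks like in C99 or later. A minimal sketch; the function names are invented for illustration, and the compiler is free to treat inline as a hint rather than a command.

```c
#include <assert.h>
#include <stddef.h>

/* `static inline` asks the compiler to paste the body into each call
   site instead of emitting a call; the compiler may ignore the request. */
static inline double scale(double value, double factor)
{
    return value * factor;
}

/* The call to scale() inside this loop is a candidate for inlining. */
double sum_scaled(const double *values, size_t n, double factor)
{
    assert(values != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        total += scale(values[i], factor);
    }
    return total;
}
```

A one-line helper in a hot loop is the friendly case. The trouble starts when larger bodies are inlined at many call sites and the resulting code no longer fits comfortably in the instruction cache.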

So, what does this history suggest for the manycore era?

  • Languages that promise that the solution is “every call’s distributed” (or other “pure” approach) will either fail outright to deliver performance gains or will require very sophisticated just-in-time code generators (this is similar to the situation with pure object-oriented languages such as LISP and Smalltalk, where commercial VMs are leaps and bounds beyond academic “proof of concept” interpreters / VMs). The problem with this is that the development of sophisticated JITters requires time and experience, so “the language takes care of it” solutions face a very big chicken-and-egg problem.
  • Languages that shift 100% of the burden of parallelization to the programmer (a ParallelAttribute that can be applied to blocks or functions, say) will work in the hands of experts but will be disasters in the hands of the mainstream.
  • Some kind of hybrid approach that purists decry as tainted but that solves 80% of the problem (a la Java’s “(Almost) Everything’s an object”) will be the winner.

(No, I don’t know what the hybrid approach will be.)