Archive for the ‘Concurrency’ Category.

C# and VB Compilers Being Rewritten in Themselves; “Immutable” attribute coming?

Darryl Taft interviews Luca Bolognese, Group Program Manager for languages at MS, and comes up with some interesting hints.

The lede revolves around MS’ plans to make the compilers more “open” (as in an open can of beer) by providing a “compiler-as-a-service” facility. On face value, “compiler-as-a-service” doesn’t make much sense — compilers aren’t large and I don’t perceive parallel make of C# programs being a grave problem. It’d be lovely to have Visual Studio in the cloud, though…

Apparently they’re rewriting the C# and VB compilers in themselves, which is interesting, if for no other reason than the performance characteristics coming out of the managed side of things.

The buried lede revolves around concurrency and code words like “declarative” and “immutability.”  The hint is that Microsoft sees some big wins coming from the low-hanging fruit of declaring that a variable is single-assignment. (If you know that a variable doesn’t change its value over time, you can use it’s value in different threads without worry.)

What’s interesting about this is not that single-assignment has some advantage (we knew that), it’s the hint that Microsoft sees enough advantage to mention it as a specific optimization. It’s not obvious (to me) that such an attribute would have profound effects in the real world, in that a real application would have a mix of mutable and immutable variables with complex dependencies. So to get a meaningful speedup presumably requires some far-from-trivial reasoning at compile and/or runtime.

LINQ for Datacenters: Microsoft does MapReduce

Correcting myself re. concurrency at Lang.NET, there was one “sit up and take notice” discussion. Erik Meijer briefly discussed that he and his team have implemented (or at least prototyped) MapReduce for LINQ. “We call it MapReduce for Datacenters or somesuch…”

I asked him to clarify if this a part of Volta or a separate project or been discussed and he said they hadn’t talked about it yet. He made it sound like it was something they did in their copious spare time. During the talk, he’d said “I come into work every day and hack. Who would want a job like mine?” to which 45 people in the room had to bite their lips to prevent themselves shouting “Me!”

Concurrency Not Emphasized at Lang:NET

Although concurrency was laid out by Jason Zander as one of the overarching themes of language work moving forward, it was not at all emphasized as a primary concern in any of the talks I saw. There was some talk about language features that a “sufficiently smart compiler” could handle but no one took off their shoe and banged the table and said “We must focus on this!”

I button-holed Erik Meijer and Brian Goetz on the topic: Erik is a great believer in Software Transactional Memory and Brian is “cautiously optimistic” about it. Both were very quick to acknowledge that we don’t really know how people will react to the complexities that emerge when the behavior of memory transactions starts to necessarily diverge from the familiar world of database transactions. When I mentioned the Actor model as being intuitive, Brian astutely diagnosed me as having a Smalltalk background and said that message-passing would be confusing to a population that viewed objects as — I think his phrase was — “glorified structs.”

Why The Mainstream Concurrency Model is Broken

Raymond Chen’s psychic debugging of a deadlock is everything you need to know about why the mainstream model of concurrency (in which programmer’s manually manage locks and can start their own threads) is fundamentally broken.

If you’re a C# or Java programmer looking at this code, you might be tempted to throw the bozo bit and say “Yech! C!” but this is precisely the same situation that one can will see in any kind of complex, multithreaded application.

Sure, Raymond Freakin’ Chen can quickly debug such situations, but most of us don’t have Raymond Freakin’ Chen on staff. And no matter how gently Chen tries to show us how easy it is, most of us simply don’t have the capacity to develop rapid, accurate intuitions into the cause of problematic thread behavior in this model.

And even if such capacity were widespread, the discipline of never doing any form of external calling (message sending, virtual function calls, invoking callbacks, etc.) while holding a lock is never going to be universal and, so long as it’s not universal, this type of problem is inevitable.

Software Transactional Memory C++ Compiler from Intel

Intel’s announced the availability of a prototype C++ compiler implementing Software Transactional Memory, one of the central topics of this brilliant .NET Rocks show.

That’s exciting, although I suspect that on just two cores STM will be disappointing.

Writing A Technical Article, Pt. 3: Research Begins…

I’m not one of those people for whom the task of composition is a somewhat mechanical extension of increasingly detailed outlines. Nonetheless, research consumes between 80-90% of the time I spend on a technical article (note I said time and not effort — subjectively, writing text is two or three times harder than writing C++. You can satisfy a compiler.)

In this case, research will consist of:

  1. Integrating CodeAnalyst and Eclipse
  2. Profiling a simple application
  3. Implementing an image-processing application
  4. Profiling it to discover a performance problem
  5. Investigating that problem with CodeAnalyst
  6. Ameliorating it
  7. Evaluating the performance “win”
  8. If not “big enough” go to 3 

In this case, my target sample application has two algorithms that I think will be difficult:

  1. Seam-discovery (CPU intensive)
  2. Seam-removal (Memory-manipulation intensive)

With seam-discovery, I know that even if I implement a known-good algorithm (such as graph-cuts), I am guaranteed to peg the CPUs. My big gamble professionally is that I’m a good enough programmer to find a Step 6 “win” that I can pull off in a reasonable amount of time. A safer bet would be to choose an algorithm that would have performance problems if intentionally implemented in a naive manner (for instance, a filter that just iterated across rows and columns and, at each pixel, retrieved the neighboring pixels and averaged them. Pretty easy to tune that up!)

Now, a confession. By the time you read this, I’ve been somewhere not-Hawai’i for a week or so. I’ll carry print outs of this paper on content-aware image resizing and this paper on graph-cuts and study them on the plane tomorrow. I’ll have my laptop with me and may noodle around with source code, but it’s far more likely that I’ll be working with pen and paper to see if I can “get” the algorithm intuitively. Other than that, I’m going to do my darnedest not to spend too much time thinking about computers.

Coming September 19th or thereabouts: Part 4, Research Gets Underway…

Writing A Technical Article, Part 2: Gathering Tools

Probably the biggest difference between academic and commercial writing is this: academic writing is almost always concerned with algorithms and process, commercial writing is almost always driven by a specific set of technologies. In this case, I’m being paid by AMD to explain integration of CodeAnalyst and Eclipse CDT. So that dictates that I’ll be using C++, which leads to a couple questions:

  • Linux or Windows?
  • What C++?

The Linux or Windows question is fairly easy for me to answer: I boot into Windows and run Linux in Virtual Machines. For an article on a profiler, the VM is a complication — CodeAnalyst works within a VM, of course, but since I expect this article to get down to the hardware, I don’t want any confusion about how toasty-warm my code is making my Opterons. So I’ll be experiencing the code in Windows.

But when writing about Eclipse, surely I should imagine that my target audience is not strictly Windows-based. So it seems logical to set up a compiler-chain based on GNU C++: I know that will be portable. (Note: I’ll probably cross-compile with at least Microsoft C++ and compare the experience. That might give me enough information for, if not an article, at least a column.)

Since the article is supposed to emphasize multicore/multichip development and in C++, it’s a no-brainer that I’ll be using OpenMP.

My next thought is unit-testing. For one thing, I just don’t develop anything without unit-testing anymore. For another, repeatability is critical to the article. For a third, I expect to hit some hard problems along the way and unit-testing is the best way to maintain momentum in the face of brain-twisters. I have not compared C++ unit-testing frameworks, but Boost has a test component, and while there may well be better, I know that Boost won’t be poor.

Version control is not the same issue in an article that it is in professional development, but you can never over-estimate the benefit of a backup: I’ll be using Subversion against a local server.

Since I know that I’ll be doing image-processing, I’d just as well use a library or two to handle file-loading and basic pixel manipulation. If I were building an application, I’d definitely tend towards GTK+. However, I want to keep things as simple as possible around the algorithm(s) that I’ll be developing, so I’m going to see how far I can get with this project, CImg, which seems nice and minimal.

So my initial tool list is:

  • CodeAnalyst : Profiler
  • Eclipse CDT : IDE
  • GCC : Compiler-chain
  • OpenMP : Shared-memory parallel programming
  • Boost.Test : Unit-testing
  • Subversion : VCS
  • CImg : Image-processing library

If I need or change libraries, I’ll make mention of it in forthcoming posts.

Next: Part 3, Research Begins…

Good Post on the Byzantine Generals Problem

If you are interested in concurrent programming, you’ll eventually come across the Byzantine Generals problem. Mark Nelson has a good post explaining it.

Seam Generation Using A*

The key to content-aware image resizing (and other recent image algorithms) is the discovery of “seams” — paths of pixels that don’t differ much from each other. Such seams are   places where the picture can be “pulled apart” gracefully. In image resizing, the seam is discarded and the two remaining pieces glued together — when done time and time again, the result is the astonishing resizing behavior.

imageIn the image in this post, a blown-up anti-aliased letter “R”, an obvious seam exists straight down the left side. A not-quite-as-good seam might be along the right side, but cutting diagonally back across the tail of the “R.”

Other people have used seams to automate collaging and masking. All very interesting, but the point of this post is to talk very briefly about the algorithms used to generate the seams.

Optimal seam generation is NP-hard. So it boils down to what works efficiently. Graph-cuts are the preferred method, but I just can’t get out of my head that the simple path-finding method called A* might be effective.

The thing that is nagging at me is that A* uses a cost approximation to figure out the next step forward; it seems to me that when manipulating many seams, using approximations that might last for dozens of seams would have a big benefit. The other obvious thing about A* is that the approximations are easy to build: recursively develop lower-and-lower resolution images.

For instance, when the above image is converted into brightness values, you can average neighbors to create the values shown in yellow, and average those to create the values in red. (Did you think “Haar wavelet“?)

image

Initially, this looks promising (the left-most yellow column is a clear winner), but at coarse levels, it’s deceptive (For the red, lowest-resolution “image,” you would clearly be drawn towards the right side for initial processing.

Intuitively, you would think that most images would have several side-by-side seams (i.e., that would show up at  coarser levels). Also, after seam removal, you would be able to “mark dirty” the approximations (i.e., the coarser calculations) and trigger recalculation as the amount of removed seams became significant (and, obviously, not recalculate areas through which the seam did not pass).

image image

This is kind of killing me, because I don’t have time to explore an algorithm that I only suspect will be efficient.

Grrr….

C++0x to Incorporate Standard Threading Model

The working groups of the C++0x committee are working hard to complete a major new standard for C++ (there’s a big meeting here in Kona in October). If you’re not intimate with C++, you may be surprised that such an important language has not had a standard threading model and that such a model is a major part of the C++0x version. This is actually part-and-parcel of the design philosophy that made C and C++ so important: the number of libraries dictated by the standard for C and C++ is much smaller than the BCL or Java’s SE libraries. This allows standard C and C++ to be available for hundreds of processors.

I recently read the public C++0x papers on threading (links below). The proposed threading model is non-radical and is based on Boost.Thread. The reasonable perspective is that this is a conservative decision thoroughly in keeping with C/C++’s long tradition of minimal hardware/OS assumptions.

The emotional perspective is that they’ve let slip by a golden opportunity to incorporate the best thinking about memory models. “Multi-threading and locking” is, I would argue, demonstrably broken for business programming. It just doesn’t work in a world of systems built from a combination of libraries and user-code; while you can create large applications based on this model, large multithreaded applications based on locking require not just care, but sophistication, at every level of coding. By standardizing an established systems-level model, C++0x foregoes an opportunity for leadership, albeit radical.

One of the real thought leaders when it comes to more sophisticated concurrency semantics is Herb Sutter. His Concur model (here’s a talk on Concur from PDC ’05) is, I think, a substantial step forward and I’ve really hoped to see it influence language design. Is Sutter, though, just an academic with flighty thoughts and little understanding of the difficulties of implementation? It seems unlikely, given that he’s the Chair of the ISO C++ standards committee. So you can see why there might have been an opportunity.

Multithreading proposals for C++0x:

div>