Saturday, February 02, 2008 |
|
|
Correcting myself re. concurrency at Lang.NET, there was one "sit up and take notice" discussion. Erik Meijer briefly discussed that he and his team have implemented (or at least prototyped) MapReduce for LINQ. "We call it MapReduce for Datacenters or somesuch..." I asked him to clarify if this a part of Volta or a separate project or been discussed and he said they hadn't talked about it yet. He made it sound like it was something they did in their copious spare time. During the talk, he'd said "I come into work every day and hack. Who would want a job like mine?" to which 45 people in the room had to bite their lips to prevent themselves shouting "Me!" |
|
|
|
|
Friday, February 01, 2008 |
|
|
Although concurrency was laid out by Jason Zander as one of the overarching themes of language work moving forward, it was not at all emphasized as a primary concern in any of the talks I saw. There was some talk about language features that a "sufficiently smart compiler" could handle but no one took off their shoe and banged the table and said "We must focus on this!" I button-holed Erik Meijer and Brian Goetz on the topic: Erik is a great believer in Software Transactional Memory and Brian is "cautiously optimistic" about it. Both were very quick to acknowledge that we don't really know how people will react to the complexities that emerge when the behavior of memory transactions starts to necessarily diverge from the familiar world of database transactions. When I mentioned the Actor model as being intuitive, Brian astutely diagnosed me as having a Smalltalk background and said that message-passing would be confusing to a population that viewed objects as -- I think his phrase was -- "glorified structs." |
|
|
|
|
Sunday, December 30, 2007 |
|
|
Raymond Chen's psychic debugging of a deadlock is everything you need to know about why the mainstream model of concurrency (in which programmer's manually manage locks and can start their own threads) is fundamentally broken. If you're a C# or Java programmer looking at this code, you might be tempted to throw the bozo bit and say "Yech! C!" but this is precisely the same situation that one can will see in any kind of complex, multithreaded application. Sure, Raymond Freakin' Chen can quickly debug such situations, but most of us don't have Raymond Freakin' Chen on staff. And no matter how gently Chen tries to show us how easy it is, most of us simply don't have the capacity to develop rapid, accurate intuitions into the cause of problematic thread behavior in this model. And even if such capacity were widespread, the discipline of never doing any form of external calling (message sending, virtual function calls, invoking callbacks, etc.) while holding a lock is never going to be universal and, so long as it's not universal, this type of problem is inevitable. |
|
|
|
|
Friday, September 21, 2007 |
|
|
Intel's announced the availability of a prototype C++ compiler implementing Software Transactional Memory, one of the central topics of this brilliant .NET Rocks show. That's exciting, although I suspect that on just two cores STM will be disappointing. |
|
|
|
|
Tuesday, September 11, 2007 |
|
|
I'm not one of those people for whom the task of composition is a somewhat mechanical extension of increasingly detailed outlines. Nonetheless, research consumes between 80-90% of the time I spend on a technical article (note I said time and not effort -- subjectively, writing text is two or three times harder than writing C++. You can satisfy a compiler.) In this case, research will consist of: - Integrating CodeAnalyst and Eclipse
- Profiling a simple application
- Implementing an image-processing application
- Profiling it to discover a performance problem
- Investigating that problem with CodeAnalyst
- Ameliorating it
- Evaluating the performance "win"
- If not "big enough" go to 3
In this case, my target sample application has two algorithms that I think will be difficult: - Seam-discovery (CPU intensive)
- Seam-removal (Memory-manipulation intensive)
With seam-discovery, I know that even if I implement a known-good algorithm (such as graph-cuts), I am guaranteed to peg the CPUs. My big gamble professionally is that I'm a good enough programmer to find a Step 6 "win" that I can pull off in a reasonable amount of time. A safer bet would be to choose an algorithm that would have performance problems if intentionally implemented in a naive manner (for instance, a filter that just iterated across rows and columns and, at each pixel, retrieved the neighboring pixels and averaged them. Pretty easy to tune that up!) Now, a confession. By the time you read this, I've been somewhere not-Hawai'i for a week or so. I'll carry print outs of this paper on content-aware image resizing and this paper on graph-cuts and study them on the plane tomorrow. I'll have my laptop with me and may noodle around with source code, but it's far more likely that I'll be working with pen and paper to see if I can "get" the algorithm intuitively. Other than that, I'm going to do my darnedest not to spend too much time thinking about computers. Coming September 19th or thereabouts: Part 4, Research Gets Underway... |
|
|
|
|
Thursday, September 06, 2007 |
|
|
Probably the biggest difference between academic and commercial writing is this: academic writing is almost always concerned with algorithms and process, commercial writing is almost always driven by a specific set of technologies. In this case, I'm being paid by AMD to explain integration of CodeAnalyst and Eclipse CDT. So that dictates that I'll be using C++, which leads to a couple questions: - Linux or Windows?
- What C++?
The Linux or Windows question is fairly easy for me to answer: I boot into Windows and run Linux in Virtual Machines. For an article on a profiler, the VM is a complication -- CodeAnalyst works within a VM, of course, but since I expect this article to get down to the hardware, I don't want any confusion about how toasty-warm my code is making my Opterons. So I'll be experiencing the code in Windows. But when writing about Eclipse, surely I should imagine that my target audience is not strictly Windows-based. So it seems logical to set up a compiler-chain based on GNU C++: I know that will be portable. (Note: I'll probably cross-compile with at least Microsoft C++ and compare the experience. That might give me enough information for, if not an article, at least a column.) Since the article is supposed to emphasize multicore/multichip development and in C++, it's a no-brainer that I'll be using OpenMP. My next thought is unit-testing. For one thing, I just don't develop anything without unit-testing anymore. For another, repeatability is critical to the article. For a third, I expect to hit some hard problems along the way and unit-testing is the best way to maintain momentum in the face of brain-twisters. I have not compared C++ unit-testing frameworks, but Boost has a test component, and while there may well be better, I know that Boost won't be poor. Version control is not the same issue in an article that it is in professional development, but you can never over-estimate the benefit of a backup: I'll be using Subversion against a local server. Since I know that I'll be doing image-processing, I'd just as well use a library or two to handle file-loading and basic pixel manipulation. If I were building an application, I'd definitely tend towards GTK+. However, I want to keep things as simple as possible around the algorithm(s) that I'll be developing, so I'm going to see how far I can get with this project, CImg, which seems nice and minimal. So my initial tool list is: - CodeAnalyst : Profiler
- Eclipse CDT : IDE
- GCC : Compiler-chain
- OpenMP : Shared-memory parallel programming
- Boost.Test : Unit-testing
- Subversion : VCS
- CImg : Image-processing library
If I need or change libraries, I'll make mention of it in forthcoming posts. Next: Part 3, Research Begins... |
|
|
|
|
Monday, September 03, 2007 |
|
|
If you are interested in concurrent programming, you'll eventually come across the Byzantine Generals problem. Mark Nelson has a good post explaining it. |
|
|
|
|
Saturday, September 01, 2007 |
|
|
The key to content-aware image resizing (and other recent image algorithms) is the discovery of "seams" -- paths of pixels that don't differ much from each other. Such seams are places where the picture can be "pulled apart" gracefully. In image resizing, the seam is discarded and the two remaining pieces glued together -- when done time and time again, the result is the astonishing resizing behavior. In the image in this post, a blown-up anti-aliased letter "R", an obvious seam exists straight down the left side. A not-quite-as-good seam might be along the right side, but cutting diagonally back across the tail of the "R." Other people have used seams to automate collaging and masking. All very interesting, but the point of this post is to talk very briefly about the algorithms used to generate the seams. Optimal seam generation is NP-hard. So it boils down to what works efficiently. Graph-cuts are the preferred method, but I just can't get out of my head that the simple path-finding method called A* might be effective. The thing that is nagging at me is that A* uses a cost approximation to figure out the next step forward; it seems to me that when manipulating many seams, using approximations that might last for dozens of seams would have a big benefit. The other obvious thing about A* is that the approximations are easy to build: recursively develop lower-and-lower resolution images. For instance, when the above image is converted into brightness values, you can average neighbors to create the values shown in yellow, and average those to create the values in red. (Did you think "Haar wavelet"?) Initially, this looks promising (the left-most yellow column is a clear winner), but at coarse levels, it's deceptive (For the red, lowest-resolution "image," you would clearly be drawn towards the right side for initial processing. Intuitively, you would think that most images would have several side-by-side seams (i.e., that would show up at coarser levels). Also, after seam removal, you would be able to "mark dirty" the approximations (i.e., the coarser calculations) and trigger recalculation as the amount of removed seams became significant (and, obviously, not recalculate areas through which the seam did not pass). This is kind of killing me, because I don't have time to explore an algorithm that I only suspect will be efficient. Grrr.... |
|
|
|
|
Friday, August 24, 2007 |
|
|
The working groups of the C++0x committee are working hard to complete a major new standard for C++ (there's a big meeting here in Kona in October). If you're not intimate with C++, you may be surprised that such an important language has not had a standard threading model and that such a model is a major part of the C++0x version. This is actually part-and-parcel of the design philosophy that made C and C++ so important: the number of libraries dictated by the standard for C and C++ is much smaller than the BCL or Java's SE libraries. This allows standard C and C++ to be available for hundreds of processors. I recently read the public C++0x papers on threading (links below). The proposed threading model is non-radical and is based on Boost.Thread. The reasonable perspective is that this is a conservative decision thoroughly in keeping with C/C++'s long tradition of minimal hardware/OS assumptions. The emotional perspective is that they've let slip by a golden opportunity to incorporate the best thinking about memory models. "Multi-threading and locking" is, I would argue, demonstrably broken for business programming. It just doesn't work in a world of systems built from a combination of libraries and user-code; while you can create large applications based on this model, large multithreaded applications based on locking require not just care, but sophistication, at every level of coding. By standardizing an established systems-level model, C++0x foregoes an opportunity for leadership, albeit radical. One of the real thought leaders when it comes to more sophisticated concurrency semantics is Herb Sutter. His Concur model (here's a talk on Concur from PDC '05) is, I think, a substantial step forward and I've really hoped to see it influence language design. Is Sutter, though, just an academic with flighty thoughts and little understanding of the difficulties of implementation? It seems unlikely, given that he's the Chair of the ISO C++ standards committee. So you can see why there might have been an opportunity. Multithreading proposals for C++0x: |
|
|
|
|
Thursday, August 23, 2007 |
|
|
Looks like the only programming tools for Tilera's 64-Core CPU is a C compiler, but the day is fast approaching when we're going to start seeing more and more of these types of tools in the mainstream. |
|
|
|
|
Wednesday, August 01, 2007 |
|
|
The owners of Patent 5,056,000 have whacked Sony with a suit, on the basis that the Cell processor in the PS3 is a violator. The claims seem center on the idea of an "interconnection switch" that locks the access of a single processor to a particular bank of shared memory. Speculation is that this very broad patent could be the basis of claims against many vendors. |
|
|
|
|
Wednesday, July 25, 2007 |
|
|
The very good Threading Building Blocks library from Intel, released last year around this time as 1.0 and being updated soon, has been open-sourced by Intel. This is a hardcore C++ template library, but has some great-looking libraries and algorithms (lots of lock-free data structures). I've been unable to actually use the library, as my multicore system is AMD Opteron-based (just because I live in the tropics doesn't mean I can't appreciate an even warmer room). With quad-core systems available under $1000 and the Q6600 now at $375 from Newegg, there's a great temptation, but I'm going to try to resist until I can build an 8-core machine, which to my mind is the inflection point from "multicore" to "manycore." |
|
|
|
|
|
We're hiring developers and testers for the Parallel Computing team in Microsoft's Developer Division....[Contact] joedu at microsoft dot com if you are interested. |
|
|
|
|
Wednesday, July 18, 2007 |
|
|
I figure I'll be the #1 programming blog by the end of the month. |
|
|
|
|
Tuesday, July 17, 2007 |
|
|
Maybe the readers of my blog are more astute (and better looking!) than average, but I was happy that several comments to my recent post on type inference were properly dismissive of what one called "the static vs. dynamic holy war." As I said when writing about the myth of better programming languages last year, different programming languages engage your mind in different ways and that is what is worth pursuing. There was a time when I was programming professionally in two languages: C and Prolog. They engaged my mind in such profoundly different ways that shifting between them felt like the clutch on the '77 Ford van I was driving at the time (three-on-the-tree, baby), but in terms of problem solving, I felt like Superman. Now, first-class functions have entered the mainstream (primarily via C#) and that, in combination with an influential paper about Google's MapReduce programming model has led people to begin to see what functional programming advocates have been talking about lo these many years. Similarly, people are beginning to realize that concurrency models just might be important in the coming years and are beginning to pay attention to languages like Erlang. (Incidentally, O'Reilly & Associates seems to be betting that "shared nothing" is the way to go, a conclusion that I think is certainly too sweeping and premature. ORA is the most influential publishing house in software development right now, so the biases of their editors in this area will have a noticeable impact on the debate in the years to come.) Update: No sooner had I written this post when I see in my Inbox that Pragmatic Bookshelf has published Programming Erlang. Look for a review in the coming weeks... |
|
|
|
|
Tuesday, July 03, 2007 |
|
|
Since LiveWriter Beta seems to not delay publishing until the "Set publish date" time (it just time-stamps them later, which is just annoying), I will be going dark for a few days of travel. In the meantime, let me quickly say that this business about Intel's "Larrabee" GPU being based on the Pentium MMX is almost certainly hogwash. Jon Stokes has it right I think. |
|
|
|
|
Saturday, June 23, 2007 |
|
|
NVidia's Tesla C870 Graphics Processing Unit (GPU) will be the basis for a "deskside supercomputer" add-on that will provide highly-parallel high performance computing (HPC) capabilities, presumably programmed with NVidia's CUDA toolkit. Dedicated hardware for HPC has always been a treacherous market -- one year's darling is next year's has-been (people used to buy Cray Supercomputers at auction and resell them for the gold in the connectors. True story.). Dedicated processing boards for desktop computers have always been especially troubled, as the system bus is such a bottleneck and Moore's Law used to provide such wonderful free lunches. (No longer true, although the bus issue is potentially more dramatic than ever.) There is infinite demand for HPC from 3 well-funded sectors: economics (trading), bioinformatics, and chemistry (bio- and otherwise). These sectors will absorb any amount of information processing capacity available. Whether that can be translated into commercial success for NVidia, or whether they unlock additional markets, is far less certain. I wonder if Google will buy a couple boards. Takeaway for programmers: Feverish hardware activity relating to concurrency continues. Software lags, with only relatively low-level toolkits available for exploiting the system. Keep your C skills sharp. |
|
|
|
|
Friday, June 15, 2007 |
|
|
I hesitate to call Mark McKeown's Brief History of Consensus, 2PC, and Transaction Commit (via just about everyone, but let's say Bill de hÓra) a "blog post." It reads much more like a darn good professional article. If you're interested in having an informed opinion about concurrency (as opposed to waiting half-a-decade and accepting what the market has decided is "good enough"), the article is a must-read. As we get to the manycore era, the amount of asynchronicity within a single machine will be significant. We're already seeing hints of this with Non-Uniform Memory Access (NUMA), in which different cores can access main memory asynchronously. So in order to think knowledgeably about what concurrent programming ought to be like, the best crude model is distributed programming. With a huge caveat, which is that you can't think just about the network messages as "the system," you have to think about keeping the local processor busy. So don't think about Google Maps, think about Forza 2 on XBox Live. (Mmmm... Forza 2 on XBox Live ... ) |
| |
| | |