Archive for the ‘SD Futures’ Category.

IronPython 2.0 & Microsoft Research Infer.NET 2.2

    import sys
    import clr

    # Make the Infer.NET assemblies visible to IronPython
    sys.path.append("c:\\program files\\Microsoft Research\\Infer.NET 2.2\\bin\\debug")
    clr.AddReferenceToFile("Infer.Compiler.dll")
    clr.AddReferenceToFile("Infer.Runtime.dll")

    from MicrosoftResearch.Infer import *
    from MicrosoftResearch.Infer.Models import *
    from MicrosoftResearch.Infer.Distributions import *

    # Two fair coins
    firstCoin = Variable[bool].Bernoulli(0.5)
    secondCoin = Variable[bool].Bernoulli(0.5)
    # True only when both coins come up heads
    bothHeads = firstCoin & secondCoin
    ie = InferenceEngine()
    print ie.Infer(bothHeads)

-->

    c:\Users\Larry O'Brien\Documents\Infer.NET 2.2>ipy InferNetTest1.py
    Compiling model...done.
    Initialising...done.
    Iterating:
    .........|.........|.........|.........|.........| 50
    Bernoulli(0.25)

Sweet
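
As a sanity check on the Bernoulli(0.25) result, the same probability can be worked out by brute-force enumeration in plain Python, with no Infer.NET involved. This is only a sketch; the function name `prob_all_heads` is made up for illustration and is not part of any library.

```python
from itertools import product

def prob_all_heads(head_probs):
    """Exact probability that every independent coin lands heads, by enumeration."""
    total = 0.0
    for outcome in product([True, False], repeat=len(head_probs)):
        # Probability of this particular joint outcome.
        p = 1.0
        for is_heads, p_heads in zip(outcome, head_probs):
            p *= p_heads if is_heads else (1.0 - p_heads)
        # Only the all-heads outcome contributes to the answer.
        if all(outcome):
            total += p
    return total

both_heads = prob_all_heads([0.5, 0.5])  # two fair coins, as in the model above
```

Enumeration is exponential in the number of coins, which is exactly the kind of cost an inference engine is meant to avoid on larger models.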

Fast Ranking Algorithm: Astonishing Paper by Raykar, Duraiswami, and Krishnapuram

The July 08 (Vol. 30, #7) IEEE Transactions on Pattern Analysis and Machine Intelligence has an incredible paper by Raykar, Duraiswami, and Krishnapuram. “A Fast Algorithm for Learning a Ranking Function from Large-Scale Data Sets” appears to be a game-changer for an incredibly important problem in machine learning. Basically, they use a “fast multipole method” developed for computational physics to rapidly estimate (to arbitrary precision) the conjugate gradient of an error function. (In other words, they tweak the parameters and “get a little better” the next time through the training data.)

The precise calculation of the conjugate gradient is O(m^2). This estimation algorithm is O(m)! (That’s an exclamation point, not a factorial!)
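
To see where the O(m^2) comes from, here is a minimal sketch of the naive gradient for a generic pairwise logistic ranking loss with a linear scorer. It is not the paper's objective and not its fast approximation; the loss, the function name `pairwise_gradient`, and the toy data are all assumptions for illustration. The double loop over examples is the quadratic part.

```python
import math

def pairwise_gradient(w, xs, ys):
    """Naive gradient of sum over pairs (i, j) with ys[i] > ys[j] of
    log(1 + exp(-(score_i - score_j))), where score = w . x."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in xs]
    grad = [0.0] * len(w)
    pairs = 0
    for i in range(len(xs)):          # m iterations ...
        for j in range(len(xs)):      # ... times m iterations
            if ys[i] > ys[j]:
                pairs += 1
                margin = scores[i] - scores[j]
                # d/d(margin) of log(1 + exp(-margin)) is -1 / (1 + exp(margin))
                coeff = -1.0 / (1.0 + math.exp(margin))
                for k in range(len(w)):
                    grad[k] += coeff * (xs[i][k] - xs[j][k])
    return grad, pairs

# Toy data: one feature, three items ranked 1 < 2 < 3.
grad, pairs = pairwise_gradient([0.0], [[1.0], [2.0], [3.0]], [1, 2, 3])
```

Stepping the weights against this gradient is the “get a little better” move; the paper's claim, as I read it, is that the sum over pairs can be estimated without visiting every pair.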

On a first reading, I don’t grok how the crucial transform necessarily moves towards an error minimum, but the algorithm looks (surprisingly) easy to implement and their benchmark results are jaw-dropping. Of course, others will have to implement it and analyze it for applicability across different types of data sets, but this is one of the most impressive algorithmic claims I’ve seen in years.

Once upon a time, I had the great fortune to write a column for a magazine on artificial intelligence and could justify spending huge amounts of time implementing AI algorithms (well, I think I was paid $450 per month for my column, so I’m not really sure that “justified” 80 hours of programming, but I was young). Man, would I love to see how this algorithm works for training a neural network…

30K application lines + 110K testing lines: Evidence of…?

I recently wrote an encomium to ResolverOne, the IronPython-based spreadsheet:

[T]heir use of pair programming and test-driven development has delivered high productivity; of the 140,000 lines of code, 110,000 are tests….ResolverOne has been in development for roughly two years, is written in a language without explicit type declarations, and is on an implementation that itself is in active development. It’s been brought to beta in a credible (if not downright impressive) amount of time despite being developed by pairs of programmers writing far more lines of test than application. Yet no one can credibly dismiss the complexity of 30,000 lines of application logic or spreadsheet functionality, much less the truly innovative spreadsheet-program features.

ResolverOne is easily the most compelling data point I’ve heard for the practices of Extreme Programming.

[Extreme Program, SD Times]

Allen Holub sees the glass as half-empty, writing:

I want to take exception to the notion that Python is adequate for a real programming project. The fact that 30K lines of code took 110K lines of tests is a real indictment of the language. My guess is that a significant portion of those tests are addressing potential errors that the compiler would have found in C# or Java. Moreover, all of those unnecessary tests take a lot of time to write, time that could have been spent working on the application.

I was taken aback by this, perhaps because it’s been a good while since I’ve heard someone characterize tests as evidence of trouble as opposed to evidence of quality.

There are (at least) two ways of looking at tests:

  1. Tools for discovering errors, or
  2. Quality gates (they’re one way — are they quality diodes?)

There’s no doubt that the software development tradition has favored the former view (once you’ve typed a line, everything you do next is “debugging”). However, the past decade has seen a … wait for it … paradigm shift.

The Agile Paradigm views change over time as a central issue; if it were still the 90s, I would undoubtedly refer to it as Change-Oriented Programming (COP). Tests are the measure of change — not lines of code, not cyclomatic complexity, not object hierarchies, not even deployments.

(Perhaps “User stories” or scenarios are the “yard-stick” of change, tests are the “inch-stick” of change, and deployments are the “milestone” of change.)

So from within the Agile Paradigm / COP, a new test is written that fails, some new code is written, the test passes — a one-way gate has been passed through, progress has been made, and credit accrues. From outside the paradigm, a test is seen as indicative of a problem that ought not to exist in the first place. The passing of the test is not seen as the salient point, the “need” for (i.e., existence of) the test is seen as evidence of low quality.

In true test-driven development, every test fails at least once, because the tests are written before the code. What is perhaps not appreciated by those outside the Agile Paradigm, however, is that tests are also written that one expects to pass from the moment the relevant code is created. For instance, if one had fields for sub-total, taxes, and total, one would certainly write a test that confirmed that total = sub-total + taxes. One would also certainly expect that test to pass as soon as the code had been written.
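
The sub-total example might look like this as a unit test. It is a sketch only: the `Invoice` class and its field names are hypothetical, invented here to give the test something to exercise.

```python
import unittest

class Invoice(object):
    """Hypothetical class with the three fields from the example."""
    def __init__(self, subtotal, taxes):
        self.subtotal = subtotal
        self.taxes = taxes

    @property
    def total(self):
        # The one line of application logic the test pins down.
        return self.subtotal + self.taxes

class InvoiceTotalTest(unittest.TestCase):
    def test_total_is_subtotal_plus_taxes(self):
        invoice = Invoice(subtotal=100, taxes=8)
        self.assertEqual(invoice.total, invoice.subtotal + invoice.taxes)
        self.assertEqual(invoice.total, 108)
```

Written test-first, this fails until `total` exists and then passes from then on. Nobody expects it to find a bug on day one; it is there so that a later change cannot quietly break the relationship.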

As is often the case with paradigms, just realizing that there are different mental models / worldviews in play is crucial to communication.

Update: This relates to Martin Fowler’s recent post on Schools of Software Development.

Microsoft’s StartKey: Computer Environment on a USB Stick. I’ve Experienced This Before and It’s Awesome

StartKey will be a technology that allows you to carry your Windows logon around on a USB keychain. Early reaction is mixed as to the value of this, but I loved something similar when I worked for a company developing software for Sun JavaStation network computers.

With JavaStations, you had a smartcard that you plugged in and, after 10 seconds or so, up would come your desktop. Since most of the time you work at your desk, most of the time this was not particularly valuable. But let me tell you: it was fantastic for meetings and presentations. No messing around with cables and display settings, no hand-waving when trying to describe an issue you were talking about when you happened to be on the other side of the office.

The difference is that the JavaStations were uniform hardware, too, and all your software lived on the server (which, it turned out at 7AM the morning of a major trade show, is a single point of failure). While you might have a good experience assuming that a random machine has Office on it (a smile creeps across Microsoft’s face), there would presumably have to be a solution for specialist software such as Visual Studio or Photoshop that could not be assumed to be local.

I would think the problem with that is that although memory sticks are probably getting capacious enough, the bus connection between the memory stick and the main computer is going to be a bottleneck.

Lang.NET Videos Up

http://barrkel.blogspot.com/2008/02/langnet-symposium-videos-are-up-in-wmv.html

Recommendations:

http://langnetsymposium.com/talks/Videos/1-05 - Lively Kernel - Dan Ingalls - Sun.wmv

http://langnetsymposium.com/talks/Videos/2-01 - Newspeak - Gilad Braha - Cadence.wmv

http://langnetsymposium.com/talks/Videos/3-08 - Cobra - Chuck Esterbrook.wmv

http://langnetsymposium.com/talks/Videos/3-09 - Intentional - Magnus Christerson.wmv

http://langnetsymposium.com/talks/Videos/3-00 - IronRuby - John Lam.wmv

http://langnetsymposium.com/talks/Videos/3-03 - Parsing Expression Grammars in FSharp - Harry Pierson.wmv

 

Aging is the New Working

I was reading PC Magazine’s 25th anniversary issue in which they have the evergreen “what will the future bring?” essays. I was struck by how much talk of medical stuff (nanobots, non-invasive diagnosis, ubiquitous this-and-that) there was. And then it struck me:

Boomers.

Just as they do with every damn thing, boomers define the mainstream concern as “What does this mean to me?” In the past 25 years (to take PC Mag’s benchmark) it went from work (what is technology about? Business productivity!) to family (what is technology about? HDTVs, Internet predators, and Bluetooth-enabled minivans!) and now, of course, it will shift again.

What will technology be about for the next 25 years? Getting old.

Just as you wish you’d written a spreadsheet program 25 years ago or Facebook 10 years ago (well, you would have been flushed away in the dot-com bust, but aside from that…), the things to think about now are the killer applications for aging, whether that’s medical support, post-retirement money management, or Am I Wrinkly Or Not?

Shiny Buckshot Rather Than Silver Bullets

Wes Moise’s musings on Supercompilation led me to this discussion of the myth of the sufficiently smart compiler.

The “sufficiently smart compiler” is still trotted out regularly, even though the market has moved away from demanding even moderate attention to performance at the compiler level. Have you timed your rectangular arrays in C# lately? Or, to be inclusive, have you looked at what’s (not) hoisted out of loops in Java?

The existence of the Iron* languages from Microsoft stems from Jim Hugunin’s discovery that adding moderate smarts allowed dynamic languages to run fast on the CLR:

  1. Use native CLR constructs whenever possible. These constructs are all heavily optimized by the underlying runtime engine. Sometimes these constructs will have to be used creatively to properly match Python’s dynamic semantics.
  2. Use code generation to produce fast-paths for common cases. This can either be development-time code generation with Python scripts creating C# code or run-time code generation using System.Reflection.Emit to produce IL on the fly.
  3. Include fall-backs to dynamic implementations for less common cases or for cases where Python’s semantics requires a fully dynamic implementation.
  4. Finally, test the performance of every CLR construct used and find alternatives when performance is unacceptable. Type.InvokeMember is a great example of a function that must never be used in code that cares about performance.

That’s hardly the stuff of PhD theses (don’t misunderstand me: Hugunin’s paper, which actually said something important, is more valuable than 99% of CS theses).
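
Points 2 and 3 in that list, a fast path for common cases with a dynamic fall-back for everything else, can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how IronPython is implemented; the `FAST_PATHS` table and both function names are invented here, and the fall-back simplifies Python's real operator-dispatch rules.

```python
# Specialized handlers for the common operand types (the "fast paths").
FAST_PATHS = {
    (int, int): lambda a, b: a + b,
    (float, float): lambda a, b: a + b,
    (str, str): lambda a, b: a + b,
}

def dynamic_add(a, b):
    """Fully dynamic fall-back: look up __add__ / __radd__ at call time.
    Simplified; real Python also gives priority to subclass overrides."""
    result = NotImplemented
    forward = getattr(type(a), "__add__", None)
    if forward is not None:
        result = forward(a, b)
    if result is NotImplemented:
        reflected = getattr(type(b), "__radd__", None)
        if reflected is not None:
            result = reflected(b, a)
    if result is NotImplemented:
        raise TypeError("unsupported operand types for add: %s and %s"
                        % (type(a).__name__, type(b).__name__))
    return result

def add(a, b):
    """Try the fast-path table first, then fall back to dynamic dispatch."""
    handler = FAST_PATHS.get((type(a), type(b)))
    if handler is not None:
        return handler(a, b)
    return dynamic_add(a, b)
```

In a real implementation the fast paths would be generated code rather than a dictionary lookup, but the shape is the same: handle the common cases cheaply and keep the slow, fully general route for the rest.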

The point, though, is that we are in a time of high tension between what is possible and what is practiced. This gives me hope that we might see true breakthroughs in programming languages. Fred Brooks spoke of a silver bullet defined as a “single development, in either technology or in management technique, that by itself promises even one order-of-magnitude improvement in productivity, in reliability, in simplicity” [my emphasis]. I don’t believe in silver bullets, but I think there’s a possibility of shiny buckshot.

On the discouraging side, I think there are great difficulties in building such a system: the development of a shiny shotgun is, I think, the work of double-digit person-years. It’s work that’s too far over the horizon for VC funding, too pragmatic for grants, and too dependent on brilliant execution by a small, high-performance team for Open Source.

LOLCode: U R Obsoleted Ruby, Erlang, F#

[Image]

Bash Ups

With the release to public beta of Popfly, Microsoft’s mashup editor, I’ll reiterate my theory that mashups are the UNIX shell of the Internet. The corollary is that we need a suite of command equivalents:

Command                               Mashup Alternative
cd, mkdir, rmdir                      facilities for manipulating “current URI”; REST principles, etc.
mailx                                 messaging transformations and transports: mail, IM, SMS, twitter, etc.
man                                   ?
jobs, ps, kill, sleep, etc.           facilities for multiple mashup control
ls                                    spidering facilities / robust HTML parsing, etc. “Get-ChildItem” in all its polymorphic complexity.
who                                   FOAF
finger, chfn                          blogging
cat, sed, sort, grep, wc, tail, etc.  All sorts of facilities for transformation of source to sink

Right now, everyone’s concentrating on what output the mashup editors can produce or what the component manipulation looks like. I think the winner of the mashup evolution will be the one that provides the most flexible suite of components.
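
To make the shell analogy concrete, here is a rough sketch of what a few of those components look like as composable pieces, using Python generators to stand in for pipes. Everything here is hypothetical: the function names just borrow the UNIX command names, `grep` does a plain substring match rather than a regular expression, and the `feed` list stands in for whatever a real mashup source would deliver.

```python
def cat(lines):
    """Source: pass each item of the input along, one at a time."""
    for line in lines:
        yield line

def grep(pattern, lines):
    """Filter: keep only the lines containing the substring `pattern`."""
    for line in lines:
        if pattern in line:
            yield line

def sort(lines):
    """Transform: emit the lines in sorted order (must read them all first)."""
    return iter(sorted(lines))

def wc(lines):
    """Sink: count the lines that reach the end of the pipeline."""
    return sum(1 for _ in lines)

feed = [
    "mashup editors compared",
    "unix shell tips",
    "a popfly mashup",
    "rest and uris",
]

# Roughly: cat feed | grep mashup | sort
mashups = list(sort(grep("mashup", cat(feed))))
# Roughly: cat feed | grep mashup | wc -l
count = wc(grep("mashup", cat(feed)))
```

The value is in the uniform interface: each piece consumes a stream and produces a stream, so any source can be wired to any sink.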

Supply & demand? Outsourcing backlash? CS grads average $53K out of school

[C]omputer-science grads saw their average starting salary offers grow by 4.5 percent last year alone. The new average salary for a job right out of college is now $53,051. That’s the highest amount this decade.

Starting salaries surge for computer science grads [Ars Technica]

Interesting. I still think that the future is mixed-at-best for United States programmers (with our relatively high cost), but at least for now there’s some good news.