More on OpenID, FOAF, and Trackback

Dmitry Shecthman, who knows more about OpenID than I do, doesn’t get why OpenID is important to making FOAF the validation route for Trackback. Here’s my thinking, which has a 90% chance of being wrong (based on historical averages):

FOAF looks like this:

    <foaf:Person>
      <foaf:name>Leigh Dodds</foaf:name>
      <foaf:firstName>Leigh</foaf:firstName>
      <foaf:surname>Dodds</foaf:surname>
      <foaf:mbox_sha1sum>71b88e951cb5f07518d69e5bb49a45100fbc3ca5</foaf:mbox_sha1sum>
      <foaf:knows>
        <foaf:Person>
          <foaf:name>Dan Brickley</foaf:name>
          <foaf:mbox_sha1sum>241021fb0e6289f92815fc210f9e9137262c252e</foaf:mbox_sha1sum>
          <rdfs:seeAlso rdf:resource="http://rdfweb.org/people/danbri/foaf.rdf"/>
        </foaf:Person>
      </foaf:knows>
    </foaf:Person>

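Which essentially says that Leigh Dodds knows Dan Brickley, and that more about Dan can be found in Dan’s own FOAF file at http://rdfweb.org/people/danbri/foaf.rdf.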

And one would expect this to be part of a file created by Leigh Dodds and sitting on his website (perhaps at www.leighdodds.com/foaf.rdf). Given that Leigh created that file, one would think that Leigh would be willing to have his Trackback server automatically create links to Dan’s comments regarding Leigh’s blog posts (i.e., Dan is trusted by Leigh).
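To make that reading concrete, here is a minimal Python sketch that pulls the "knows" entries out of such a file. The rdf:RDF wrapper and namespace declarations are my additions (the fragment above omits them), and `known_people` is a hypothetical helper name, not part of any FOAF library.

```python
import xml.etree.ElementTree as ET

# Standard RDF, RDFS, and FOAF namespace URIs.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# The fragment from the post, wrapped in an rdf:RDF root (an assumption)
# so that it parses as a standalone document.
FOAF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Leigh Dodds</foaf:name>
    <foaf:mbox_sha1sum>71b88e951cb5f07518d69e5bb49a45100fbc3ca5</foaf:mbox_sha1sum>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Dan Brickley</foaf:name>
        <foaf:mbox_sha1sum>241021fb0e6289f92815fc210f9e9137262c252e</foaf:mbox_sha1sum>
        <rdfs:seeAlso rdf:resource="http://rdfweb.org/people/danbri/foaf.rdf"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def known_people(foaf_xml):
    """List the people that the file's top-level foaf:Person claims to know."""
    root = ET.fromstring(foaf_xml)
    owner = root.find("foaf:Person", NS)
    people = []
    for person in owner.findall("foaf:knows/foaf:Person", NS):
        see_also = person.find("rdfs:seeAlso", NS)
        people.append({
            "name": person.findtext("foaf:name", namespaces=NS),
            "mbox_sha1sum": person.findtext("foaf:mbox_sha1sum", namespaces=NS),
            # rdf:resource is a namespaced attribute, so use the {uri}name form.
            "see_also": (see_also.get("{%s}resource" % NS["rdf"])
                         if see_also is not None else None),
        })
    return people


for entry in known_people(FOAF_XML):
    print(entry["name"], entry["see_also"])
```

A real FOAF consumer would probably use a proper RDF parser, since RDF/XML allows several equivalent serializations that this tree walk would miss.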

So, a Web of FOAF files (n.b. <rdfs:seeAlso>) defines a social network graph, and part of my premise is that anyone within a few degrees of separation from me could be trusted to (oh, I can’t resist) “Foafback.”
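The "few degrees of separation" test might be sketched as a depth-limited breadth-first search. This is only an illustration: the in-memory `KNOWS` dictionary stands in for FOAF files that would really be fetched by following rdfs:seeAlso links, and the names "libby" and "stranger" are made up.

```python
from collections import deque


def degrees_of_separation(knows, start, target, max_degrees):
    """Return the hop count from start to target along 'knows' edges,
    or None if target is not reachable within max_degrees hops."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_degrees:
            continue  # do not expand beyond the trust horizon
        for friend in knows.get(person, ()):
            if friend == target:
                return depth + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return None


# Hypothetical directed graph: each key "knows" the people in its list.
KNOWS = {
    "leigh": ["dan"],
    "dan": ["libby"],
    "libby": ["stranger"],
}

print(degrees_of_separation(KNOWS, "leigh", "dan", 2))
print(degrees_of_separation(KNOWS, "leigh", "stranger", 2))
```

Note that the edges are directed: Leigh knowing Dan says nothing about whether Dan's file lists Leigh.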

So my first cut at Foafback software would be a server that receives a Trackback post of this form:

    POST http://www.example.com/foafback/5
    Content-Type: application/x-www-form-urlencoded; charset=utf-8

    title=Foo+Bar&url=http://www.bar.com/&excerpt=My+Excerpt&blog_name=Foo&postedBy=Dan+Brickley

And looks up Dan Brickley in Leigh’s FOAF file and says “Oh yeah, Dan! Swell!” Except, of course, the value of postedBy can’t possibly be “Dan Brickley” or dbrickley@rdfweb.org or even Dan’s mbox_sha1sum because spammers are going to figure out the Websites of those doing Foafbacks to your site and they will easily guess any publicly available identifier of those wishing to perform Foafbacks.
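A rough Python sketch of that naive first cut shows why it fails. The `postedBy` field is this post's own invention rather than part of Trackback, and `naive_is_trusted` and the spam URL are hypothetical names used only for illustration.

```python
from urllib.parse import parse_qs


def naive_is_trusted(body, known_names):
    """First-cut check: trust whatever name the request claims in postedBy."""
    fields = parse_qs(body)
    posted_by = fields.get("postedBy", [""])[0]
    return posted_by in known_names


# Names taken from the blog owner's FOAF file (here, just Dan).
KNOWN_NAMES = {"Dan Brickley"}

genuine = ("title=Foo+Bar&url=http://www.bar.com/&excerpt=My+Excerpt"
           "&blog_name=Foo&postedBy=Dan+Brickley")
# A spammer only has to copy a name out of the public FOAF file.
forged = ("title=Cheap+Pills&url=http://spam.example/&excerpt=Buy+now"
          "&blog_name=Spam&postedBy=Dan+Brickley")

print(naive_is_trusted(genuine, KNOWN_NAMES))
print(naive_is_trusted(forged, KNOWN_NAMES))
```

Both requests pass, because nothing in the request proves who sent it. Swapping the name for an email address or an mbox_sha1sum changes nothing, since those are equally public.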

Therefore, I think you need an arbiter of identity; you need a service that Leigh’s Foafback server and Dan’s Foafback pinger can use to silently-after-the-first-time validate the identity of the person doing the posting.

The second cut at a new Foafback server works like this: The first time that Dan tries to Foafback to Leigh’s site, he is redirected to Dan’s OpenID provider (I think that’s the term), logs in, and is told “www.leighdodds.com is requesting your email address” and Dan clicks “Okay, now and forever.”

Leigh’s Foafback server then receives OpenID credentials and a Foafback post (sans postedBy, because the email of the person who’s logged in is actually coming from the OpenID provider, not from the person performing the Trackback). Leigh’s Foafback server validates that the OpenID identity (i.e., Dan’s persona) is in the trust zone (i.e., can be reached via FOAF) and automatically generates a link.

So that’s why I think you need OpenID.

Now, since I went to the bother of showing what a Trackback post actually looks like, I guess I should state the obvious, which is that the onus of calculating the FOAF graph ought not to be on Leigh (the original blogger) but on Dan, the Foafbacker. The Foafback pinger needs to include the route by which the poster is asserting a relationship (a list of FOAF URIs ought to suffice). The Foafback server needs to verify that route (at least once, but I can well imagine the admin software saying “These people have tracked back to you; include them in your FOAF?”).
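Verifying a claimed route could look something like the following Python sketch. Everything here is an assumption for illustration: `verify_route` and `fetch_knows` are hypothetical names, the `FOAF_WEB` dictionary stands in for actually fetching and parsing each FOAF file, and the third URI is made up. The route is taken to run from the blog owner's FOAF outward to the poster's, with each hop asserted by the file before it.

```python
def verify_route(route, owner_foaf, poster_foaf, fetch_knows, max_degrees=3):
    """Check a claimed chain of FOAF URIs from the blog owner to the poster.

    fetch_knows(uri) must return the set of FOAF URIs that the file at
    `uri` links to via foaf:knows / rdfs:seeAlso.
    """
    # The route must start at the owner and end at the verified poster.
    if len(route) < 2 or route[0] != owner_foaf or route[-1] != poster_foaf:
        return False
    # Too many hops falls outside the trust zone.
    if len(route) - 1 > max_degrees:
        return False
    # Reject routes that loop back on themselves.
    if len(set(route)) != len(route):
        return False
    # Each file must actually assert that it knows the next one.
    return all(nxt in fetch_knows(cur) for cur, nxt in zip(route, route[1:]))


LEIGH = "http://www.leighdodds.com/foaf.rdf"
DAN = "http://rdfweb.org/people/danbri/foaf.rdf"
CAROL = "http://carol.example/foaf.rdf"  # hypothetical third party

# Stand-in for the live Web of FOAF files.
FOAF_WEB = {
    LEIGH: {DAN},
    DAN: {CAROL},
}


def fetch_knows(uri):
    return FOAF_WEB.get(uri, set())


print(verify_route([LEIGH, DAN, CAROL], LEIGH, CAROL, fetch_knows))
print(verify_route([LEIGH, CAROL], LEIGH, CAROL, fetch_knows))
```

The server only checks the hops it is handed, so the cost of searching the graph stays with the pinger. The `poster_foaf` argument is assumed to come from the OpenID-verified identity, not from the request body.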

Spammers will subvert Overly Trusting Ted with second-order attacks (“Hey, love your site!” from “new friend” cutegirl15, whose FOAF is 10,000 phentermine sites) and there’s little that can be done about that. But the list of targets for the spammers’ real purpose (which is to get links to their phentermine sites posted on high-traffic blogs) is limited to those in Ted’s FOAF file. Of course, Ted asserts that he knows Dave Winer, Robert Scoble, and Cory Doctorow, so the spammers have a target. But if the spammers link indiscriminately to outbound links, they’re already at 3 degrees of separation (Ted-cutegirl15-phentermine) and, of course, Ted won’t validate (since Winer, Scoble, and Doctorow don’t have Ted in their FOAF files). So the spammers wise up and validate the route to the potential target by checking the potential targets’ FOAFs. But validating along the directed graph severely limits the speed at which spammers can propagate “out” from Ted’s trust zone (assuming that those in the top 1/2 of 1% of the blogging power curve don’t become superpropagators by allowing six-degree-of-separation Foafbacks).

FOAF, OpenID, and Trackback

Is a limited recursion through a FOAF graph based on OpenID the solution to Trackback? If that sentence isn’t understandable, don’t worry about it, but if it parses, continue…

The big problem, of course, is the initial trackback from those outside the limits of the graph. In such a case, the scheme at least raises the bar that a bot must clear: you must have an OpenID and you must propose a path through the graph. Such trackbacks are submitted for moderation (who doesn’t check out those commenting on their posts? The A-Listers? Who gives a frack if this doesn’t work for them? As a person well up the power-curve of blogging (99.9th percentile), I can assure you that it’s not hard to read every mention that Technorati can find).
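As a sketch, the resulting policy is a three-way decision. The function name `triage` and the outcome labels are hypothetical, and treating a missing OpenID or missing path as an outright rejection is my reading of "raises the bar," not something spelled out above.

```python
def triage(openid_verified, proposed_route, route_within_limits):
    """Decide what to do with an incoming Foafback.

    openid_verified:     the poster's identity was confirmed via OpenID
    proposed_route:      list of FOAF URIs the pinger claims links it to us
    route_within_limits: the route checked out and is inside the trust zone
    """
    if not openid_verified or not proposed_route:
        return "reject"      # assumption: below the bar, nothing is queued
    if route_within_limits:
        return "link"        # inside the trust zone: link automatically
    return "moderate"        # outside the limits: a human takes a look


print(triage(True, ["http://a.example/foaf.rdf", "http://b.example/foaf.rdf"], False))
```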

OK, so the obvious failure mode is that Trusting Ted, who’s in my trust zone, allows into his zone a mole, who becomes a conduit for spammers. Several things occur to me about this. First, yeah, I have a blacklist in my trackback mechanism and it, too, is FOAF. Second, Trusting Ted’s FOAF probably has a distinctively low inbound:outbound ratio (again, the A-List bloggers love being supernodes, so they haven’t noticed that supernodes have downsides). Third, it seems to me that the graphs of spammers’ OpenID-based FOAFs would have telltale characteristics: lots of transience, low connectivity to “real” FOAFs, non-power-law distributions (even if they developed mock supernodes, those would necessarily be transient), etc.
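The inbound:outbound ratio is easy to compute once the graph is in hand. In this Python sketch the graph is a made-up in-memory dictionary (all node names other than Ted and cutegirl15 are hypothetical), and where to draw the "suspiciously low" line is left open.

```python
def inbound_outbound_ratio(knows, person):
    """Ratio of people who list `person` to people `person` lists.

    Returns None when `person` lists nobody, since the ratio is undefined.
    """
    outbound = len(knows.get(person, ()))
    inbound = sum(
        1 for other, friends in knows.items()
        if other != person and person in friends
    )
    if outbound == 0:
        return None
    return inbound / outbound


# Hypothetical graph: each key "knows" everyone in its set.
KNOWS = {
    "me": {"ted"},
    "ted": {"a_lister", "cutegirl15", "new_friend_2", "new_friend_3"},
    "fan_1": {"a_lister"},
    "fan_2": {"a_lister"},
    "a_lister": {"fan_1"},
}

print(inbound_outbound_ratio(KNOWS, "ted"))       # one inbound, four outbound
print(inbound_outbound_ratio(KNOWS, "a_lister"))  # three inbound, one outbound
```

Counting inbound links requires having crawled everyone else's files, so in practice this would be an estimate over whatever part of the graph the server has seen.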

Given that the costs of any automated assault on such a system will approach zero, how is such a system vulnerable?