Please: Say “Twin” Networks, Not “Siamese” Networks

I’m a big fan of using a pair of identical networks to create output vectors that are then fed into a network that is trained to judge whether its two input vectors are the same or different.

This is a powerful technique for “low k-shot” comparisons. Most ML techniques that are trying to identify, say, “photos of _my_ cat,” require lots of examples of both the general category and perhaps dozens or hundreds of photos of my specific cat. But with twin networks, you train the discriminator portion to tell whether one input is from the same source as the other. Those two inputs are generated from two sub-networks that share the same weights.

A twin network for low k-shot identification

Since training propagates backwards, for the discriminator to succeed, it needs “features that distinguish a particular cat.” And since the weights that generate those inputs are shared, training (if it’s successful) creates a “Features” sub-network that essentially extracts a “fingerprint useful for discriminating.”

To run inference with a twin network, then, you hold your target data constant in one network and iterate over your exemplar dataset. For each example in examples, you get a “similarity to target” rating that you can use for final processing (perhaps with a “That looks like a new cat!” threshold or perhaps with a user-reviewed “Compare the target with these three similar cats” UX, etc.).
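
A minimal sketch of that inference loop, in Python. The `embed` and `similarity` callables are hypothetical stand-ins for the shared “Features” sub-network and the trained discriminator; they are assumptions for illustration, not part of any particular library, and the toy versions below exist only so the sketch runs.

```python
def rank_by_similarity(target, exemplars, embed, similarity, new_threshold=0.5):
    """Score every exemplar against one fixed target.

    embed and similarity are hypothetical callables standing in for the
    shared feature sub-network and the trained discriminator.
    Returns (ranked, looks_new): ranked is a list of (name, score) pairs,
    best match first; looks_new is True when nothing clears the threshold.
    """
    target_vec = embed(target)  # computed once: the target is held constant
    scored = [(name, similarity(target_vec, embed(item)))
              for name, item in exemplars.items()]
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    looks_new = not ranked or ranked[0][1] < new_threshold
    return ranked, looks_new


# Toy stand-ins: the "embedding" is the identity and the
# "similarity" is 1 / (1 + Euclidean distance).
def toy_embed(x):
    return x


def toy_similarity(a, b):
    distance = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + distance)


cats = {"tabby": (0.9, 0.1), "calico": (0.2, 0.8), "tuxedo": (0.5, 0.5)}
ranked, looks_new = rank_by_similarity((0.85, 0.15), cats, toy_embed, toy_similarity)
top_three = ranked[:3]  # candidates for a "compare with these three" review step
```

The threshold of 0.5 is arbitrary here; in practice it would have to be tuned against whatever scores the real discriminator produces.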

As I said, I’m a big fan of this technique and it’s what I’ve been using in my whaleshark identification project.

However, there’s one unfortunate thing about this technique, which is that it was labeled as a “Siamese network” back in the day. This is a reference to the term “Siamese twins,” which is an archaic and potentially offensive way to refer to conjoined twins.

It would be a shame if this powerful technique grew in popularity and carried with it an unfortunate label. “Twin networks” is just as descriptive and not problematic.


Ricky Jay R.I.P

Ricky Jay was one of my heroes. I first became aware of him in the pages of the remarkable “Cards as Weapons,” an oversized paperback that I bought at age 13 because it had a few pictures of topless women in it (you really can’t appreciate how much the Internet has changed the adolescent male experience). But “Cards as Weapons” additionally laid out a path:

  • some straightforward guidance on technique (although unlike Jay, I gripped the cards along the short side, trading accuracy for spin)
  • tales of increasingly difficult and improbable tasks (splitting string cheese, penetrating a newspaper, sticking into a watermelon)
  • and then, catnip to an adolescent in the 1970s, a Guinness World Record distance

It’s an axiom that all magicians are nerds: enthusiastic about a subject to a degree that overwhelms social decorum. One of Teller’s rules of magic is “make the secret more trouble than it seems worth.” Jay, who was one of the best close-up magicians in the world, was crystal clear about the obsession with which you had to practice the simplest of passes: thousands of hours, a lifetime of practice, a set of folding mirrors that you carried in your valise.

I could never drive myself to master palming a card or (to my great regret) walking a coin over the backs of my fingers, but Jay did give me permission to throw pack after pack of cards into trashcans, through the sports pages, and, while I never managed to stick a card into a watermelon skin, I eventually went wall-to-wall in our school field house (a distance, I am compelled to mention all these decades later, 30’ greater than Jay’s Guinness World Record).

My obsession with throwing things shifted to Frisbee discs, and a complete accounting of that will have to wait for Volume III of my memoirs.

But Jay also modeled a different set of virtues, less spectacular but perhaps more useful to a young nerd. The magicians of the time came in two flavors: waistcoated or unicorn t-shirted. Either way they were flamboyant: the spectacle of magic called for dramatic gestures, plummy line readings, and a transparently pathetic demand to be the center of attention. Jay went a different route: a matter-of-fact affect bordering on subdued, a patter that took as its foundation true scholarship, and an invitation to look as closely as you wanted at the trick. If you admire the craft of David Blaine, you should watch some Ricky Jay routines to see some true polish.

I see Jay’s influence in another obsession that began around that time and which, unlike throwing cards or Frisbees, I still pursue: programming computers. Like close-up magic, software development is a task of unrelenting precision. A trick fails if the palmed card is even glimpsed; a program fails if a semicolon is misplaced or a count to a million is off by one. For professional programmers, the precision is a given. The scholarship is not. The self-effacement is not. There are many blowhards of software development who are missing only a cape and a tophat to complement their boasts of their tours of the courts of Europe and their mastery of hidden secrets.

A magician’s magician, he was apparently well-employed as a consultant in Hollywood and, to the extent that people would recognize him, I suppose they’d recognize his basset-faced visage as the craps dealer in “Deadwood” or from the movies of David Mamet, where Jay would deliver Mamet-like lines such as “Everything in life, the money’s in the rematch.” Jay played a craps dealer; he was the world’s foremost expert in dice.

Obsessive practice, scholarship, and a sardonic sense of humor: those were the elements of Jay’s success. Ricky Jay was not well known, but he was well admired.

The Simplest Deep Learning Program That Could Possibly Work

Once upon a time, when I, a C programmer, first learned Smalltalk, I remember lamenting to J.D. Hildebrand “I just don’t get it: where’s the main()?” Eventually I figured it out, but the lesson remained: Sometimes when learning a new paradigm, what you need isn’t a huge tutorial, it’s the simplest thing possible.

With that in mind, here is the simplest Keras neural net that does something “hard” (learning and solving XOR):

import numpy as np
from keras.models import Sequential
from keras.layers import Activation, Dense
from keras.optimizers import SGD

# Allocate the input and output arrays
X = np.zeros((4, 2), dtype='uint8')
y = np.zeros(4, dtype='uint8')

# Training data X[i] -> y[i]
X[0] = [0, 0]
y[0] = 0
X[1] = [0, 1]
y[1] = 1
X[2] = [1, 0]
y[2] = 1
X[3] = [1, 1]
y[3] = 0

# Create a 2 (inputs) : 2 (middle) : 1 (output) model, with sigmoid activation
model = Sequential()
model.add(Dense(2, input_dim=2))
model.add(Activation('sigmoid'))
model.add(Dense(1))
model.add(Activation('sigmoid'))

# Train using stochastic gradient descent
# (newer Keras releases spell the `lr` argument `learning_rate`)
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)

# Run through the data `epochs` times
history = model.fit(X, y, epochs=10000, batch_size=4, verbose=0)

# Test the result (uses same X as used for training)
print(model.predict(X))

If you run this, there will be a startup time of several seconds while the libraries load and the model is built, and then the call to fit will run quietly (verbose=0 suppresses its progress output; set it to 1 or 2 to watch training). After the data has been run through 10,000 times, the model will then try to predict the output. As you’ll see, the neural network has learned the proper set of weights to solve the XOR logic gate.
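
To make “the proper set of weights” concrete, here is a plain-Python forward pass through a 2:2:1 sigmoid network. The weights are hand-picked for illustration, which is an assumption of this sketch: a trained Keras model will generally settle on different values. One hidden unit behaves roughly like OR, the other like NAND, and the output unit ANDs them together, which is XOR.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hand-picked (not learned) weights, purely illustrative.
# Each entry is ((weight for x1, weight for x2), bias).
HIDDEN = [
    ((20.0, 20.0), -10.0),    # fires when x1 OR x2 is on
    ((-20.0, -20.0), 30.0),   # fires unless both are on (NAND)
]
OUTPUT = ((20.0, 20.0), -30.0)  # fires only when both hidden units fire


def forward(x1, x2):
    """Run one input pair through the 2:2:1 network."""
    hidden = [sigmoid(w[0] * x1 + w[1] * x2 + b) for w, b in HIDDEN]
    w, b = OUTPUT
    return sigmoid(w[0] * hidden[0] + w[1] * hidden[1] + b)


# Round each output to the nearest integer to read it as a logic value
xor_table = {(a, b): round(forward(a, b)) for a in (0, 1) for b in (0, 1)}
```

Training with gradient descent is a search for weights that play the same role as these, starting from random values.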

Now draw the rest of the owl.


“The Deuce” Stinks. A Rant.

I’m a hair’s-breadth away from declaring that “The Deuce,” HBO’s Sunday night “prestige drama” about flesh-peddling and pornos in Times Square and 42nd Street in the mid-70s, is an exercise in trolling, some kind of meta-level commentary on the lack of drama, characterization, or stakes in, y’know, pornos. It’s almost easier to believe that David Simon — the creator of “The Wire” FFS! — is engaged in some kind of multimillion-dollar performance art than that he’s presiding over a writing room as sloppy and listless (dare I say, “flaccid”?) as that churning out the scripts for this season.

The over-arching problem is that there’s no goddamn conflict: the characters just appear, smoke, have breakfast or a Dewar’s on the rocks, smoke, engage in the boring routine of their flesh-peddling, smoke, pour themselves another drink, and then we cut away to another character. I mean, my God, what was going on with James Franco and the dry-cleaning store he owned for two episodes? Why did we spend five minutes jazzing around in the JFK parking lot to establish “she’s going to LA alone because he’s scared of flying”?

Maggie Gyllenhaal says in one scene that her Little Red Riding Hood porno (are we supposed to gasp in wonder at the visionary genius?) will cost hundreds of thousands. Then the next time we see her she’s being offered $10K for 10% and a blowjob, which she gives, and what are we supposed to feel? “Boohoo, despite her ambitions she can’t escape the expectation that she’s a whore?” “Hooray, she’s doing what she has to do to realize her dream?” I dunno. I don’t care. I mean, I could care if the writers decided to engage in character development rather than just moving on to the next damn thing.

I’ve watched something like 14 hours of this show on the strength of the writing talent of Simon and Pelecanos. That’s enough time to bring a lot of strands together, to get a lot of plots up to a rolling boil. But instead we’re halfway through the second season and goddam Lawrence Gilliard is walking into a situation, taking it in, walking out, and then having the same damn conversation about how things never change. Yeah, you’re telling me, D’Angelo.

The first season was set in 1972, the “Walk On The Wild Side” pre-punk era, but the second jumped forward five years, to a 1977 that, contra reality, has Talking Heads, Elvis Costello, and The Damned as the soundtrack. (There were parts of NYC where that might have been the soundtrack, but they sure as hell weren’t mid-town discos.) One episode had a thirty-something musician quoting Rilke and Rimbaud and handing the plotless female bartender an album, which a sharp-eyed viewer can see was Jim Carroll’s “Catholic Boy,” released in 1980. Carroll you may remember from the song “People Who Died,” but in 1977 the real Jim Carroll was not a musician but a poet struggling with heroin addiction. He didn’t start singing until he moved to California in 1978. All of which is trivial, but they’re the ones who decided to have this scene, and we viewers are supposed to make sense of it, and even if you know all about Jim Carroll, the scene is pointless. (And, by the way, Jim Carroll would have been a freakish red-haired beanpole friendly with everyone from Patti Smith to Keith Richards and would be an excellent character in a series about the bizarre confluence of high and low culture in mid-70s NYC, which is a setting ripe for drama despite the evidence of “The Deuce.”)

Finally, and I understand that no one will ever get this far in this post, and it really neither supports nor refutes my thesis, but every time “The Deuce” opening credits end, and there is a shot of a building facade reflected in a puddle through which a foot walks, I get pissed off, because that’s a total rip-off of the closing shot in the opening credits for “Deadwood” and if there’s one thing that’s clear about “The Deuce” it’s that they don’t have David Milch writing for them.

Writing to Azure Storage With F#

This last weekend I participated in the “Hack for the Sea” hackathon. As part of that, I needed to store images and structured data to Azure Storage. The process is very straightforward using F#’s async capabilities.

First, you’ll need the connection string for your Azure Storage:

Use that to instantiate a CloudStorageAccount object:

let csa = CloudStorageAccount.Parse connectionString

Then, write method(s) to store the data in either Blob storage or Table storage:

// Put directly in Azure blob storage
let photoSubmissionAsync (cloudStorageAccount : CloudStorageAccount) imageType (maybePhoto : IO.Stream option) imageName =
    async {
        match maybePhoto with
        | Some byteStream ->
            let containerName = "marinedebrispix"
            let ctb = cloudStorageAccount.CreateCloudBlobClient()
            let container = ctb.GetContainerReference containerName
            let blob = container.GetBlockBlobReference(imageName) //|> Async.AwaitTask
            blob.Properties.ContentType <- imageType
            do! blob.UploadFromStreamAsync(byteStream) |> Async.AwaitTask
            return true
        | None -> return false
    }

// Put directly in Azure table storage
let reportSubmissionAsync (cloudStorageAccount : CloudStorageAccount) report photoName =
    async {
        let ctc = cloudStorageAccount.CreateCloudTableClient()
        let table = ctc.GetTableReference("MarineDebris")
        let record = new ReportStorage(report)
        let insertOperation = record |> TableOperation.Insert
        let! tr = table.ExecuteAsync(insertOperation) |> Async.AwaitTask
        return tr.Etag |> Some
    }

The object passed to TableOperation.Insert must be a subclass of TableEntity:

type ReportStorage(report : Report) =
    inherit TableEntity("MainPartition", report.Timestamp.ToString("o"))
    member val public Report = report |> toJson with get, set



Xamarin: You must explicitly unsubscribe from NSNotifications if IDisposable

In Xamarin, if you observe / subscribe to a particularly-named NSNotification in an object that is IDisposable (this includes any class descended from NSObject!), you MUST explicitly unsubscribe from it in your Dispose handler, or you will get a segfault (the system will attempt to call a method at a memory location that is no longer valid). The pattern looks like this:

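class MyClass : NSObject
{
    // instance variable
    private NSObject notificationObservationHandle;

    // Subscribing in the constructor is an assumption; any setup method works
    public MyClass()
    {
        notificationObservationHandle = NSNotificationCenter.DefaultCenter.AddObserver(notificationName, NotificationHandler);
    }

    void NotificationHandler(NSNotification notification)
    {
        // ... etc ...
    }

    private bool disposed = false;
    override protected void Dispose(bool disposing)
    {
        if (!disposed)
        {
            if (disposing)
            {
                // The explicit unsubscribe that prevents the segfault
                NSNotificationCenter.DefaultCenter.RemoveObserver(notificationObservationHandle);
            }
            disposed = true;
            base.Dispose(disposing);
        }
    }
}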

Deep Whalesharks

Whalesharks are the largest fish in the ocean, but very little is known about their movements (where they breed, for instance, has been a huge mystery, although there’s now pretty good evidence that some, at least, breed in the Galapagos).

Whalesharks have a “fingerprint” in the form of distinct spots on their front half. The current state-of-the-art technique for ID’ing whalesharks from photos is a pretty brilliant appropriation of an algorithm for locating astrophotographs in the sky:

  1. Extract Points Of Interest (POIs) from your target image
  2. Draw the mesh of triangles created from those points
  3. Create a histogram of the interior angles of those triangles
  4. Use that histogram as a “fingerprint”

Visualization of steps in astrophotography fingerprinting

The basic insight is that it’s not the absolute location of the points-of-interest but their internal relationships that you can rely on. You can rotate the source image by any amount and the internal angles of the mesh stay constant. And because you’re binning, this algorithm is at least somewhat robust against noise (either false POIs or, more likely, missing faint POIs).
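
The four steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the researchers’ implementation: it forms a triangle from every triple of points rather than a specific mesh (a real pipeline might use a triangulation instead), the bin count of 18 is arbitrary, and the function names are made up for this example.

```python
import math
from itertools import combinations


def interior_angles(a, b, c):
    """Return the three interior angles (in radians) of triangle abc."""
    def angle_at(p, q, r):
        # Angle at vertex p between the rays p->q and p->r
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        return abs(math.atan2(cross, dot))
    return angle_at(a, b, c), angle_at(b, a, c), angle_at(c, a, b)


def angle_fingerprint(points, bins=18):
    """Normalized histogram of interior angles over every triple of points."""
    counts = [0] * bins
    bin_width = math.pi / bins
    for a, b, c in combinations(points, 3):
        for angle in interior_angles(a, b, c):
            counts[min(int(angle / bin_width), bins - 1)] += 1
    total = sum(counts)
    return [n / total for n in counts] if total else counts


def rotate(points, theta):
    """Rotate every point about the origin by theta radians."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in points]


spots = [(0, 0), (4, 1), (1, 3), (5, 5), (2, 7)]
original = angle_fingerprint(spots)
rotated = angle_fingerprint(rotate(spots, 0.7))  # same histogram as original
```

Because only angles are binned, the fingerprint is also unchanged by uniform scaling and translation. It is not unchanged by a change in viewing angle, which distorts the triangles themselves.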

This is a great algorithm that is amazingly good with astrophotography. But the geometry of the night sky is constant: our constellations appear very much as they did to people thousands of years ago. Whether taken last night or last century, a photograph of Orion is going to show Betelgeuse in one corner, Rigel in another, and a belt between them.

Two photos of the same whaleshark, though, will almost certainly be from different angles and distances. Another challenge is that the dappling of sunlight and shadowing from surface waves causes a lot of signal noise. So, today, whaleshark researchers have to do a lot of manual processing to identify an animal from a photograph.

My thought is to apply modern data-science and machine-learning approaches to identifying individual whalesharks. The main goal is really to create an efficient pipeline rather than a better identification algorithm. (To be honest, I’ve already tried several “simple things that could possibly work” ML techniques and not gotten any traction, but I’m not giving up.)

I Didn’t Like “Enlightenment Now”

They say to never write a negative review of a book until it has received too many positive ones. Which brings us to “Enlightenment Now: The Case For Reason, Science, Humanism, and Progress,” by Steven Pinker.

The tl;dr is that he doesn’t actually argue this case, he just presents a bunch of under-reported optimistic curves and, in the face of problems that cannot be swept under the rug, assures us that if only we treat them as problems to be solved and not get depressed about them, all will be well. Hoo-rah!

If you say “Gee, that sounds like Pinker’s book ‘The Better Angels of Our Nature’, which was a good book!” I’d agree with you. If this book had been called “Even Better Angels of Our Nature” I’d have no problem with it. But Pinker’s “Case for Reason, etc.” is essentially “these curves happened, they correlate (kind of) with periods when ‘Enlightenment ideals’ were popular, therefore, Enlightenment ideals caused the curves!” That’s bad logic.

The only reason I’m criticizing this book is that I would love to engage with a book that actually made the case for these ideals and wrestled with the question of why, while still broadly paid lip service (the climate deniers don’t say “Science is wrong!”; they claim that science is on their side), they seem to have lost traction in terms of driving societal action. Or, perhaps more in the vein of things Pinker likes to do, to discover that “no, history is always an ebb and flow and the tide of Enlightenment continues to roll in.” (I’d be happy to have that case made.)

Pinker wants us to believe that the curves of the book (global poverty, lifespan, wealth, etc.) are strongly predictive of future improvement and, over and over, frames the thought ‘But will that continue?’ as one of pessimism versus optimism. I am temperamentally an optimist, and can rationalize that (“Optimism gives you agency! Pessimism is demotivating!”). But optimism bias is a cognitive mistake. The Enlightenment ideal is to put aside optimism and pessimism and engage with the facts. Yes, it’s true that the Malthusians have been wrongly predicting “we’re just about to run out of capacity!” for 200 years, and “doom is unlikely” should be your starting point. But maybe humanity’s time on Earth is like that of an individual: ups and downs, and heartbreakingly limited, potentially with a long period of decline before the end. Hypochondriacs are consistently wrong, but in the end all of them can put “I told you so.” on their gravestone.

Beyond the problems of what the book engages in is what it just plain ignores. “The case for Enlightenment” is essentially a philosophical task, and the proper balance of reason and passion has been discussed since (at least) the days of Plato and Aristotle. The word “Romanticism” only occurs twice in the book, in brief dismissals, and which is a worse reason to ignore it: not engaging with its explicitly anti-Enlightenment philosophy or deliberately ignoring it, knowing that many people happily identify themselves as romantics and might be less receptive to your position if it were posed as a choice?

“Enlightenment Now” isn’t a bad book. As “Even Better Angels of Our Nature” it’s fine. But ultimately it’s as shallow as a “pull yourselves up by your bootstraps!” self-help book.

fun-ny Faces : Face-based Augmented Reality with F# and the iPhone X

Each year, the F# programming community creates an advent calendar of blog posts, coordinated by Sergey Tihon on his blog. This is my attempt to battle Impostor Syndrome and share something that might be of interest to the community, or at least amusing…

I was an Augmented Reality (AR) skeptic until I began experimenting with iOS 11’s ARKit framework. There’s something very compelling about seeing computer-generated imagery mapped into your physical space.

A feature of the iPhone X is the face-tracking sensors on the front side of the phone. While the primary use-case for these sensors is unlocking the phone, they additionally expose the facial geometry (2,304 triangles) to developers. This geometry can be used to create AR apps that place computer-generated geometry on top of the facial geometry at up to 60FPS.

Getting Started

In Visual Studio for Mac, choose “New solution…” and “Single-View App” for F#:

The resulting solution is a minimal iOS app, with an entry point defined in Main.fs, a UIApplicationDelegate in AppDelegate.fs, and a UIViewController in ViewController.fs. The iOS programming model is not only object-oriented but essentially a Smalltalk-style architecture, with a classic Model-View-Controller approach (complete with frustratingly little emphasis on the “Model” part) and a delegate-object pattern for customizing object life-cycles.

Although ARKit supports low-level access, by far the easiest way to program AR is to use an ARSCNView, which automatically handles the combination of camera and computer-generated imagery. The following code creates an ARSCNView, makes it full-screen (arsceneview.Frame <- this.View.Frame) and assigns its Delegate property to an instance of type ARDelegate (discussed later). When the view is about to appear, we specify that the AR session should use an ARFaceTrackingConfiguration and that it should Run:

[<Register ("ViewController")>]
type ViewController (handle:IntPtr) =
    inherit UIViewController (handle)

    let mutable arsceneview : ARSCNView = new ARSCNView()

    let ConfigureAR() = 
       let cfg = new ARFaceTrackingConfiguration()
       cfg.LightEstimationEnabled <- true

    override this.DidReceiveMemoryWarning () =
      base.DidReceiveMemoryWarning ()

    override this.ViewDidLoad () =
      base.ViewDidLoad ()

      match ARFaceTrackingConfiguration.IsSupported with
      | false -> raise <| new NotImplementedException() 
      | true -> 
        arsceneview.Frame <- this.View.Frame
        arsceneview.Delegate <- new ARDelegate (ARSCNFaceGeometry.CreateFaceGeometry(arsceneview.Device, false))
        //arsceneview.DebugOptions <- ARSCNDebugOptions.ShowFeaturePoints + ARSCNDebugOptions.ShowWorldOrigin

        this.View.AddSubview arsceneview

    override this.ViewWillAppear willAnimate = 
        base.ViewWillAppear willAnimate

        // Configure ARKit 
        let configuration = new ARFaceTrackingConfiguration()

        // This method is called subsequent to `ViewDidLoad` so we know arsceneview is instantiated
        arsceneview.Session.Run (configuration , ARSessionRunOptions.ResetTracking ||| ARSessionRunOptions.RemoveExistingAnchors)

Once the AR session is running, it adds, removes, and modifies ARSCNNode objects that bridge the 3D scene-graph architecture of iOS’s SceneKit with real-world imagery. As it does so, it calls various methods of the ARSCNViewDelegate class, which we subclass in the previously-mentioned ARDelegate class:

// Delegate object for AR: called on adding and updating nodes
type ARDelegate(faceGeometry : ARSCNFaceGeometry) =
   inherit ARSCNViewDelegate()

   // The geometry to overlay on top of the ARFaceAnchor (recognized face)
   let faceNode = new Mask(faceGeometry)

   override this.DidAddNode (renderer, node, anchor) = 
      match anchor <> null && anchor :? ARFaceAnchor with 
      | true -> node.AddChildNode faceNode
      | false -> ignore()   

   override this.DidUpdateNode (renderer, node, anchor) = 

      match anchor <> null && anchor :? ARFaceAnchor with 
      | true -> faceNode.Update (anchor :?> ARFaceAnchor)
      | false -> ignore()

As you can see in DidAddNode and DidUpdateNode, we’re only interested when an ARFaceAnchor is added or updated. (This would be a good place for an active pattern if things got more complex.) As its name implies, an ARFaceAnchor relates the AR subsystem’s belief of a face’s real-world location and geometry with SceneKit values.

The Mask class is the last piece of the puzzle. We define it as a subtype of SCNNode, which means that it can hold geometry and textures, have animations, and so forth. It’s passed an ARSCNFaceGeometry which was ultimately instantiated back in the ViewController (new ARDelegate (ARSCNFaceGeometry.CreateFaceGeometry(arsceneview.Device, false))). As the AR subsystem recognizes face movement and changes (blinking eyes, the mouth opening and closing, etc.), calls to ARDelegate.DidUpdateNode are passed to Mask.Update, which updates the geometry with the latest values from the camera and AR subsystem:

member this.Update(anchor : ARFaceAnchor) =
    let faceGeometry = this.Geometry :?> ARSCNFaceGeometry

    faceGeometry.Update anchor.Geometry

While SceneKit geometries can have multiple SCNMaterial objects and every SCNMaterial multiple SCNMaterialProperty values, we can make a simple red mask with:

let mat = geometry.FirstMaterial
mat.Diffuse.ContentColor <- UIColor.Red // Basic: single-color mask

Or we can engage in virtual soccer-hooligan face painting with mat.Diffuse.ContentImage <- UIImage.FromFile "fsharp512.png":


The real opportunity here is undoubtedly for makeup, “face-swap,” and plastic surgery apps, but everyone also loves a superhero. The best mask in comics, I think, is that of Watchmen’s Rorschach, which presented ambiguous patterns matching the black-and-white morality of its wearer, Walter Kovacs.

We can set our face geometry’s material to an arbitrary SKScene SpriteKit animation with mat.Diffuse.ContentScene <- faceFun, where faceFun is an arbitrary SpriteKit scene.

I’ll admit that so far I have been stymied in my attempt to procedurally generate a proper Rorschach mask. The closest I have gotten is a function that uses 3D Improved Perlin Noise and draws black if the texture is negative and white if positive. That looks like this:

Which is admittedly more Let That Be Your Last Battlefield than Watchmen.
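
The black-or-white thresholding described above is simple to sketch on its own. The Python below is a hedged illustration, not the app’s F# code: `stand_in_noise` is a made-up sum of sines used as a placeholder for a real 3D Improved Perlin Noise function, and the texture size and coordinate scale are arbitrary assumptions.

```python
import math


def stand_in_noise(x, y, z):
    """Smooth signed field in [-1, 1]; a placeholder for 3D Perlin noise."""
    return (math.sin(3.1 * x + 1.7 * z)
            + math.sin(2.3 * y - 0.9 * z)
            + math.sin(1.9 * (x + y) + z)) / 3.0


def threshold_texture(width, height, z, noise=stand_in_noise):
    """Return rows of pixels: 0 (black) where noise is negative, else 255 (white)."""
    rows = []
    for j in range(height):
        row = []
        for i in range(width):
            # Map pixel coordinates onto a small patch of the noise field
            value = noise(i / width * 4.0, j / height * 4.0, z)
            row.append(0 if value < 0 else 255)
        rows.append(row)
    return rows


# The third coordinate acts as time: stepping z animates the pattern
frame = threshold_texture(16, 16, z=0.0)
next_frame = threshold_texture(16, 16, z=0.1)
```

Swapping a real Perlin implementation in for `stand_in_noise` changes the character of the blobs but not the thresholding step.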

Other things I’ve considered for face functions are: cellular automata, scrolling green code (you know, like the hackers in movies!), and the video feed from the back-facing camera. Ultimately though, all of that is just warm-up for the big challenge: deformation of the facial geometry mesh. If you get that working, I’d love to see the code!

All of my code is available on GitHub.