I remember a time when “learning about computers” invariably started with the phrase “computers only operate on 0s and 1s…” Things could vary a little for a few minutes, but then you’d get to the meat of things: Boolean logic. “All computer programs are formed from these ‘logic gates’…”
I remember a poster that illustrated Boolean logic in terms of punching. A circuit consisted of a bunch of mechanical fists: an “AND” gate propagated the punch when both its inputs were punched, an “OR” required only one input punch, etc. At the bottom were some complex circuits and the ominous question: “Are you going to be punched?” Because Boston. (The answer was “Yes. You are going to be punched.”)
Anyway, the point is that while there was a fundamental truth to what I was being told, it was not overwhelmingly relevant to the opportunities that were blossoming, back then at the dawn of the personal computer revolution. Yes, it’s important to eventually understand gates and circuits and transistors and yes, there’s a truth that “this is all computers do,” but that understanding was not immediately necessary to get cool results, such as endlessly printing “Help, I am caught in a program loop!” or playing Nim or Hammurabi. Those things required simply typing in a page or two of BASIC code.
Transcription being what it is, you’d make mistakes, and curiosity being what it is, you’d mess around to see what you could alter to customize the game. Then your ambition would slowly grow, and only then would you start to benefit from understanding the foundations on which you were building.
Which brings us to deep learning.
You have undoubtedly noticed the rising tide of AI-related news involving “deep neural nets.” Speech synthesis, Deep Dream’s hallucinogenic dog-slugs, and perhaps most impressively AlphaGo’s success against the 9-dan Lee Sedol. Unlike robotics and autonomous vehicles and the like, this is purely software-based: this is our territory.
But “learning about deep learning” invariably starts with phrases such as “regression,” “linearly inseparable,” and “gradient descent.” It gets math-y pretty quickly.
Now, just as “it’s all just 0s and 1s” is true but not immediately necessary, “it’s all just weights and transfer functions” is something for which _eventually_ you will want to have an intuition. But the breakthroughs in recent years have come not so much from advances at this foundational level as from a dramatic increase in sophistication about how neural networks are “shaped.”
Not long ago, the most common structure for an artificial neural network was an input layer with a number of neural “nodes” equal to the number of inputs, an output layer with a node per output value, and a single intermediate layer. The “deep” in “deep learning” is nothing more than networks that have more than a single intermediate layer!
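To make “weights and transfer functions” a little more concrete, here is a minimal pure-Python sketch of that classic shape: two input values, a single intermediate layer of two nodes, and one output node. The weights are hand-picked for illustration (nothing learned them), chosen so that the network happens to compute XOR, and the names `sigmoid`, `node`, and `xor_net` are made up for this example.

```python
import math


def sigmoid(z):
    """Transfer function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def node(inputs, weights, bias):
    """One neural node: a weighted sum of its inputs, then the transfer function."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)


def xor_net(x1, x2):
    """Two inputs -> intermediate layer of two nodes -> one output node."""
    # Hand-picked weights, for illustration only (an assumption, not learned).
    h_or = node([x1, x2], [20.0, 20.0], -10.0)      # close to 1 if either input is 1
    h_nand = node([x1, x2], [-20.0, -20.0], 30.0)   # close to 1 unless both inputs are 1
    # The output node acts like AND over the two intermediate nodes.
    return node([h_or, h_nand], [20.0, 20.0], -30.0)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(xor_net(a, b)))
```

A “deep” version of the same idea would simply feed one intermediate layer into another before reaching the output node.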
Another major area of advancement is approaches that are more complex than “one input node per input.” Recurrence, convolution, attention… all of these terms relate to this idea of the “shape” of the neural net and the manner in which inputs and intermediate terms are handled.
… snip descent into rabbit-hole …
The Keras library allows you to work at this higher level of abstraction, while running on top of either Theano or TensorFlow, lower-level libraries that provide high-performance implementations of the math-y stuff. This is a Keras description of a neural network that can solve the XOR logic gate. (“You will get punched if one, but not both, of the input faces gets punched.”)
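import numpy as np
from keras.models import Sequential
from keras.layers.core import Activation, Dense
from keras.optimizers import SGD

# The four rows of the XOR truth table: inputs in X, expected outputs in y
X = np.zeros((4, 2), dtype='uint8')
y = np.zeros(4, dtype='uint8')
X[0] = [0, 0]
y[0] = 0
X[1] = [0, 1]
y[1] = 1
X[2] = [1, 0]
y[2] = 1
X[3] = [1, 1]
y[3] = 0

# Two inputs -> one intermediate layer -> one output, with sigmoid activations
# (the layer sizes here are an assumption; 2 and 1 are the classic XOR setup)
model = Sequential()
model.add(Dense(2, input_dim=2))
model.add(Activation('sigmoid'))
model.add(Dense(1))
model.add(Activation('sigmoid'))

# Argument names (nb_epoch, show_accuracy, class_mode) follow the Keras release
# current when this was written; later releases renamed or removed them
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd, class_mode="binary")

history = model.fit(X, y, nb_epoch=10000, batch_size=4, show_accuracy=True, verbose=0)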
I’m not claiming that this should be crystal clear to a newcomer, but I do contend that it’s pretty dang approachable. If you wanted to produce a different logic gate, you could certainly figure out what lines to change. If someone told you “The ReLU activation function is used more often than sigmoid nowadays,” your most likely “let me see if this works” would, in fact, work (as long as you guessed you should stick with lowercase).
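As a rough illustration of how little has to change, the sketch below builds the four-element target vector (the `y` values) for a few different gates, and writes the two activation functions as plain Python so you can see what is being swapped. The names `INPUTS`, `GATES`, `targets_for`, `sigmoid`, and `relu` are made up for this example and are not part of Keras.

```python
import math

# The four input rows, in the usual truth-table order.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Each gate is just a different rule for filling in the target vector y.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}


def targets_for(gate):
    """Return the y vector a network would be trained on for the named gate."""
    return [GATES[gate](a, b) for a, b in INPUTS]


def sigmoid(z):
    """Smooth squashing function with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, z)


for name in GATES:
    print(name, targets_for(name))
# AND [0, 0, 0, 1]
# OR [0, 1, 1, 1]
# XOR [0, 1, 1, 0]
# NAND [1, 1, 1, 0]
```

Changing the gate means changing four numbers; changing the activation means changing one name.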
For historical reasons, solving XOR is pretty much the “Hello, World!” of neural nets. It can be done with relatively little code in any neural network library and can be done in a few dozen lines in a mainstream programming language. (My first published article was a neural network in about 100 lines of C++. That was… a long time ago…)
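For a sense of what “a few dozen lines” looks like, here is a minimal sketch of an XOR network trained with plain gradient descent, using only the Python standard library. Everything specific in it is an assumption chosen for illustration, not a reconstruction of that old article: the three intermediate nodes, the learning rate of 0.5, the fixed random seed, and all of the function names. Training from random weights can occasionally stall, so the sketch prints what it ends up with rather than promising a result.

```python
import math
import random

DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
HIDDEN = 3   # assumed width of the intermediate layer
RATE = 0.5   # assumed learning rate


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_net(rng):
    """Random starting weights; the last entry of each row is that node's bias."""
    hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(HIDDEN)]
    output = [rng.uniform(-1, 1) for _ in range(HIDDEN + 1)]
    return hidden, output


def forward(net, x):
    """Return the intermediate activations and the single output value."""
    hidden, output = net
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in hidden]
    o = sigmoid(sum(w * v for w, v in zip(output, h)) + output[-1])
    return h, o


def loss(net):
    """Mean squared error over the four XOR rows."""
    return sum((forward(net, x)[1] - t) ** 2 for x, t in DATA) / len(DATA)


def train(net, epochs):
    """Online gradient descent: update the weights after every row."""
    hidden, output = net
    for _ in range(epochs):
        for x, t in DATA:
            h, o = forward(net, x)
            # Backpropagation: error signal at the output, then at each hidden node.
            d_out = (o - t) * o * (1 - o)
            d_hid = [d_out * output[j] * h[j] * (1 - h[j]) for j in range(HIDDEN)]
            for j in range(HIDDEN):
                output[j] -= RATE * d_out * h[j]
                hidden[j][0] -= RATE * d_hid[j] * x[0]
                hidden[j][1] -= RATE * d_hid[j] * x[1]
                hidden[j][2] -= RATE * d_hid[j]
            output[-1] -= RATE * d_out


net = make_net(random.Random(0))
before = loss(net)
train(net, 10000)
after = loss(net)
print("loss before: %.4f, after: %.4f" % (before, after))
for x, _ in DATA:
    print(x, round(forward(net, x)[1], 3))
```

Compare the `train` function to the single `model.fit` call: this is the math-y part the library takes off your hands.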
But Keras is not at all restricted to toy problems. Not at all. Check this out. Or this. Keras provides the appropriate abstraction level for everything from introductory to research-level explorations.
Now, is it necessary for workaday developers to become familiar with deep learning? I think the honest answer to that is “not yet.” There’s still a very large gap between “what neural nets do well” and “what use-cases the average developer is being asked to address.”
But I think that may change in a surprisingly short amount of time. In broad terms, what artificial neural nets do is recognize patterns in noisy signals. If you have a super-clean signal, traditional programming with those binary gates works. More importantly, lots of problems don’t seem easily cast into “recognizing a pattern in a signal.” But part of what’s happening in the field of deep learning is very rapid development of techniques and patterns for re-casting problems in just this way. So-called “sequence-to-sequence” problems such as language translation are beginning to rapidly fall to the surprisingly effective techniques of deep learning.
… snip descent into rabbit-hole …
Lots of problems and sub-problems can be described in terms of “sequence-to-sequence.” The synergy between memory, attention, and sequence-to-sequence, all areas of rapid advancement, is tipping-point stuff. This is the stuff of which symbolic processing is made. When that happens, we’re talking about real “artificial intelligence.” Artificial intelligence, yes, but not, I think, human-level cognition. I strongly suspect that human-level, general-purpose AI will have a trajectory similar to medicine based on genetics: too complex and messy and tangled to be cracked with a single breakthrough.