Large Language Models and the Chinese Room

The Chinese Room is a 1980 thought experiment from the philosopher John Searle. Wikipedia summarizes the setup:

> [S]uppose that artificial intelligence research has succeeded in constructing a computer that behaves as if it understands Chinese. It takes Chinese characters as input and, by following the instructions of a computer program, produces other Chinese characters, which it presents as output. Suppose, says Searle, that this computer performs its task so convincingly that it comfortably passes the Turing test: it convinces a human Chinese speaker that the program is itself a live Chinese speaker. To all of the questions that the person asks, it makes appropriate responses, such that any Chinese speaker would be convinced that they are talking to another Chinese-speaking human being.

Swell! But what if this were the means by which the computer operated:

  • The Chinese input (let’s call it “the prompt”) is written on paper and slipped under the door into a sealed room.

  • In the room, John Searle has a notebook in which every Chinese character is given an index number. He converts the sequence of characters in the prompt into a sequence of their corresponding numbers.

  • He has a large book, on each of whose pages is written a matrix of floating-point numbers. He multiplies the numbers encoding the prompt by the matrix on the first page. He takes the result and multiplies it by the matrix on the second page.

  • ... etc ...

  • When he comes to the last page, he has a sequence of numbers. He uses his “character-index” notebook to translate the result into Chinese characters, which he slips back out of the room.

Searle doesn’t understand Chinese. The books contain matrices of numbers and the operations on them are just mathematical operations. And while the realistic set of instructions is somewhat more complicated than “just keep doing matrix multiplications on the last result,” it’s really not that much more complicated.
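
To make that concrete, here is a toy sketch of the room’s procedure. Everything specific in it (the five-character vocabulary, the four “pages,” the random matrices) is invented for illustration; a real LLM uses billions of learned weights and a bit more plumbing, but the shape of the work is the same: look up index numbers, multiply by matrix after matrix, translate the final numbers back into characters.

```python
import numpy as np

# A toy sketch of the room, not a real language model. The vocabulary,
# the number of "pages," and the random weights are all made up.
vocab = ["你", "好", "吗", "我", "很"]                 # hypothetical character notebook
char_to_index = {c: i for i, c in enumerate(vocab)}

rng = np.random.default_rng(0)
pages = [rng.standard_normal((len(vocab), len(vocab))) for _ in range(4)]

def room(prompt: str) -> str:
    # Notebook step: characters -> index numbers -> rows of numbers.
    x = np.eye(len(vocab))[[char_to_index[c] for c in prompt]]
    # Book step: multiply by the matrix on each page, in order.
    for page in pages:
        x = x @ page
    # Final step: translate the resulting numbers back into characters.
    return "".join(vocab[i] for i in x.argmax(axis=1))

print(room("你好吗"))   # Searle never needs to know what any of this means
```

Nothing in `room` knows it is handling language; it would run just as happily on any other list of symbols and any other stack of matrices.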

Does the room have any understanding of Chinese?

The question then is: Where in the room can “understanding of Chinese” be said to exist? Searle just mechanically performed a bunch of math. The math operations work with any numbers. The numbers that specify the exact transformation are just ink marks on paper.

Searle didn’t specify that the actions inside the Chinese room were largely matrix multiplies. That, though, is how Large Language Models (LLMs) such as GPT-3 work. There is no comforting complexity of processing, no clear symbolic processing, no recursion or looping. The numbers embody a lot of processing, but even that processing is bereft of significant reasoning: it starts with statistics about the co-occurrence of symbols and then becomes statistics about those statistics, statistics about those, and so on. There is no parsing of nouns and verbs, no subjects and objects, none of that grammar stuff. Presumably there’s kinda sorta something to do with that stuff implicit in the weights that get multiplied together on their way to producing an output, but, boy, is it abstract.
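
The “statistics about co-occurrence of symbols” starting point can be seen in its most degenerate form in a bigram counter. This is my illustration, not a description of GPT-3’s internals, which layer many learned transformations on top of this kind of signal; the point is only that nothing in it parses nouns, verbs, subjects, or objects.

```python
from collections import Counter, defaultdict

# The degenerate, zeroth-order version of "statistics about co-occurrence
# of symbols": count which word follows which. The corpus is made up.
corpus = "the cat sat on the mat the cat ran".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1              # raw co-occurrence counts

def next_word(word: str) -> str:
    # Emit the most frequent follower: no grammar, just tallies.
    return counts[word].most_common(1)[0][0]

print(next_word("the"))                 # "cat", purely from co-occurrence
```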

I’ve never found the Chinese Room to be a particularly challenging thought experiment: I’ve always been on the side of what Wikipedia labels “the system reply” — it’s not important that Searle, the operator of the room, doesn’t understand, because the system as a whole does. The understanding/consciousness of the room resides, according to this argument, in the complexity of the processing instructions and intermediate results.

But I always thought that systems that began to approach believable conversations would have large amounts of symbolic processing and feedback. And while I can see past the lack of explicit symbolic processing, I find it impossible to ignore the lack of feedback — what Douglas Hofstadter called [Strange Loops]. In recent years I’ve become pretty sympathetic to [Integrated Information Theory] as a plausible account of consciousness, but the “atom” of IIT is feedback (the ability for integration of information to causally modify the output of the system).
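
To pin down what “feedback” means here, contrast a purely feed-forward pass (page after page of matrix multiplies, like the room) with a loop whose output is folded back into the state that produces the next output. The sketch below is mine; the shapes and weights are arbitrary, and only the structural difference matters.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, W2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))

def feed_forward(x: np.ndarray) -> np.ndarray:
    # The room / LLM picture: data flows one way, no loops.
    return x @ W1 @ W2

def with_feedback(x: np.ndarray, steps: int = 3) -> np.ndarray:
    # Each step's result is fed back into the state that shapes the
    # next step: the kind of loop the argument says LLMs lack.
    state = np.zeros_like(x)
    for _ in range(steps):
        state = np.tanh(x @ W1 + state @ W2)
    return state

x = rng.standard_normal((1, 4))
print(feed_forward(x))
print(with_feedback(x))
```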

By this model, the quality of the output generated by LLMs is just linguistic pareidolia — a projection of our imagination onto something close to random:

> Sometime we see a cloud that’s dragonish,
> A vapour sometime like a bear or lion
> A towered citadel, a pendant rock, 
> A forked mountain, or blue promontory
> With trees upon it that nod unto the world
> And mock our eyes with air…
> That which is now a horse, even with a thought 
> The rack dislimns, and makes it indistinct
> As water is in water.