On Chinese Rooms, Roman Numerals, and Neural Networks
Dec. 9, 2021, 10:46 a.m.
As an acquaintance of mine used to say: “all models are wrong.” All models, among them neural networks, are also just tools, albeit very useful, very powerful tools. They do not necessarily come to learn or say anything of significance about whatever question they are put to. We should resist the temptation to imbue them with significance or to misunderstand their utility.
John Searle advanced a famous argument in 1980 known as the “Chinese room argument.” In it, Searle imagines a computer program that can accept well-formed sequences of Chinese and output well-formed sequences of Chinese (by which he presumably meant Mandarin), such that this computer program would pass the Turing Test. He then further imagines that, furnished with the computer code, sufficient time and resources to produce the calculations himself, he could accept sequences of Chinese and calculate precisely what the machine would produce in the same circumstances. Both he and the machine could produce identical output, even though he does not know any Chinese. He reasons that in this circumstance, there is no difference between him and the machine, and thereby concludes that any such computer program could not be said to understand Chinese any more than he could (which is to say, not at all).
One could be forgiven when encountering the natural language generation model GPT-3 (and its inevitable, infernal successors) for thinking that one has stumbled upon a somewhat flighty conversation partner, but nevertheless one capable of understanding (at least until the subject turns to what happens when you drink bleach or how to get a table through a doorway). GPT-3’s ability to simulate English (particularly, the way it does it) may have been beyond Searle’s imagination in 1980, but something about Searle’s argument still rings true. As impressive as GPT-3 is, my intuition is that there is quite a gap from the ability to produce a felicitous sequence of symbols, given some preceding sequence of symbols, to actual understanding, whatever that might be. There can be no doubt that, given a sufficiently large corpus of an as-yet-undeciphered language like Etruscan, for example, GPT-3 would be able to produce felicitous sequences of Etruscan. But only the rankest hubris would lead us, following that, to announce to the world that it is the first speaker of the Etruscan language in several millennia.
Neural networks like GPT-3 are a broad class of models, whose applicability to an equally broad range of tasks makes them attractive options when the problem is open-ended, enormous, and complex. Hence their wide adoption in NLP and object recognition, among other more speculative uses. In a sense, when the problem space is so huge it’s difficult even to define, and the practical obstacles (the massive amounts of training data required and the expense of actually training the network) can be overcome, it’s at least not a bad idea to try using a network, if not a good idea.
And this is where I want to come back to Searle. By using language in his example, Searle was picking a complex behavior that, among humans, inarguably requires understanding. Natural language production is an intelligent behavior that compresses meaning, logic, intentionality, and external context (inter alia) into a serial acoustic stream, encoded into a decipherable format by a generative process shared (when successful) by speaker and listener. Pointedly, this generative process is not known. Linguists, psychologists (and even the occasional philosopher, or worse, the occasional physicist) have come up with hypotheses about how humans generate novel, well-formed expressions of natural language, but at best these hypotheses yield an incomplete picture.
Seen in this light, one might rightly muse that a neural network is unlikely to tell us anything of significance about the generative process of natural language. This musing, while provocative, clearly does not follow. Our inability to discover how language works is entirely unrelated to whether or not a network could somehow represent all the intricacies of Cantonese, Welsh, or Tagalog. The result might be inscrutable for human purposes, but a complete representation by a network is perfectly feasible, if unlikely.
However, is-generated-by is not the same thing as can-be-modeled-by. Take, for example, Roman numerals: the rules for representing a Roman numeral are fairly simple, as I discussed in my last blog post. But just because we can generate all Roman numerals with a model does not mean that that model literally is (a transformation or formal equivalent of) the process that actually generates Roman numerals. We could equally well represent the process of generating Roman numerals algorithmically, or even as a lookup table, since there is a finite number of Roman numerals.
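To make the algorithmic representation concrete, here is a minimal sketch (my own illustration, not the model from the earlier post). It uses the standard greedy method, and it caps the range at 3,999 since larger numerals require vinculum bars:

```python
# A sketch of an algorithmic Roman numeral generator, using the
# standard greedy method over subtractive-notation value pairs.
ROMAN_VALUES = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def to_roman(n: int) -> str:
    """Convert a positive integer (1-3,999) to a Roman numeral."""
    if not 1 <= n <= 3999:
        raise ValueError(f"{n} is outside the representable range 1-3,999")
    out = []
    for value, symbol in ROMAN_VALUES:
        # Greedily take the largest symbol that still fits.
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)
```

Note that the range check comes for free here: the top-down knowledge that the system has a ceiling is stated explicitly, rather than being something we hope the weights have absorbed.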
Manifestly, there is therefore nothing of theoretical significance that can be gleaned from inspecting the hidden state of a model trained to translate Arabic numerals into Roman numerals, because all of the rules for generating well-formed numerals are already known. If, in the case of a generative process which is not known, like Mandarin for Searle, we think there is some theoretical import or understanding conferred by the hidden states, we must also believe that that is the case for something like Roman numerals. At best, in the case of a generative process which is known, we can learn nothing beyond what we already know. And at worst, we can think we have learned something from a network which turns out to be false.
Even in the best of cases, then, there is every reason to be skeptical that the hidden states have any kind of theoretical significance. Imagine we could map a network’s hidden state to the Roman numeral formation rules: what then? The network is still a function. That it is a function is the source of all its utility and also some of its flaws. How would you encode within the network that Roman numerals conclude at 3,999,999? I wouldn’t say it’s impossible (it’s certainly straightforward if done post hoc), but that would be injecting top-down information into the system artificially.
What’s more, as a function, the network has learned associations between Arabic numerals and Roman numerals, and as such it will generate output forever, so long as the input is representable. One can feed the network sequences which are not well-formed, like “012345”, and still get output; in the case of the model I trained as part of the last blog post, the output is CMCCCXLV, which is not a well-formed Roman numeral. Even though it is incorrect (no answer could be correct), is it sensible? Does that even matter? If the network understood Roman numerals, it would protest at its input. Even though I am being deliberately facetious here, all of this is to emphasize the point that, if we look under the hood of a network expecting to see some kind of understanding, we will frequently be disappointed, even in the best of cases where we know that there is some underlying logic to the system.
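To sharpen the contrast, a rule-based parser really would protest at its input. The names and regex below are my own hypothetical sketch, not anything taken from the trained model; the regex encodes the well-formedness rules, so ill-formed strings like “012345” or CMCCCXLV are rejected outright rather than mapped to some output:

```python
import re

# Well-formedness rules for standard Roman numerals (1-3,999):
# each place (thousands, hundreds, tens, units) allows only its
# legal symbol combinations, including the subtractive forms.
ROMAN_RE = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)

VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def parse_roman(s: str) -> int:
    """Parse a well-formed Roman numeral, refusing anything else."""
    if not s or not ROMAN_RE.fullmatch(s):
        raise ValueError(f"{s!r} is not a well-formed Roman numeral")
    total = 0
    # A symbol is subtracted when a larger symbol follows it (e.g. IV).
    for cur, nxt in zip(s, s[1:] + " "):
        v = VALUES[cur]
        total += -v if nxt in VALUES and VALUES[nxt] > v else v
    return total
```

Unlike the network, this function has a notion of rejecting its input, precisely because the rules were put in from the top down rather than inferred from examples.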
So, as with GPT-3’s dietary recommendations, something is yet missing. Could we call that thing understanding? I won’t be so bold — especially when all the behaviors just described are straightforward properties of networks, being functions. What’s more, especially for large, unknown problems, there may be things to be learned from the state of a model. I’m not trying to dismiss that possibility, I am only trying to emphasize that networks (and models more broadly) don’t possess special powers of understanding, and they shouldn’t need to. They are powerful tools, to be used particularly when the generative process is not known or not straightforwardly replicated algorithmically (or with a simpler model), and when certain practical conditions are met.
There is a tendency in the dialogue about “AI” these days (often referring to complicated and enormous implementations of neural networks) to mystify, or even venerate, these models. (Oh, and don’t forget, also to prophesy doom.) A model acts like a function. It learns patterns, and produces outputs. Maybe its internal state contains something of theoretical significance, maybe even something that emulates or mimics understanding. To Searle’s point, though, even in the best of cases, it’s only an approximation. Models mimic, sometimes with astonishing results, the generative processes that occur or exist in the real world, but fundamentally this is only a sophisticated form of mimicry or inference. As an acquaintance of mine used to say: “all models are wrong.” All models are also just tools, albeit very useful, very powerful tools. They can detect cancer, but also get confused by elephants. Let’s resist the temptation to misunderstand their utility and power.

(Thanks to Matt Mahowald, who maintains an excellent deep learning focused blog, for his comments on this post.)