Hi everyone,
Here’s a break from the news cycle on Altman and OpenAI. This is a big-picture essay I had worked on for publication but never submitted. I put considerable effort into it, so I’m offering it to paid subscribers. I hope it helps make sense of the state of AI today. Thank you for your interest, and let’s help make the future.
Erik J. Larson
Artificial Intelligence (AI) systems are inferencing systems. That's not a particularly controversial point: inference is central to thinking. If AI performs the right types of inference, at the right time, on the right problems, we should view such systems as thinking machines. The problem is that AI currently performs the wrong type of inference, on problems selected precisely because this type of inference works well on them. I've called this "Big Data AI," because the problems AI currently solves can only be cracked if very large repositories of data are available to solve them. ChatGPT is no exception; if anything, it proves the point. In fact, it's a continuation of previous Big Data AI innovations, taken to an extreme. The AI scientist's dream of general intelligence, often referred to as Artificial General Intelligence (AGI), remains as elusive as ever.
Computer scientists who were not specifically trained in mathematical or philosophical logic probably don't think in terms of inference. Still, it pervades everything we do. In a nutshell, inference in the scientific sense is this: given what I already know, and what I see or observe around me, what is proper to conclude? The conclusion is known as the inference, and for any cognitive system it's ubiquitous. For humans, inferring something is practically a condition of being awake; we do it constantly, in conversation (what does she mean?), when walking down a street (do I turn here?), and indeed in having any thought where there's an implied question at all. If you try to pay attention to your thoughts for one day, or even one hour, you'll quickly discover you can't count the number of inferences your brain is making. Inference is cognitive intelligence. Cognitive intelligence is inference.
21st Century Innovations
In the last decade, the computer science community innovated rapidly, and dramatically. These innovations are genuine and important, make no mistake. In 2012, a team at the University of Toronto led by neural network guru Geoffrey Hinton roundly defeated all competitors at a popular photo recognition competition called ImageNet. The task was, of course, image recognition, on a dataset curated from fifteen million high-resolution images on Flickr and representing twenty-two thousand "classes," or in other words varieties of photos (caterpillars, trees, cars, Terrier dogs, etc.). The system, dubbed AlexNet after Hinton's graduate student Alex Krizhevsky, who largely developed it, used a souped-up version of an old technology: the Artificial Neural Network (ANN), or just "neural network." Neural networks were developed in rudimentary form in the 1950s, when AI had just begun, and had been gradually refined and improved over the decades, though they were generally thought to be of little value for much of AI's history.
Moore's Law gave them a boost. As many know, Moore's Law isn't really a law but an observation made in 1965 by Gordon Moore, who went on to co-found Intel: the number of transistors on a microchip doubles roughly every two years (the other part of the observation is that the cost of computing falls accordingly). Neural networks are computationally expensive on very large datasets, and the catch-22 for many years was that very large datasets are the only datasets they work well on. By the 2010s the roughly accurate Moore's Law had made deep neural networks, in particular Convolutional Neural Networks (CNNs), computationally practical. CPUs were swapped for the more mathematically powerful GPUs—also used in computer game engines—and suddenly CNNs were not just an option but the go-to technology for AI. Though all the competitors at ImageNet contests used some version of machine learning—a subfield of AI that is specifically inductive because it "learns" from prior examples or observations—the CNNs proved wholly superior once the hardware was in place to support their gargantuan computational requirements.
The second major innovation occurred just two years later, when a well-known limitation of neural networks in general was solved, or at least partially solved: "overfitting." Overfitting happens when the neural network fits its training data but doesn't adequately generalize to its unseen, or test, data. Overfitting is bad; it means the system isn't really learning the underlying rule or pattern in the data. It's like someone memorizing the answers to the test without really understanding the questions. In 2014, again from Geoff Hinton and his team, a technique known as "dropout" helped solve the overfitting problem bedeviling early attempts at using neural networks for problems like image recognition (deep networks are also used for face recognition, machine translation between languages, autonomous navigation, and a host of other useful tasks). While the public consumed the latest smartphones and argued, flirted, and chatted away on myriad social networks and technologies, real innovations on an old AI technology were taking place, all made possible by the powerful combination of talented scientists and engineers and increasingly powerful computing resources.
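To make the idea concrete, here is a minimal sketch, not taken from any of the papers mentioned, of how dropout is typically applied in a small image classifier today (PyTorch syntax; the model, layer sizes, and names are illustrative assumptions):

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, num_classes: int = 10, p_drop: float = 0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional feature detector
            nn.ReLU(),
            nn.MaxPool2d(2),                             # a 32x32 input becomes 16x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=p_drop),                        # randomly zero out units during training
            nn.Linear(16 * 16 * 16, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = TinyClassifier()
model.train()                    # dropout active: a random subset of units is dropped each pass
x = torch.randn(8, 3, 32, 32)    # a batch of 8 fake 32x32 RGB images
print(model(x).shape)            # torch.Size([8, 10])
model.eval()                     # dropout switched off when the trained model is actually used
```

The idea is simply that the network cannot lean too heavily on any single unit, which in practice curbs the kind of memorization described above.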
Black Boxes and Blind Inferences
There was a catch, however. Two catches. One, it takes quite an imaginative computer scientist to believe that the neural network knows what it's classifying or identifying. It's a bunch of math in the background, and relatively simple math at that: mostly "matrix multiplication," a technique learned by any undergraduate math student. (There is other mathematics in neural networks, but it's not string theory. What counts is the sheer scale of computation of relatively simple equations, along with the overall design of the system.) Neural networks were performing cognitive feats while not really knowing they were performing anything at all.
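For readers who would rather see the point than take it on faith, here is a minimal sketch, with made-up numbers and sizes, of what a single neural network layer actually computes (NumPy syntax):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal((1, 4))      # one input with 4 features
W = rng.standard_normal((4, 3))      # learned weights of a layer with 3 units
b = np.zeros((1, 3))                 # learned biases

hidden = np.maximum(0, x @ W + b)    # matrix multiply, add bias, apply a simple nonlinearity (ReLU)
print(hidden)                        # the layer's entire "judgment" is just these three numbers
```

Stack enough of these layers, and tune enough of these numbers, and you get systems like AlexNet; nowhere in the pipeline does anything resembling understanding appear.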
This brings us to the second problem, which ended up spawning an entire field of its own, known as "Explainable AI." Neural networks not only don't know what they're doing when they do it, they can't in general explain to their designers or users why they made such-and-such a decision. They're a black box, in other words, obstinately opaque to any attempt at a conceptual understanding of their decisions or inferences. With image recognition tasks like facial recognition, it means the network can't explain why it flagged someone as a criminal (because he happens to look like a photo in a crime database), or why a self-driving car classified a bicyclist as a foreign and unproblematic object (this actually happened, in Tempe, Arizona, in 2018; the victim, who was walking a bicycle across the road, was struck by the car and killed). The upshot here is that with neural networks we gained an immense tool for important tasks, but with a Faustian bargain. We generally don't count the systems as actually knowing anything (the point of AI), and even if they do, we can't ask them what, or why. We have a world of powerful, useful, but entirely opaque systems.
Back to inference. An even thornier problem confronts our current tip-of-the-spear AI technology, neural networks. All machine learning, of which neural networks are a part, involves the provision of prior examples in order to learn. OpenAI and Microsoft have seemingly sidestepped this "prior examples" problem with ChatGPT by cleverly linking it to Bing, a search engine. But there's a confusion here, because the core large language models make use of search engine results but aren't trained on them. The model is not "learning" constantly like a human mind; it is applying a previously trained model to information culled from the web. (Enthusiasts suggest "in-context learning" solves this problem, but a closer look reveals its limits, since the trained model's weights don't get updated. It's a bit technical for this discussion, but readers can read about it here and in many other technical sources on the web.) The difference between this technology and human brains is most obvious when considering, broadly, innovation: inventing something new, or coming up with a genuinely novel theory or idea. Training on prior examples means the knowledge available to the system is, in some very real sense, already discovered and written down. How can it come up with something, anything, new? The data dependency problem haunts ChatGPT as it does machine learning generally.
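The distinction is easy to demonstrate. Below is a minimal sketch, using a stand-in toy model rather than a real language model, of what "applying a trained model to new context" means: the retrieved information flows through the network, but the learned weights never change.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)                  # stand-in for a model whose training is already finished
before = [p.detach().clone() for p in model.parameters()]

retrieved_context = torch.randn(5, 8)    # stand-in for freshly retrieved web text
with torch.no_grad():                    # inference only: no gradients, no learning
    predictions = model(retrieved_context)

unchanged = all(torch.equal(p, q) for p, q in zip(model.parameters(), before))
print(unchanged)                         # True: the model knows exactly what it knew before
```

Genuine learning would require another round of training, which is exactly what does not happen at query time.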
Though computer scientists typically don't refer to this data dependency problem as induction, that's what it is. And the problem, again, is that scientists, mathematicians, and philosophers have known for centuries that induction is not adequate, by itself, for true intelligence. It needs to be combined with other forms of inference, like deduction, and a much lesser known type referred to as abduction, or hypothesis generation. The former has a lineage as far back as Aristotle, who developed the syllogism: all people are mortal, Socrates is a person, therefore Socrates is mortal (the closely related rule "if P then Q; P; therefore Q" is still known by its Latin name, modus ponens). The latter is roughly causal inference, where we reason from an observed effect back to plausible causes. Since most of the world we see is linked causally somehow—the car didn't just stop, the brakes were applied, which generated hydraulic pressure, which traveled to the brake caliper at the wheel—that's generally how we human minds perform inference. Neural networks don't have a clue about these other types of inference, so they can't possibly be on a path to general intelligence. We already know this, though, strangely, it's rarely if ever communicated to the broader public.
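To make the contrast concrete, here is a toy sketch of my own (the rule base and function names are invented for illustration) showing how deduction and abduction run in opposite directions over the same causal knowledge:

```python
# Deduction: from a rule and a known cause, conclude the effect (certain).
# Abduction: from an observed effect, hypothesize plausible causes (not certain).

rules = {"rain": "wet_grass", "sprinkler": "wet_grass"}   # cause -> effect

def deduce(cause: str) -> str | None:
    """If 'cause -> effect' is a rule and the cause holds, the effect follows."""
    return rules.get(cause)

def abduce(observation: str) -> list[str]:
    """List every cause that would explain the observed effect."""
    return [cause for cause, effect in rules.items() if effect == observation]

print(deduce("rain"))        # 'wet_grass'
print(abduce("wet_grass"))   # ['rain', 'sprinkler']: two rival hypotheses to weigh
```

Choosing among the rival hypotheses requires background knowledge about the world, and that is the part induction alone does not supply.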
ChatGPT is Cool. Sort Of.
In 2017, quite an ingenious paper, "Attention Is All You Need," appeared at the Conference on Neural Information Processing Systems (NIPS, since renamed NeurIPS) in Long Beach, California. The authors were a group of Google Brain and Google Research scientists, along with (again) a researcher from the University of Toronto. The paper was pure genius, and quietly at first, then very loudly, it paved the path to what we now know as Large Language Models: very large neural network systems that chug through massive amounts of text in order to generate new text. The innovation was called "self-attention," or just "the attention mechanism," and the details are a bit too hairy to delve into here. But again, the 2010s had produced a real innovation for AI; only, still, it was an innovation for neural network systems, not anything fundamentally new. The attention mechanism described in that landmark paper made possible the new generation of language translation, text classification, text summarization, and chatbots or conversational AI we see and use today. They may even significantly improve web search, which Google has owned for essentially the entire century so far (Google has, predictably, launched its own version of ChatGPT, called Bard).
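For the curious, the core operation is less exotic than the terminology suggests. Here is a minimal sketch of scaled dot-product self-attention in plain NumPy (a single attention head with made-up sizes; real models add per-head projections, masking, and many stacked layers):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Each token's new representation is a weighted mix of every token's value vector."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv         # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how strongly each token attends to each other token
    return softmax(scores) @ V

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))              # 4 tokens, each an 8-dimensional embedding
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (4, 8): same shape, now context-aware
```

Note that this is still matrix multiplication plus a softmax; the leap from AlexNet to ChatGPT is an architectural refinement, not a departure from the neural network paradigm.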
The upshot of all this discussion is that on the surface AI has progressed by leaps and bounds, but dig deeper and you see that it's actually stuck; the innovations for neural networks are laudable, but the broader vision of AI reaching AGI is dead on arrival. In fact, the entire 21st century can be read as a story of bold innovations in one specific part of AI (machine learning) alongside stagnation on our journey to true intelligence. To take one obvious example, the narrow focus on deep neural networks is why driverless cars, which were all the rage circa 2016, have largely disappeared from discussion today. It's one thing to misclassify an image or a face, or to get an AI "hallucination" from a large language model, as when ChatGPT makes up a ridiculous or nonsensical answer because there's some weird gap in its training data (or because it doesn't actually know what it's saying in the first place), and quite another when a fully autonomous vehicle weighing over a ton rams into a school bus, thinking it's an overpass, or kills a bicyclist, or mistakes a speed limit sign for a stop sign. Self-driving car ballyhoo died out precisely because, as Elon Musk himself put it in 2021, "Generalized self-driving is a hard problem, as it requires solving a large part of real-world AI. Didn't expect it to be so hard, but the difficulty is obvious in retrospect," adding, tellingly: "Nothing has more degrees of freedom than reality." There's the rub.
The question is what to do next. The answer, at one level, is obvious. As a community, computer scientists need to start thinking beyond further innovations for neural network systems. We've already done that. OpenAI, which made ChatGPT, released its latest large language model, GPT-4, reportedly with on the order of a trillion parameters (OpenAI has not disclosed the exact figure). It was trained on at least hundreds of billions of words (tokens), amounting to hundreds of gigabytes of text. This means, among other things, that just training a model of this size using the attention mechanism and the underlying neural network requires vast clusters of GPU-equipped computers, far outside the reach of scientists and engineers not hyper-funded by venture capital. The pursuit of AI at this scale has become the exclusive domain of very rich institutions. But innovations typically emerge from diverse places, historically universities and labs, or lone tinkerers with new ideas, or groups of scientists with no money yet but common pursuits and passions. That isn't the game we AI scientists are playing anymore.
Fortunately, some iconoclasts have begun speaking out, like Gary Marcus, formerly a cognitive science professor at NYU, Ernest Davis, also at NYU, and Hector Levesque at Toronto. They're all pointing out that neural networks aren't enough. And, encouragingly, even pioneers of neural networks like Yann LeCun, currently Chief AI Scientist at Meta, have begun admitting limitations. Last year, in a published interview, LeCun conceded that the current approach isn't enough, and he later put out a much-discussed paper on OpenReview.net outlining a different approach (though one that still uses neural networks). LeCun's complaint is the lack of common sense in current AI approaches, a lament that stretches far back into the annals of AI research. It's refreshing that these scientists are discussing limitations and speaking out. It gives hope that new and diverse ideas might start flowing into the field from myriad sources. And as LeCun correctly remarks, those innovations may finally give computers common sense, no doubt by expanding the types of inference they can perform. That was the vision of progress in AI all along. Not "Big Data AI," but true AI. It's time now to get on with it.
I have also observed that people often use the concepts of narrow AI and AGI interchangeably without realizing it, and that their thinking drifts toward AGI even when they acknowledge that only narrow AI exists. This reflects an innate human tendency to be biased toward the potentially better outcome (AGI is generally perceived as the more sophisticated technology, and therefore the better one). The tendency is fueled, and made worse, by the adoption of anthropomorphizing language in international AI definitions, for example the OECD definition associated with Stuart Russell, in which an AI system "infers" rather than performs inference, and is described as "autonomous," a term even the relevant ISO standard acknowledges is a misnomer. Mainstream media pick up these little hooks and spread the anthropomorphic picture of AI, leaving people confused about what AI is and what it can do.
The myths of AI are being institutionalized. That is a dangerous step further down a road that has, in the past, led to outcomes like the public execution of Giordano Bruno as a heretic.