Five, No Six Things to Know About ChatGPT
ChatGPT and Large Language Models are not a path forward for AI, or us.
Hello everyone and thank you again for your support of Colligo. I very much appreciate it. This post summarizes a talk I gave yesterday to science writer John Horgan’s wonderful class at the Stevens Institute of Technology.
Here’s what I think you should know, if you don’t already.
The success of ChatGPT is proving the old adage about AI, that problems once solved become uninteresting. When AI couldn’t play checkers, checkers was an interesting problem. Ditto for chess, then for Go. AI enthusiasts have pointed out for years, with some justification, that they’re “damned if they do, and damned if they don’t,” because AI successes somehow bring dismissal: “Well, that’s not really a problem requiring genuine intelligence anyway.” Fair enough.
But the complaint misses the point. The real point here is that our ability to engineer new tools and our ability to engineer new minds are entirely separate. Further, the success of AI is invariably a story of the success of building a more powerful tool. This suggests our sci-fi view of the field is wrongheaded, and always has been. AI isn’t in the business of creating a mind with “real” or general intelligence in the first place. Ironically, ChatGPT helps make this case.
It’s easiest to put it this way. Imagine telling someone in 1990 how uncannily well a future search engine called “Google” will work by (say) the year 2005. The capability you describe would seem like magic to the 1990s person, who would no doubt be tempted to assume that early in the next century we will have finally succeeded in creating a mind, a new kind of entity sharing life with us on planet Earth—“a creature,” to use Jaron Lanier’s memorable term from his New Yorker piece in April of this year. Almost no one in 2023 thinks the Google search engine is a creature, so it’s obvious that our ideas about tools and minds are just different ideas. The same goes for ChatGPT (if that technology is a “creature” or a “mind,” then I don’t understand English).
Yes, the critics (and others) are right: ChatGPT lacks a causal, semantic model of the world. (And yes, it needs one to avoid going insane every now and then.) A “causal, semantic model of the world” is often called a “world model”: a conceptual model of how things fit together and interact, built from commonsense or background knowledge—feathers don’t break when dropped but champagne glasses might, and so on. We rely on a world model to perform inference from observed effects to plausible causes (abductive inference), and we use background knowledge perpetually, not only in conversation but, importantly, in making our way in the world. It would be impossible to get from your couch to the supermarket without a rich world model to continually consult and reason over. ChatGPT, GPT-4, and LLMs generally don’t have one. Much ado has been made about them simulating one given statistical correlation (what’s known, broadly speaking, as inductive inference), but correlation isn’t causation, and the simulations aren’t very convincing anyway. To put it bluntly, GPT-4 can’t reason.
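To make the abduction point concrete, here is a toy sketch of my own (nothing like what is inside GPT-4; the rules and the little function are invented purely for illustration): a tiny hand-coded world model of cause and effect that a program can consult to reason backward from an observed effect to plausible causes.

```python
# Toy world model: a few commonsense cause -> effect rules.
# (Illustrative only; a real world model is vastly richer and more structured.)
WORLD_MODEL = {
    "dropped a feather": "feather floats down, unbroken",
    "dropped a champagne glass": "glass shatters",
    "threw a rock at the window": "window breaks",
    "rain fell overnight": "street is wet",
    "street sweeper passed": "street is wet",
}

def abduce(observed_effect):
    """Abductive inference: work backward from an observed effect to the
    plausible causes the world model licenses."""
    return [cause for cause, effect in WORLD_MODEL.items()
            if effect == observed_effect]

print(abduce("street is wet"))
# ['rain fell overnight', 'street sweeper passed']
# Two candidate causes, which further background knowledge would have to sort out.
```

The point of the toy is the direction of inference: a world model lets you go from effects back to causes, whereas a next-word predictor only goes from text to statistically likely text.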
To be sure, there’s much confusion about this point, even among scientists and engineers. Published findings of failures of the model are often challenged when someone gets a supposedly unworkable example to work. Prompt manipulation—basically, rolling the dice with how you ask ChatGPT a question until you get the answer you’re looking for—is one frequent source of variance in results, and the technology itself will give different answers on different occasions by design (it’s probabilistic), so it’s difficult to nail down what can and can’t be done. But basic failures of logical deduction have been well documented, and they make a compelling case that there’s no “there” there. There’s no world model to consult, so you get what the statistics give you, from example to example. (Good luck using this technology for something mission critical, like autonomous navigation or decision making when lives or money are on the line.)
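For a concrete sense of why results are hard to pin down, here is a minimal sketch, assuming the 2023-era (pre-1.0) `openai` Python client and an API key in the environment; the deduction prompt is my own toy example. The same question, asked several times at a non-zero temperature, can come back with different answers.

```python
import openai  # assumes the pre-1.0 openai client and OPENAI_API_KEY set in the environment

PROMPT = ("If all bloops are razzies and all razzies are lazzies, "
          "are all bloops lazzies? Answer yes or no, then explain.")

# Ask the identical question several times. Because decoding is probabilistic
# (temperature > 0), the wording, and sometimes the verdict, can change from
# run to run, which is exactly what makes failure reports easy to dispute.
for i in range(3):
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT}],
        temperature=1.0,
    )
    print(f"Run {i + 1}: {response.choices[0].message.content}\n")
```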
Along with basic failures of logical deduction, LLMs like GPT-4 also don’t understand causation (this point follows from the observation about the missing world model, but it’s worth expounding on). LLMs are particularly keen to invent causes where there aren’t any, as demonstrated by testing GPT-4 on a popular task known as “Event Causality Identification.”
In an illuminating study, researchers found that GPT-4 was good at interpreting the meaning of causal events (throwing a rock, breaking a window), but poor at reasoning with causes. In particular—and troublingly—LLMs like GPT-4 tend to identify causes correctly when they’re there (the events mentioned in a textual passage are causally connected), but are prone to invent causes to connect events that are just correlated. In other words, ChatGPT likes to find causes where there aren’t any.
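To give a feel for how a finding like this gets scored, here is a hypothetical sketch (the examples, labels, and predictions are made up; this is not the study’s actual data or protocol): event pairs carry gold labels for whether they are causally related, the model predicts a label for each, and we count how often it declares a cause where the gold label says there is only correlation.

```python
# Hypothetical Event Causality Identification (ECI) scoring sketch.
# Each item: (event_1, event_2, gold label: True if causally related).
gold = [
    ("he threw a rock",        "the window broke",  True),
    ("ice cream sales rose",   "drownings rose",    False),  # correlated, not causal
    ("she flipped the switch", "the light came on", True),
    ("the rooster crowed",     "the sun rose",      False),
]

# Pretend model outputs (True = "these events are causally connected").
predicted = [True, True, True, True]

# "Invented causes": non-causal pairs the model nonetheless calls causal.
invented = sum(1 for (_, _, causal), pred in zip(gold, predicted)
               if pred and not causal)
non_causal_total = sum(1 for _, _, causal in gold if not causal)

print(f"Invented causes: {invented} of {non_causal_total} non-causal pairs")
```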
Like someone with a right-hemisphere brain lesion, LLMs are also great confabulators and bullshitters. Given non-existent causes, an entire defensive narrative might get spun out by an LLM, happily explaining to the hapless human questioner how non-causal events are in fact causally connected. True, failures like this don’t happen all the time, which is why I keep insisting that ChatGPT is still a legitimate innovation in natural language processing. But the purely statistical or inductive nature of the system means, just as I wrote in my book in 2021, that certain blind spots are inevitable, and ineliminable. In other words, the systems work quite well, until they don’t.
Fixing discovered errors involves trade-offs in system performance. This is an enormously important point. Researchers in the same evaluation of the causal reasoning capabilities of GPT-4 and other LLMs found that using methods like Reinforcement Learning from Human Feedback (RLHF) to fix non-causal hallucinations often exacerbated problems with the models’ causal reasoning powers. In particular, the systems in some cases became more apt to discover imaginary causes in non-causal event mentions. This is enormously important because it suggests that there’s no way to “master fix” the models for all the various weirdnesses and snafus that arise. It’s like a fast and fun boat that keeps springing leaks, where patching one leak tends to open another. At some point, you realize (you hope, not in the middle of the ocean) that, for all the speed and fun, the boat itself needs a different design. Too much ad hoc fixing is a sign of more basic problems.
The models keep us stuck in the past. Reliance on prior observation—historical data like web pages and digitized books—means the systems have a data cutoff, a date where the training data stops. For GPT-4, the cutoff date for training data was September 2021 (there are conflicting reports here, with some literature claiming January 2022). The model doesn’t know anything about what’s happening now, so it’s useless for real-time reasoning. For all intents and purposes, it thinks the world ended two years ago. Fantastic. It amazes me that the media, businesses, and everyone else don’t talk about this more, and point out what a limitation it is for a putative state-of-the-art AI system.
Training GPT-4 took 25,000 GPUs—or about 3,125 servers—and 90 to 100 days of continuous runtime. This means that, if you want to “fix the boat” by training a new model, you’ll have to get your hands on a few billion dollars and a few thousand GPU servers, and wait over three months while you watch the electric bill grow by orders of magnitude. And anyway, given the lack of a world model, the fix will make some things better, perhaps, but won’t touch the underlying frailty of the approach. Is this really a path to AGI? To superintelligence? I think not.
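Back-of-the-envelope, using only the figures above (25,000 GPUs, roughly eight per server, 90 to 100 days of continuous runtime), a full retraining run comes to tens of millions of GPU-hours; any dollar or kilowatt figure you attach to that is your own assumption.

```python
# Rough arithmetic from the figures quoted above (all approximate).
gpus = 25_000
gpus_per_server = 8              # implied by "about 3,125 servers"
days_low, days_high = 90, 100

servers = gpus // gpus_per_server
gpu_hours_low = gpus * 24 * days_low
gpu_hours_high = gpus * 24 * days_high

print(f"Servers: {servers:,}")                                # 3,125
print(f"GPU-hours: {gpu_hours_low:,} to {gpu_hours_high:,}")  # 54,000,000 to 60,000,000
```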
Actually, let me say a few more things. I think the issue with Microsoft's Copilot (basically ChatGPT, GPT-3 and now GPT-4, with custom tweaks), or just generally with hooking ChatGPT up to the internet, is what you might call a "shrinking value" problem. How is giving summaries of web pages from a search engine result a "revolutionary" technology? It's more impressive, at least in my view, to interact with it off the Internet; it's like talking to some overly polite co-worker who seems to know everything about everything. So if the business case here is augmenting search engines... I don't get the "revolutionary" lingo. The business case for 365 sounds reasonable (if it works), where the model can summarize meetings and draft initial Word docs and so on. We'll see how it works, and someone should pay attention to whether the office experience gets even more drone-like. I've been complaining about a growing conformity; this may be another step in that direction.
The world-model point you mention in your comment is extraordinarily important. At any given time, for any arbitrary query, you might get something wacky, or something entirely convincing but just untrue. There are literally thousands of examples of this on the Internet, from making up historical events to insisting on illogical conclusions, on and on. So in effect, we need ANOTHER grounded world model to ensure that the ChatGPT results comport with truth. That's not revolutionary either.
All this said, I occasionally use ChatGPT to remind me of things ("What are the best books discussing the web circa the 2000s?" and so on). Thanks again, Roberto, you got me thinking.
Hi Roberto,
Ahhh, yes, touché. I should have known this, as I occasionally use it with Bing. Didn't make the connection; thanks for pointing it out. I asked Bing's Copilot to explain the problems with ChatGPT browsing the internet, and got this:
"According to my web search, ChatGPT can now browse the internet again after being disabled for months [1]. However, there are some limitations to this feature. ChatGPT can access the internet through Microsoft's Bing web browser, but it is only available to ChatGPT Plus and Enterprise subscribers [2]. Additionally, users have reported that the feature is not working as expected and that they are receiving stock answers when they ask ChatGPT to look up a web page or do anything internet-related [3][4]."
I don't really know if I have an angle on this yet, in terms of writing about it. But thank you for the factual correction. Always appreciated! Best, Erik