LLMs Are Not a Flawed Design; They Are the Completion of a Flawed Paradigm
LLMs are the end game of machine learning, not AI
As you approach the end of something you can start to see its true limits, and as you see limits you can start to see what’s next.
Hi everyone,
A reader a while back suggested that LLMs are a “flawed design,” a quite reasonable comment. But it got me thinking. Here comes a quick note to be followed in due course by a more expansive version. I hope, even so, that this brief note captures the spirit of the reader’s remark and re-focuses it in a way that’s helpful.
So, I’ve been outed. I keep making “pro-LLM” points and I’ve noticed I’m now roundly misunderstood—for or against?!!!!—judging by LinkedIn and a few Colligo replies. So let me do this quick and dirty:
What we are talking about is THE COMPLETION OF THE MACHINE LEARNING PROJECT itself. And here I must say: how amazing to have lived through this! We are now in the phase of watching AI technology finally perform. No more “Alexa, play…” as the tip of the spear.
The reason foundational model technology isn’t itself a flawed design is that it’s a logical progression from LSTMs, convolutional networks, back to HMMs and other limited-window approaches, and on back to the mid-twentieth-century perceptron. By THE STANDARDS OF MACHINE LEARNING performance, they’re a progression of designs that gave us more and more powerful classification and generative capability. Want to recognize squares versus circles, then handwriting, then tens of thousands of photos on Flickr, and now…? This is a clear-cut performance discussion that everyone in the field accepted long ago (there would BE no field without it).
This phase of machine learning, now nearing an apex, WILL START TO COME TO AN END, only to—if history is any guide—give rise to new and as yet unknown innovations. Critique all you want. But get the bigger picture, too.
Not Heading To AGI
I could not have hoped for a better trajectory for machine learning than the end game of large language models. I wish I had had this information when writing the Myth. We’re not close to AGI. We’re further away than we thought. But we couldn’t see it before, and so speculation and futurism ran wild. Superintelligence entered the lexicon (replacing “ultraintelligence,” a term coined in the 1960s). We had crappy systems with canned responses in conversational AI, but they seemed to be getting better—or at least faster.
Now we have real conversational AI. This is an engineering feat that Alan Turing would no doubt take notice of, and we can see what his 1950s dream of talking to a computer really looks like. Amazing! That’s what we were all trying to do. But it’s different than we expected—and it’s much more a story of how machines get powerful but never mindlike. It’s wonderful to be at this point in the history of the field.
We are at the end of something, which means what comes next will be something unexpected and different, which makes this time in our lives extremely special. We can see now what machine learning CAN DO, and so we can start to glimpse in more empirical and less speculative terms WHAT IT CAN’T.
The trajectory of the innovation is nearly running its course.
What Is To Be Done?
Moving forward, here’s what I suggest. If you’re non-technical, focus on how we can use the technology to augment rather than replace human intelligence. I did a podcast with a group of educators a while back, and their resounding message was simply that there’s no way in hell to get LLMs out of education (and this was many months ago). Their questions then quite reasonably involved how best to integrate them so the students learned more, rather than less, or rather than nothing at all.
If you’re technical: focus on errors. But don’t focus on the fact that they have errors, because frankly that’s stupid. All AI has errors. For that matter, all engineered systems have an error rate. One out of every million shovels probably breaks and sends some poor unsuspecting fool to the ER with a gash in his foot. ALL ENGINEERED SYSTEMS HAVE ERRORS even in normal operation. True, the errors we see in LLMs are particularly stressful because they’re “cognitive.” But what is AI itself?! An attempt at cognitive performance. Much to discuss here, but let me get this out.
So, what is the non-dumb thing to do? We need to pay more attention to the overall impact of LLMs in different domains and contexts. That will be different in legal, medical, educational, and other industries. It will be different in the military. Technical people should build up an entire science around the rate, distribution, and impact of errors for a given use, and then engineers and scientists should embed these flawed oracles in larger designs so that the errors are mitigated downstream, or caught at whatever point in the system is maximally “epistemic.” People who build systems do this stuff. Don’t worry so much about the “LLM.” This takes me to:
LLMs ARE NOT HOMUNCULI. They exist in the context of huge man-machine systems, and even in the more limited case of the machine part, they are a component. We may find that some LLM offers a disastrously wrong answer X% of the time, but if the total system is designed to catch and compensate for those failures, we get much less overall error (there are many strategies for doing this already, and I’m sure more to come). The LLM still sucks X% of the time. It’s the system it’s embedded in that doesn’t. When errors can have a catastrophic impact, as in aviation and many other industries and circumstances, systems are built not to eliminate errors but to handle them.
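To make that arithmetic concrete, here is a minimal sketch, with made-up numbers and stand-in functions rather than anyone’s actual pipeline: a flawed oracle that is wrong 10% of the time, wrapped in an independent check that catches 90% of those failures and escalates them instead of shipping them.

import random

COMPONENT_ERROR_RATE = 0.10   # assume the "LLM" is flat-out wrong 10% of the time
CHECK_CATCH_RATE = 0.90       # assume an independent check catches 90% of those errors

def flawed_oracle() -> bool:
    """Stand-in for an LLM answer; True means the answer happens to be correct."""
    return random.random() > COMPONENT_ERROR_RATE

def independent_check(answer_is_correct: bool) -> bool:
    """Stand-in for retrieval, rules, tests, or review; True means 'ship it'."""
    if answer_is_correct:
        return True                               # keep it simple: no false rejections
    return random.random() > CHECK_CATCH_RATE     # most wrong answers get flagged and escalated

def simulate(trials: int = 100_000) -> None:
    unflagged_errors = 0
    for _ in range(trials):
        correct = flawed_oracle()
        shipped = independent_check(correct)
        if shipped and not correct:               # a wrong answer slipped past the whole system
            unflagged_errors += 1
    print(f"Component error rate:          {COMPONENT_ERROR_RATE:.1%}")
    print(f"System-level unflagged errors: {unflagged_errors / trials:.2%}")

if __name__ == "__main__":
    simulate()

Back of the envelope, ignoring retries and false rejections: a 10% component error rate times a 10% miss rate on the check leaves roughly 1% of answers both wrong and unflagged. The LLM is untouched; the system around it is an order of magnitude better.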
On and on this goes. Pointing out on social media that the car crashed is great, and folks should know, but someone has to get the car back on the road so it can be driven more safely. Smart people build systems to solve problems, and in the world of AI, LLMs and other foundational models will continue playing a part.
If You’re a Billionaire
Billionaires and their companies like to think about core technology. With LLMs that means core training-based performance improvements—mostly, making larger models. If you have deep pockets and lots of GPUs, you can still focus on continued innovation like this, making the LLM itself more powerful (not the same as reducing errors, if we mean actually training the model rather than fine-tuning it or employing RAG). I believe we are reaching the limit of that type of innovation stemming from training. Too much to get into here, but worries about DATA WALLS and MODEL COLLAPSE are real. All that cash will eventually flow somewhere else. Watch.
The End of Machine Learning Progress? Yes.
My view is that the curve of “intelligence” for machine learning (ML) systems is close to the top, and it will soon plateau (and saturate). Plateaued technologies don’t go away; they slowly recede from view until they are either replaced or become so commonplace no one mentions them—like computers, or Internet search. Technologies like LLMs, too, get absorbed into the society and the ecosystems where they are used. Eventually, over time, they get replaced—though as Nassim Taleb has pointed out, some technologies never die: spoons were around tens of thousands of years ago, and they are of course still with us today. At any rate, it’s EXTREMELY rare to get rid of a technology before it’s replaced—there were once rousing discussions about getting rid of home appliances we now don’t notice. All the uproar over LLMs will give us safer use and better policies. But in the bigger picture, they’re an event horizon beyond which data-driven AI likely can’t go much further. That’s a big inflection point for what’s next, given the prominence of data-driven ML for so long, and it makes me excited.
I’ll try to cover all this in more depth later, and much of it branches into separate discussions.
Erik J. Larson
Error detection and correction in this context is the question of determining whether a statement is truthful without recourse to a true oracle (because if we had a true oracle, the question would be moot).
This is the hardest problem in logic and philosophy. It remains unsolved. We have systems, like the scientific method, to help us reduce our error rate, but they are slow - much too slow to apply in real time to LLMs - and not 100% reliable.
LLMs further confound our informal systems for detecting error: intuition about the other person's areas of expertise, the linguistic and body-language tells indicating that someone is bullshitting or less confident in a statement. An LLM can be accidentally correct or incorrect about a claim on the *same subject* at different points *in the same conversation*, and speak with equal confidence and the same competence-signaling language in both instances.
This is less like a mythical oracle and more like a mythical demon, which might tell you the truth most of the time to gain your confidence and then lie strategically to sabotage you. Except of course the LLM has no such strategy; it's just sometimes wrong and sometimes right, with no discernible pattern or frequency.
Anyway, cutting that rant short(ish), my point is: LLMs cannot be treated as flawed oracles. This is a terrible way to think of them and ML in general. We are not equipped to use them this way.
In some contexts it's probably fine, e.g. using computer vision or data/textual analysis to narrow down possible candidates for expert human inspection, where false positives *and* false negatives are low-consequence and every positive is independently analyzed by a qualified human.
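To be concrete about the shape I mean, here's a toy sketch (classify() and the cases are placeholders, not any real system): the model only narrows the pile, and every flagged item still lands in front of a qualified human.

import random

def classify(item: str) -> float:
    """Placeholder for a computer-vision or text-analysis score."""
    return random.random()

def triage(items: list[str], threshold: float = 0.9) -> list[str]:
    """The model narrows the candidate pool; it decides nothing on its own."""
    return [item for item in items if classify(item) >= threshold]

if __name__ == "__main__":
    flagged = triage([f"case-{n}" for n in range(1000)])
    # Every flagged case goes to an expert reviewer; misses are low-consequence by design.
    print(f"{len(flagged)} candidates routed to expert review")

Nothing the model outputs is ever treated as a finding in itself; it only changes what the expert looks at first.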
But an unqualified/inexpert human is absolutely helpless in evaluating claims from any machine learning system. They *must not be presented as sources of truth*. They should not be presented as search engines, let alone knowledge bases. This will go badly and is going badly.
Here is a follow-on question to your idea that this is the end of fundamental innovation in machine learning. Is this also the end of fundamental innovation in automation technologies?
A lot of our civilizational problems and economic inefficiencies are due to social factors that revolve around people's values and the qualities those values engender in human behavior and the built environment. Automation technologies can't really solve these problems. For example, you could build a fantastic infrastructure using driverless electric cars to transport people anywhere more quickly and efficiently than our current system, but you'd have to shred property rights, tear down a bunch of buildings in urban areas, and substantially re-imagine land use across the board. Elon Musk isn't Robert Moses.
I would suggest that, perhaps, new innovations will come from a radically different intellectual framework than the one underpinning machine learning. It won't be a flavor of computer science.