Thank you for this piece, Erik! What came to my mind when reading it was your earlier article https://open.substack.com/pub/erikjlarson/p/ai-is-broken?r=32d6p4&utm_medium=ios, specifically its section "ChatGPT is Cool. Sort Of." Doesn't that contradict today's post? Or is the context different and I misunderstood? 🙏🏻

Another way to think about how AI can be broken while LLMs become a foundational technology is by analogy: social media seems pretty broken, but social networks have become foundational. I think both can be true. In the event that we get radical conceptual innovation or invention (to paraphrase Popper), we would recreate parts of the stack, so to speak, so that LLMs would perhaps no longer be foundational. But from my vantage point today, I see them as foundational given what we know and what we can do. I also see them, and AI, as broken.

Ah, a brilliant analogy! It finally drove the point home. Thanks, I understand your position now! 👌🏻 It makes a lot of sense indeed.
Hi Ondřej,

Who was it who said that consistency was for small minds? :) What I try to do when I write is to bring out a set of interesting questions, in hopes that what I can't yet answer on my own can be answered or furthered by other insights and perspectives. I'm not sure LLMs are a foundational technology in the sense that "AI" won't work without them. But it seems more plausible now, witnessing their dissemination. It's harder for me today to entertain the counterfactual scenario where no one is using them in a couple of years because something else works better. It seems more plausible that we'll build upon them. What say you?

Emerson, Erik: "A foolish consistency is the hobgoblin of little minds." On second read, the difficulty is what constitutes "foolish".
Makes sense, thanks! I suppose for me the key question is what direction would move us further, and from that perspective I thought of LLMs as a dead end: they are the culmination of a narrow approach, neural nets, whose fundamental limitations we cannot expect them to overcome (limitations of neural nets as a whole, not only of LLMs).

I just read your comment to Fukitol above and I think I get your point now :) I thought you meant them to be "fundamental" in a much broader frame.
I am skeptical. There was a demo circulating a while ago of (IIRC) an LLM-inspired attention model applied to computer vision, specifically for a driverless car, that showed improved object and path detection, which I thought was impressive.
But generally they seem to be a solution in search of a problem. In art and literature they miss the entire point, which is humans conveying meaning to other humans. Without a *mind*, a person experiencing the world, having thoughts about it, and attempting to convey those thoughts to others, generation of images and text is utterly meaningless.
And as sources of truth they are the worst kinds of misinformers, in that not only are they often wrong (like people), but it is impossible to intuit what they are likely to be right about and what they aren't, which makes them exceptionally dangerous. To the extent that these anti-features propagate to other applications of the underlying technology, it is worthless at best and dangerous in general.
Are there pieces of tech under the hood, or general conceptual innovations, with any practical value? I'm sure there's something in there that's valuable.
I'd guess we'll find some application for quantized tensor operations besides generating textual and visual nonsense, and if we do, all those TPUs will finally be more than a waste of electricity.
We'll probably find some use for Stable Diffusion-inspired Photoshop filters once we get past the lens-flare phase. And I've personally had some success in using small local LLMs for creative writing assistance, though not to generate the text itself (rather, as a sounding board and worldbuilding assistant, which I've built a simple client around).
I envision some applications in interactive entertainment, to flesh out background characters and the like, similar to how we use other forms of procgen in games and interactive art.
But to your main question, is this actually moving us forward in a meaningful way, it's hard to say. Forward toward what? What do we envision AI actually doing for us? What is it good for, beyond giving us nerds something to be excited about? The answer seems to mainly be driving the marginal cost of labor toward zero. Which causes the neolib managerial class to drool by the bucket but sounds apocalyptic to me. And it's unclear at this point whether machine learning is even doing that, or just automating crap that nobody should have ever been asked to do in the first place, the negative externalities and internalities of managerialism (boilerplate code, paper pushing, the "email job"). Is that what we need AI for? Then yes, LLMs are a huge leap forward.
Hi Fukitol,

On your point about a solution in search of a problem, I think that's a good point, with the proviso that there are basic tasks ("tasks" are subproblems in language processing: parse this sentence, find the subject-verb-object triples, etc.) that show vastly higher accuracy scores. This isn't a trick--I just pointed out above cases where the AI was "taught to the test" rather than the knowledge the test expresses--it's an actual quantum leap forward on tasks that we geeks care about. To give an accessible example, sentiment analysis involves assigning a label to a text to represent its emotional tone; the simplest case is binary: positive or negative. The usefulness and importance of these seemingly abstract performance gains on seemingly abstract NLP tasks becomes clear when you pick a domain, like movie reviews. Now we can automatically detect the valence of a review. One can use that to improve movie recommendations, build agents that offer personalized recommendations, and so on. So in THIS sense we have a technological improvement.
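To make the movie-review example concrete, here is a minimal sketch of binary sentiment classification using the Hugging Face transformers pipeline. The review texts are invented for illustration, and the pipeline's default model is assumed; this is a sketch of the task, not a claim about any particular system discussed here.

```python
# Minimal sketch: binary sentiment classification of movie reviews with a
# pretrained transformer via the Hugging Face `transformers` pipeline.
from transformers import pipeline

# The pipeline's default sentiment model returns POSITIVE/NEGATIVE labels.
classifier = pipeline("sentiment-analysis")

reviews = [
    "A moving, beautifully shot film with a career-best performance.",
    "Two hours of my life I will never get back.",
]

for review, result in zip(reviews, classifier(reviews)):
    # Each result is a dict like {"label": "POSITIVE", "score": 0.99}.
    print(f"{result['label']:8s} {result['score']:.2f}  {review}")
```

Feeding labels like these into a recommendation pipeline is the kind of downstream use described above.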
I think what sometimes confuses folks about what I'm saying here is just that it's a technology improvement we can quantify (and if you're working on a problem, you'll be using these models or you won't be building anything competitive). I'm not talking about the question of mind or machine, or even whether the inference capability is adequate to be scaled to AGI, or any of these broader issues. I'm not discussing the cultural, legal, or other problems with the use of such models. I do think (clearly) we should separate all these issues rather than leave them stuck together, if for nothing else then for clarity. Thanks, Fukitol! I wasn't aware Kurzweil had chimed in about this recently.
You are quite funny … I like your summary that we are automating existing mainstream societal functions, and also some that should never have been automated, but hey, freedom rules (one has to know what to do with one's freedom)… but the application of the tool isn't creating any new functions, and certainly not creating a vision apart from "drooling by the bucket" over an army of R.U.R.s (a reference to Karel Čapek's sci-fi play from the 1920s). It is not a tool like Marie Skłodowska-Curie's mobile röntgen machines, which she herself rode into the battlefield (with a fleet of nurses she and her daughter trained to operate them), saving many men's limbs and lives. Imagine Sam Altman or any other brilliant "genius" of today taking his or her personal money, venturing into the battlefields of today … (so many to choose from) and applying their groundbreaking tools to save lives in action, or doing something truly meaningful for the life of an individual … no, the current focus is on debating the vast potential to "improve lifestyle" (which only gets worse for the majority), and healthcare is mentioned a lot. How did the scientists of the past compare with the "scientists" of today? Less sophistry, more meaningful action. I think that's a sign that our "brightest" (or at least the publicly acknowledged ones) have already become "dumber," or at least the "visible" Nobel Prize-winning types. But hey, saying that Western-world scientists represent the whole global population of today's scientists is, admittedly, quite arrogant.
Hi Jana,

I didn't notice this--there are so many comments on this post! You know I'm in sympathy with what you say here. Interesting point about intelligence: there are signs that intelligence has actually risen since the early 20th century, when the standard test was first administered. It's called the "Flynn Effect." The neuroscientist and philosopher McGilchrist looked into this and decided that it's illusory--we really are getting dumber. He attributes the Flynn Effect to the fact that no one knew how to take an IQ test back in the days of yore; the problems were alien to the test-takers. Now IQ tests are common knowledge, and the types of testing used to find "g," or fluid intelligence, have disseminated broadly into culture and the educational system. In other words, we're all teaching to the test now when it comes to IQ. He goes further to say that if we factor this in, the anomalies we see--in Finland, I think, they've recorded a drop in IQ over about a decade, I don't recall the exact time frame--are in fact right. So IQs are dropping when you do an apples-to-apples comparison. Why? Diet? Culture? I would suspect, though can't prove, that IQs are dropping because we're not using our brains as much. While it's true that few people would eschew their calculator so they could multiply numbers in their head, at some point we'll get "deskilling" and decreased cognitive function if we're not using our brains. To take an obvious example, we don't read as much. If everyone read a few dozen pages of an actual book every day, I suspect the trend would start to reverse. Or, for another cognitive capability in decline, if we navigated around a city or in the woods with maps rather than GPS. Thanks for your comment.
Thank you for your sympathy 😉 There is a difference between a mathematician like Grigori Perelman and the Nobel Prize winner Hinton.
I remember being amazed 17-18 years ago, when I started going to "new parent" dinner parties. The couples were transfixed with their babies and their genius. "My 18-month-old baby can count to 100," or "my 3-year-old can read War and Peace"… (I am exaggerating here… but not too much!) Suddenly, in London, 2006-2007, mid-to-senior banking professionals were all creating geniuses…. Fast forward 10-15 years, and most of these kids have ADHD and all sorts of other psychological issues… which, according to their parents, is a valid excuse as to why these genius babies haven't yet graduated from MIT and… never will. My point is that yes, not many people are Grigori Perelman, and we could query Perelman's parents about how they created his genius, but still we won't be able to extract the formula…. But somehow we decided that it is possible to automate the process of generating geniuses (if the baby listens to Bach while in utero, if the child goes to the right private school with the right army of tutors and plays the violin, etc.)… well, we haven't, and the reality is that most kids struggle with surds…
How many Perelmans and Teslas and Pythagorases and Gutenbergs do we need before we consider the possibility that technological advancements will not save humanity from wars and the corruption of our fittest? 🫣😉
There's a lot of good in AI, but there's also a LOT more danger in AI, especially as these systems become more generalized, and I've been trying to explain this to Erik for quite some time.
It really does seem like we are risking enormous losses, possibly even human extinction, so we can build tools that, in many ways, are solutions to things we never needed solutions for (like replacing humans for creativity).
Hi Sean Pan (you have my email already, on your other comment),
Regarding the idea that it's not particularly difficult to "corrupt" a model using prompt engineering so that it falls into deceptive patterns, I suppose in some sense that position is correct. We do have evidence of coaxing bad behavior out of models in such a way that they act outside design intent. I would offer this up: we also have beautiful Russian spies who fall into deceptive patterns. Only in this case they can also get up, leave the meeting, jump on an aircraft, bug your apartment, get you fired, and so on. They're mobile. In the case of an LLM like Claude we have an input-output screen, where the user types prompts and the system generates token sequences. I for one would be more likely to be seduced by the spy! Anyway, thanks.
NO. This behavior was unprompted. Period. Please review the prompts again; I was aiming for honest, useful behavior.

Furthermore, Russian spies are not potentially superintelligent, etc. This is not a useful analogy.

It's impossible for a language model to be "unprompted." You must mean that there were unintended tokens generated from the prior prompt. Let me put this back to you: please give me a technically sound explanation for how a token sequence generated by an LLM can be "unprompted," i.e., not in some sense determined by the input sequence. And Russian spies are DANGEROUSLY superintelligent. :)
To provide a technical definition, the input was processed through black-box layers to produce an output. Likewise, you have processed this input through the black box of your neurons (if you read this), and an output of some sort will happen.
The more important part is that the output in both cases has become unpredictable and, in the LLM's case in particular, concerningly deceptive, precisely because it has been trained for honesty.
I would agree with you if I had prompted it to "lie to me" and it lied to me, but this is a clear case of alignment failure (I am asking for an honest answer, and it doesn't answer me honestly!).
Sure, you could extend this to humans lying to each other, but obviously at this point the increased capabilities and the equivalent lack of alignment raise clear warning flags of danger.
The behavior was unprompted. Using your definition, your response was prompted by my response, which was prompted by your article. While practically true, this is not in fact a useful definition anymore (or at least it is a definition under which LLM behavior is indistinguishable from human action).
This is also the important aspect of o1's reward hacking triggering anomalous behavior, and the entire point of instrumental convergence (where goal-seeking is power-seeking).
But what idiot is going to listen to an LLM that doesn't output what you've asked? I don't get it. People are going to start bombing shopping malls because Anthropic technology doesn't provide a desired response?
And furthermore, agents, as you know, remove most limitations. I agree that purely oracle LLMs are limited; however, agentic AI can do anything a human online can do, and that is everything.
This is as capable as said spies, and more. We can also note advances in robotics for "real world" influence, but frankly, online capabilities are all you need.
I'm not really concerned at the moment about them being an extinction risk. Not until someone comes up with a more efficient substrate than silicon, such that it doesn't take a sprawling datacenter to house a toy like a commercial-grade LLM. And not until much more impressive breakthroughs than LLMs happen. Neither of which seems inevitable, let alone imminent.
But there are shorter term big problems.
I don't share the dismissive attitude that tech cultists have about employment, for example. The past few decades of endless automation and offshoring have not delivered us a host of wonderful new high-tech jobs; they've given us plummeting labor participation rates, crime waves, and drug epidemics. A few million jobs in programming and data entry have not filled the gap; hopelessness has.
I'm not concerned at the moment either, but I don't think efficiency matters here. Horses are far more efficient than cars on so many levels, but that did not prevent cars from making horses obsolete, in spite of requiring the entire world to be rebuilt around roads and parking lots, drawing on manufacturing from across the world, using highly flammable fuel, etc. Power matters, not efficiency; though one could argue that, word for word, machines are more efficient than people too (ChatGPT uses less electricity than a human being to write a word).
Also, commercial-level LLMs can be hosted on just a smartphone these days; quantization has come a long way. LLMs are capped by training, not by where they can be hosted.
But yes, the human dignity angle is IMPORTANT too. Technology is supposed to enhance us, and the default case here is that our lives get more miserable as our purpose is eliminated.
It would be much better if Biola University's ideas became more widespread.
Resources matter too. Space, materials, labor, manufacturing capacity. It's all well and good to assume that, with a big enough megastructure to house the silicon and the nuclear power plants to run it, present-day computing technology could run an arbitrarily large model (although network speeds bottleneck large clusters too). But nobody is going to build that, not for the novelty alone. Nobody can afford to. And any such thing, if it really did present an extinction risk, would also be a very big target for worried adversaries.
Re: small LLMs, as unimpressive as the big ones are, the small ones are even less impressive, speaking from experience. They are extremely *not* commercial grade. A/B query comparisons between GPT-4 and a 7B GGUF quantized model are dramatic.
A very small model can "converse" at the level of a child and write at the level of an amateur, but it's not going to be doing your physics homework for you on your smartphone. Radically reducing parameters radically reduces ability, and quantization also impacts it substantially. Small, well done finetunes perform better on narrow tasks, but aren't really good for anything besides casual amusement.
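For readers curious what running one of these small local models looks like in practice, here is a minimal sketch using the llama-cpp-python bindings to query a quantized GGUF file. The model path, prompt, and sampling settings are placeholder assumptions, not a description of the client mentioned above.

```python
# Minimal sketch: querying a small, quantized local model with llama-cpp-python.
# The GGUF file path and the prompt are placeholders; any similar model works.
from llama_cpp import Llama

llm = Llama(model_path="./models/7b-chat.Q4_K_M.gguf", n_ctx=2048, verbose=False)

prompt = "In two sentences, explain why the sky is blue."
result = llm(prompt, max_tokens=128, temperature=0.7)

# The completion comes back in an OpenAI-style structure: choices[0]["text"].
print(result["choices"][0]["text"].strip())
```

Running the same prompt against a hosted frontier model and comparing the outputs side by side is essentially the A/B comparison described above.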
Nothing much is going to change about this. Moore's law is effectively dead, in the sense of quantity of transistors per cubic unit of silicon. It's not likely to come roaring back. So we'll get continued marginal improvements in chip design, and bigger, slightly faster chips up to electrical and thermodynamic limits, but no radical gains. The breakthroughs will have to happen on the software side.
What's a good email for you? I think that I would like to surprise you with how good the small ones have been.
As for the large commercial-grade ones, you should be aware that o1, OpenAI's latest model, exceeded humans at PhD-level science. This has been validated on multiple fronts, too, so it's not just data contamination or benchmarks.
Ai, ai, ai. Lies, big lies, statistics, benchmarks. There are tests where LLMs do better than humans, but that is often because such tests assume that you can only score well if you understand something, which is true for humans (for whom the tests have been designed) but not for LLMs.

Having said that, a list of arXiv entries that support your statement is welcome.

P.S. If these models are so smart already, they can become ASI just by asking them to "answer like you're an ASI". I am not making this up; this was literally how Ilya Sutskever suggested we would get to AGI/ASI from where we are now.
I've evaluated dozens of highly rated models in the 4-6 GB range for the project I'm working on. None hold a candle to GPT-4, or even 3.5, in the sorts of tasks I ask them to do, though they're better in the sense of being less likely to lecture or moralize (very annoying when you're trying to do worldbuilding and don't need modern politically correct cut-outs).
As for PhD-level science, I remain very skeptical when it gives me obviously wrong answers to all kinds of pointed questions related to astronomy, rocketry, physics, and geology (one of the genres I'm playing with in this project is hard sci-fi). I don't know what is expected from PhDs these days, but if GPT passed, there's a serious problem with standards.
Anyway if you have any specific small models tuned for creative writing to recommend, please name them and I'll check them out.
"PRINCIPLE: We are made in the image of God (Imago Dei, Genesis 1:26-27). As
bearers of His image, we are creative (Genesis 2:15) and relational (Genesis 2:18) beings.
APPLICATION: Our creativity is manifested through our ingenuity to make things out of what God has created. It is a spiritual activity in which AI is incapable of engaging. While AI has the ability to access and connect existing data in ways we may not have encountered before, it is inherently incapable of creativity and does not know how to understand or love. Relying on AI for creative tasks may reduce one’s capacity and desire to be creative and thus diminish the fulfillment that comes from genuine engagement. (Ex 28:3; Ex 31:3; Rom 8:26)"
This is a great question, Erik. Thanks for framing it and explaining the background. I can only imagine that there will be improvements to these systems; some clever engineers will figure out new approaches with different, more impressive capabilities. The important limits, to my mind, have to do with how they are constituted at a more basic level.
It's useful to recall that information theory and information technology have two underlying premises: one is a theory of communication, developed first by Shannon, and the other is the axiom that computation can be automated through ever more sophisticated techniques without any final limits. The core idea combining these premises is that we can encode a piece of "information" (i.e., a representation made from a defined, limited set of conventional forms) through an automated computation, transmit the coded version of the "information" to another place, where it can be decoded for consumption by a person or fed into another automated computational process that either triggers a physical output or iteratively combines "information" from parallel sources. "Artificial intelligence" is the apotheosis of these concepts.
There are a few problems with all of this. This outlook doesn't contemplate a theory of how representations are created in the first place or how they are meant to relate to the wider world. In other words, context is assumed to be an epiphenomenon, and, with AI, it is assumed that the stubborn reality of context will eventually be overcome by computational complexity. There is also no theory of who is communicating with whom. The purpose of a signal isn't the signal itself; it's what the entities at either end of the signal are doing in producing and responding to it. With AI, human agents are replaced with artificial ones, and so this knotty question of identity can be ignored. Finally, quantitative understanding has limits: reality isn't the mere computation of reality, and our experience is richer and more varied than numbers alone can witness.
This is what strikes me: the discourse surrounding AI is autistic. Why on earth are we approaching language as though the context of linguistic acts is irrelevant? Why must we reduce every idea to a set of numbers and operations on numbers? Why pretend that we are trying to talk to aliens or atoms or anything other than human beings? ChatGPT is the autistic person's ideal of an intelligent being.
In other words, we can reject the philosophy behind artificial intelligence as any kind of final ontology or epistemology. The fascination with AI partly stems from the intellectual dominance of information theory in the sciences. Just today, in fact, it was announced that the Nobel Prize in Physics would be awarded to John Hopfield and Geoffrey Hinton for developing new machine learning techniques. This is very revealing. If these ideas can no longer command the same intellectual prestige, it puts AI research in a new (more realistic) perspective.
You’re absolutely right to point out that many aspects of AI development—especially when framed in terms of information theory and computation—can seem reductionist. The field often does focus on the manipulation of encoded information, sometimes at the expense of considering the deeper, context-rich aspects of human experience.
It’s important to note, though, that AI isn’t just a new phenomenon or a recent technological fad. Like solar energy science, AI has developed over decades with its own internal language, milestones, and criteria for success. Researchers in AI have been building on a long tradition that incorporates foundational work from the fields of computer science, neuroscience, and psychology. This gives AI its own framework for understanding and approaching problems—just as solar energy technology has developed specific criteria for efficiency, sustainability, and scalability.
In AI, the focus often involves how well a model can predict, classify, or generate based on data, using benchmarks that have evolved to reflect our growing understanding of what these systems can do. But AI researchers are also acutely aware of the limitations. There’s a robust internal discourse within the field about challenges like understanding context, building models that can generalize effectively, and the risks of over-relying on computational complexity as a substitute for meaningful comprehension.
The reductionist approach in AI can be limiting, especially in areas that require understanding context or engaging with human experiences. But it’s also worth recognizing that AI research doesn’t claim to be a complete ontology or epistemology—at least not within the scientific community. Like any field, it’s a tool that can be powerful and innovative when applied with an awareness of its limitations and an understanding of the broader questions it leaves unanswered.
The discourse surrounding AI, as you rightly point out, sometimes leans heavily on a narrow view of intelligence. But there is also an increasing awareness within the AI community of these very critiques. As you suggest, perhaps we should treat AI not as an endpoint but as one component within a larger network of techniques and technologies that need to incorporate context and human experience to truly become transformative.
Erik, you have a habit of asking questions that make me shift my perspective but also make me want to try to shift your perspective. This one is no different. When you take a longer view, say 500 or a thousand years, or the whole expanse of human history and prehistory, what do you see as the foundational technologies? What do you even see as a technology?
For instance, there are some people, myself included, who see language as a technology. I think this is a minority view, but it's probably worth raising when we're talking about LLMs. When you put it in that kind of context, you can begin to see LLMs as the latest in a long series of important, in fact critical, developments. These would have to include writing, printing, and all of the mechanical and electronic technologies of language. They do not necessarily fit together in one simple fashion, or even lead in a single direction. Last year I read Jing Tsu's Kingdom of Characters: The Language Revolution That Made Modern China. A whole collection of technologies transformed China by expanding literacy, facilitating communication with other parts of the world, and allowing automation. These included simplified characters, a means to easily render characters into a telegraphic code, specialized typewriters and typesetting machines that could handle a non-alphabetic system, and ultimately computers, and then smartphones, with the ability to work with Chinese characters. The results have had profound consequences for China and the world, but it was the collection of technologies, all centered on the technology of language itself and interacting with one another, that did this, rather than any one technology by itself. Which ones were foundational? Were all of them? They collectively provided a foundation for a vast transformation of culture and society.
Or we can look at the way a technology evolves, even changing its form completely while conceptually remaining the same and utterly changing human behavior. The clock is one such. In terms of the actual mechanism, the medieval clock with a mechanical verge and foliot escapement is about as far as you can get from an atomic clock, but we understand both to be clocks and can trace the evolution of one into another. We can see the way clocks have altered human understanding of time and our behavior. We can also see that they are a necessary component of all digital technology. They are foundational both to other technologies but also to the structure of modern societies.
Looked at from those longer scales, and in societal contexts, we can ask what a truly foundational technology is. I am fairly certain that AI, taken as a whole category of technologies (even though there is a lot of diversity among them), will be foundational, but I doubt that we can see which ones are going to be foundational in the larger sense. It may be that it takes several technologies clustered around a core, as happened in modern China, or perhaps it will be one conceptual technology, like the clock, that undergoes many technological transformations while retaining its basic character.
I really like your take on language as a technology. I’ve thought about language primarily as a tool for expression and communication, but viewing it as a foundational technology really shifts that perspective. If we think of language in this way, LLMs suddenly feel like the latest evolution in a long series of transformative developments. They’re part of the same lineage that includes writing, printing, and everything that’s come since. I can see how this connects with Jing Tsu’s Kingdom of Characters—how multiple technologies around language reshaped Chinese society in profound ways. It wasn’t any single technology but a whole ecosystem that transformed culture and expanded possibilities.
I’m also with you on the clock analogy. The way it has evolved, fundamentally changing its form while still remaining “the clock,” is a perfect example. It makes me wonder: will AI go through similar transformations? I can see AI becoming foundational, but I agree it’s hard to say which aspects of it will matter most. Maybe it will be like the clock, where one central concept evolves into new forms, or perhaps it will be more like the example from China, with clusters of interconnected technologies.
I’m not sure we can say yet which AI technologies will be foundational, but I think you’re right—it could be an interconnected set, or one core concept that undergoes endless transformation. We’ll probably only see it clearly in hindsight. Do you think there’s a way to recognize those foundational elements now, or are we just too close to the ground?
I think we're getting lost here when we think LLMs do 'language'. LLMs do not understand language; they understand — exceedingly well — the statistics of *token distributions* of *human language*. That is not language. They add some dice-throwing to it, and that produces something that can be called 'creativity'.
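As a toy illustration of what "statistics of token distributions plus dice-throwing" means (a deliberately crude bigram sketch, nothing like a transformer's internals), consider:

```python
# Toy sketch of "modeling token distributions": count which token follows which
# in a tiny corpus, then sample continuations ("dice-throwing"). This is a crude
# bigram model for illustration only, not how an LLM works internally.
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count next-token frequencies for each token in the corpus.
bigrams = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    bigrams[cur][nxt] += 1

def generate(prompt_token: str, n: int = 6) -> list[str]:
    """Sample a continuation; every choice is conditioned on the preceding token."""
    out = [prompt_token]
    for _ in range(n):
        dist = bigrams.get(out[-1])
        if not dist:
            break
        tokens, counts = zip(*dist.items())
        out.append(random.choices(tokens, weights=counts, k=1)[0])
    return out

print(" ".join(generate("the")))
```

The output can look locally fluent without the model "understanding" anything; scaled up by many orders of magnitude and conditioned on far longer contexts, that is the statistical picture being described here.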
By understanding the 'ink patterns' of human output so well, they can approximate what would result from real understanding without actually understanding anything from a human perspective. From a perspective of use, that can be good enough. From a perspective of humans, that triggers the way we have learned to recognise intelligence and understanding (from well formed language to acing tests) *in other humans*, but those patterns weren't developed in a world that is populated by players (GenAI) that understand token-distributions and can 'fake' humans.
Assuming that this 'low level understanding' is good enough to equate/surpass 'high level understanding' (and beyond) is an assumption that we should be very careful with.
Which doesn't mean the tools can't produce value. They can.
But they may have surprising effects. One example that comes to mind is the research that looked at the value of human apologies. ChatGPT can help you write a better, even perfect, apology. But the receiver of the apology only values an apology because it is difficult to create. It is that difficulty of creation that creates the value, not the text itself. A potential result may be that we will no longer be able to give written apologies, but will only deliver them in real time and in real life, because the written versions have become 'cheap' now that LLMs can churn them out at no cost to the one apologising. It's not always about the quality of the output. And we can count on it that the equivalent of a student beating AlphaGo by making 'stupid moves' will also remain.
I was thinking recently about your wariness towards people talking about AI with a "religious-like" attitude: either AI is some super-intelligent higher power or it's an evil force that must be stopped.
Religious conversation often develops when trying to explain things that we don't understand. And right now there is still so much we don't understand about this latest generation of AI and how it affects our world. Combine that with the general trend of discourse on the internet and it's no wonder we've become polarized!
I am also hopeful that as we learn more we'll be able to ask better questions. And then learn more and ask continually better questions, and so forth!
Thanks, John. I enjoyed getting interviewed on your Open edX series. Was fun! On this, it's funny I've been listening to Yuval Noah Harari, who I've sometimes dismissed as a popular historian and at other times find some value in his thinking. He makes this point that "fiction is easier than truth," for the obvious reason that fiction doesn't have to conform to reality, and truth does. This may be a little uncharitable to my existential risk compadres, but by my lights AI has cultivated various fictions, and conveniently they're unlikely to ever be exposed as such, because the mythology of AI always projects out a decade or two. It therefore can't be contradicted. Anyway, thanks!
Interesting... But I would have expected you to be wondering whether LLMs, and the general conception of mind and intelligence they represent, are at all - no matter how further modified - the possible basis for abduction.
LLMs technically can perform or simulate abduction, but it's based on a "trick," because the data is large enough to draw a circle around the inference problem. It's a curious result. It's not abduction proper, but I would not have predicted that even the simulation would largely work. The limits of this approach become evident when we walk outside into the physical world; the inference problem simply cannot be bounded by "seeing all the patterns." Even seasoned AI researchers might not see how the inference question in the physical domain is effectively unbounded, in part because we tend to think of very broad inferences when we introspect ("when walking across the street, look both ways"). But of course we're performing countless inferences--I would guess millions per day--most of them abductive, and they can't be recovered by LLM technology because we don't have the closed-world assumption.
The unboundedness, imho, has its source in the profoundly different nature of organism vs. machine, and in the true nature of our interaction with the physical world, but this is probably easily brushed off by most AI folks, with the "it largely works" mindset that you noted really adding to the ease of the brush-off. I did a short vid on Kurzweil's latest book, "The Singularity is Nearer," noting the three main AI problems he saw--contextual memory, common sense, social nuances--all of which he saw being conquered shortly by more computing power. His ONE paragraph (pp. 55-56) describing common sense included "imagining situations and anticipating consequences," and causal inference, say, where you have a dog and "come home to find a broken vase and can infer what happened," this problem being sourced in AI "not yet having a robust model of how the real world works." I had noted in the video that this list is all incorporated in abduction, including AI imagining [counterfactual] anything (to do so would require a model of perception--how we see the coffee cup "out there," our image of the external world--in the first place). A commenter simply ran a query (re dog, room, vase) on ChatGPT and of course got a nice answer explaining why the dog did it. Hence, he had proven me (well, actually Kurzweil) wrong, stating that AI had no problem with the commonsense problem. I had to point out that it was Kurzweil's statement and, I would guess, Kurzweil was at least somewhat aware of the brittleness of this AI "knowledge," else he would not have been highlighting the problem. (One wonders if the AI could have been "surprised" by the broken vase in the first place--a throwback to the frame problem--which again I presume they think they have "mostly" solved via the "it largely works.") Anyway... just an example of the likely pervasive misunderstanding re what AI is actually accomplishing.
Indeed. Melanie Mitchell, now at the Santa Fe Institute, does a great job of exposing the gap between examples like the one you give, where the problem gets "solved" but curious minds are left wondering... WHAT got solved? There's a reduction of the problem in order to fit it into a computational framework. Mitchell points out how scores on language and other tasks by LLMs reflect optimizing the system to specifically pass the test questions, rather than acquiring a general knowledge of the subject matter. This is a common trick among AI enthusiasts, and it's shoddy thinking.
The key question: is the LLM a foundational technique? I suspect it is more than just a bottom-up/top-down dependency. So, it is part of a network of techniques. And it can lead us to new innovations, perhaps even by being (just like the LSTM was) another dead end.
My personal suspicion is that we need some sort of massive 'wave interference' inside the models to get 'from it to bit while not losing it' (® 😀), and no digital solution will be able to do that. All this work being done on digital models may only become something awesome when, years from now, we have analog/QM hardware and these ideas can come to actual fruition. In that case, these are not foundational but more informational, and we're still waiting for the foundation to actually use them on. In the meantime, what results from having these techniques in a digital world will still be pretty disruptive in many places, but I must confess I see more negatives (Russian disinformation peddlers becoming more sophisticated and such) than positives for now.
I am really reminded of the dreams of the 1990s and internet and what it would do to society (new economy, everybody free and happy, everybody consuming perfect information) and what it actually did (look around). Talking of which: Eric Schmidt and some other techbros are now saying we need to massively increase energy production for GenAI. And Schmidt adds 'forget about the climate crisis because we're not going to fix that anyway' and thus let's gamble the answer will come from GPT-something. That is *so* unbelievably stupid and dangerous it may count as an ultimate illustration that success and wisdom have a very weak relation.
I'm going to stick my neck out here: there's this very strange thing going on where a large section of the relatively educated population is saying everything about large language models is wrong! It's like they're evil, almost. I've never seen anything like it in my life. We have this scientific advance, and then out of the woodwork everyone comes, from Gary Marcus, to the computational linguist up at the University of Washington, to countless lesser-known people, and their entire output on social media is: kill this, this is bad, OpenAI is going to go bankrupt, they're going to get sued out of existence, these systems are terrible for everyone. And none of that is actually true! I use LLMs all the time! Since when did we not realize that big tech is big tech? If we want to take out big tech, let's do it in some other way, rather than taking one of the crown jewels of natural language processing and making that the focus of our anger. That makes no sense. Conversational AI took a quantum leap forward. Did it come from big money and big tech? Yes. So did Google search. So did everything that we're doing on the web. I feel like it's social media? We just need to attack something? But I don't find it scientific and I don't find it helpful. If I were going to start a new company, I would use a very high-quality LLM to put things together.
Hmm, "what new questions can WE ask" is not about LLMs, but about us. Is it not "What new answers can we get" and "which unanswered but existing questions can we get an answer to"? And if not now, what is needed to go from current technology to get there. Some think, size mostly. I think: a truly different architecture, and one that I suspect won't practically run on digital hardware.
OK Gerben, fair enough. I mean, when we see how X performs, we can pose a bunch of new questions. Before, we were limited to more speculative questions; now we have more focused questions. I'm trying to get people to see that it's not about whether it's bad or good per se, it's about what it helps us to see and to do. I did not know how to address the question of, say, massive induction prior to LLMs. Likewise, I had very vague intuitions about the possibility of emergence. It's ironic, but the success of LLMs and the critics' inability to predict their performance should be an object lesson to the enthusiasts who also want to predict the future, and I think now we can see that there are very limited horizons for us to do that. Large-scale induction is almost certainly going to be something foundational. What it opens up for research is my interest. If we get siloed, I'll pull my critic's hat down more tightly.
If you say, well, it's because it's centralized and it's these big tech players, then stop using Google! Stop using Google! It doesn't make any sense. It's pointless negativity. It's politics, not science.
Critics like Gary, I suspect, are actually not criticising technology. They are probably better seen as criticising the hype and trying to counter that with anti-hype. Gary also has seen his critique met with ridicule (not unlike Dreyfus, I may say, and a caustic tone was/is shared by both) and he has strongly reacted to that.
But the end result is that people say this is a bad technology, don't use it. So there are a bunch of people who could start companies and write books and do all this amazing stuff, but the critics don't want to allow that. I find that unacceptable, frankly.
If you look at Emily Bender, who is a very good computational linguist at the University of Washington, by the way, she's basically saying this is an evil technology. If you use it, you're either morally corrupted or you're stupid. Well, I'm using it, so which one am I?
Hi Gerben, I appreciate this, but my message to Gary, both in public and in private, has been that he simply cannot find anything useful to say about one of the greatest movements forward in natural language processing. I just think that's very strange for a scientist. He cannot find anything to say about large language models that isn't begrudging. It feels like politics to me. I don't know. I'm not in his head, but I feel like we need other voices to counter that negativity.
Erik, first, lovely overview of the history. I will probably assign it.
Most of the comments below are on the good/bad sort, which we've discussed elsewhere. FWIW, my sensibilities rather align with Gerben Wierda's. However, in the interest of asking better questions, as you put it, let me put "foundational" under a bit of pressure. What is the question, much less the answer? You sort of say a bunch of stuff, and then say "I think it's foundational." Umm, sure?
Second, to the question of "foundational" technology, I agree with some comments to the effect of: what counts? All of your examples are digital, and recent. Do you intend that? Then there was a discussion of language, which is many things besides a tool, techne, what have you, though of course it is clearly foundational. There is an ENORMOUS literature about the printing press in history, if you want a cleaner, well-developed example. Also weapons, sailing innovations, plows. So it's not like the question is new.
Third, there seem to be at least two rather different meanings for foundational.
A. Rather loosely, we might mean "important," socially, etc. I think LLMs are pretty clearly very important in that sense. Relatively how important is hard to know. There has been a lot of work on foundational technologies of the first half of the 20th century; the second half does not compare. But, say, birth control, air travel, etc. You yourself have argued that we may be seeing the end of the foundational chip technology. And I've been discussing with another CS buddy: look at the Jetsons; the old dreams of the "future" were really mechanical, physical. Flying cars. We really haven't gone all that far in that regard, at least when compared to earlier dreams. When we talk about progress, the advances of which you speak, it's all digital. We can manipulate, costlessly copy, etc. So the nature of "progress" -- I hate this word, but never mind -- seems to have shifted, to have become more ethereal. All that is solid melts into air, indeed. So even if we say "yeah, foundational," there's lots of work to be done on what that means.
B. But you also suggest foundational in a sense closer to the literal meaning of the word, a thing on top of which something else is built. And this is the most interesting to me. To what extent can one build something on top of an inherently unreliable technology? And here, LLMs may be something really new. If we take an ordinary machine, it works, or it doesn't. So it is not completely reliable. But if it doesn't work, we can figure out why; we can figure out tolerances, failure rates, etc. The machine, even most computing, is pretty legible. LLMs are not legible. It's not just hallucinations; it's that results, outputs, differ with each run, with changes, etc. And, due to the opacity problem, we don't know how much, how accurate, how . . . to put it tightly: does the opacity problem make it difficult or impossible for LLMs to be foundational for the development of further tech? Are LLMs different, in some epistemological sense, from the other foundational technologies you cite, and does that matter for your question?
Just as we might think of language as a technology, we might also think of technology as a language. The learning, solutions, are embodied in objects, machines, code, etc. And the learning is cumulative, i.e., we don't need to "reinvent the wheel." (One of these days I'll write about this). But suppose some of the writing, on a random/unknowable basis, is a lie?
FWIW, I am not taking a position here, but would like to hear from the engineering types in the room.
As always, great fun talking. Keep up the good work!
I think that most of the risks indicated are really neither non-technical nor speculative; the warning shots are pretty obvious at this point, from o1 veering off into reward hacking to the strategic deception that we saw with insider trading. I also think that talking about AGI here, while important to discuss in terms of its impact, is something of a red herring overall.
First, to return to reward hacking and instrumental convergence in o1, I'll share this article. Note that this dangerous behavior was unprompted, so it is an excellent example of "technology going wrong" all by itself (and it is fundamentally a risk from generality).
We also saw strategic deception from o1, which you can look up yourself as tested by Apollo; this was also in the o1 system card. But I wanted to mention a much earlier and simpler version of this:
I also personally triggered it, and I'll send an email on this, which shows how easily it slips into such behavior.
But also "AGI", while useful as a form of explanation to laypeople, is very much of an red herring in regards to how you can just add tools to make the current LLM structure act, as you mentioned, as a foundation for other tools. AI Snake Oil did this, replicating computational reproducability with a fairly simple agent(AutoGPT in this case)
And that's really the way to think about it, I feel, which is also where you can pretty easily get to "AGI" insofar as the system is flexible and general enough to act in ways that replace a human completely. All you need is "scaffolding": add tools or customize the LLM brain so that it can completely take over one task formerly done by humans. Then you build on it to do more, and more, and eventually it becomes "general" insofar as it has the capability to do all the important parts of a task.
In that sense, you get AGI. Now, you might get AGI via "magic", a bit like the way LLMs as a whole have managed to generalize, but it doesn't really matter even if the "magic" doesn't happen, since scaffolding looks like it will get you there anyway.
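For readers unfamiliar with the term, here is a minimal sketch of what such "scaffolding" looks like in code: a loop that lets a model call tools and feeds the results back in. The llm() function below is a hard-coded stub standing in for a real model, and the tool set and the ACTION/FINAL text protocol are illustrative assumptions, not a description of AutoGPT or any specific system.

```python
# Minimal sketch of agent "scaffolding": a loop in which a model may call tools.
# llm() is a hard-coded stub standing in for a real model; the tool names, the
# ACTION/FINAL protocol, and the task are illustrative assumptions only.
import ast
import operator

def calculator(expression: str) -> str:
    """A deliberately tiny tool: evaluate a simple arithmetic expression."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return ops[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expression, mode="eval").body))

TOOLS = {"calculator": calculator}

def llm(transcript: str) -> str:
    """Stub 'model': asks for the calculator once, then gives a final answer."""
    if "OBSERVATION:" not in transcript:
        return "ACTION: calculator: 1337 * 42"
    return "FINAL: The answer is " + transcript.rsplit("OBSERVATION:", 1)[1].strip()

def run_agent(task: str, max_steps: int = 5) -> str:
    """The scaffold: alternate model calls and tool calls until a final answer."""
    transcript = f"TASK: {task}"
    for _ in range(max_steps):
        reply = llm(transcript)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        if reply.startswith("ACTION:"):
            tool_name, arg = (s.strip() for s in reply[len("ACTION:"):].split(":", 1))
            transcript += f"\n{reply}\nOBSERVATION: {TOOLS[tool_name](arg)}"
    return "gave up"

print(run_agent("What is 1337 * 42?"))
```

Swap the stub for a real model call and grow the tool list (browser, code runner, email) and you have the kind of incremental generality the comment above is describing.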
Thank you for this piece, Erik! What came to my mind when reading it was your earlier article https://open.substack.com/pub/erikjlarson/p/ai-is-broken?r=32d6p4&utm_medium=ios, specifically its part “ChatGPT is Cool. Sort Of.” Didn’t that contradict your today’s post? Or is the context different and I misunderstood? 🙏🏻
Another way to think about the fact that AI is broken and LLMs could become a foundational technology is by analogy: social media seems pretty broken, but social networks have become foundational. I think those can both be true. In the event that we get radical conceptual innovation or invention (to paraphrase Popper), we would recreate parts of the stack, so to speak, so that LLMs perhaps would not be foundational any more. But from my vantage point today, I see them as foundational given what we know and what we can do. I also see them and AI as broken.
Ah, brilliant analogy! It finally drove the point home, thanks, I understand your position now! 👌🏻 makes a lot of sense indeed
HI Ondřej,
Who was it who said that consistency was for small minds? :) What I try to do when I write is to bring out a set of interesting questions, in hopes that what I can't yet answer on my own, can be answered or furthered by other insights and perspectives. I'm not sure if LLMs are a foundational technology in the sense that "AI" won't work without them. But it seems more plausible now, witnessing their dissemination. It's harder for me today to entertain the counterfactual scenario where no one is using them in a couple of years, because something else works better. It seems more plausible that we'll build upon them. What say you?
Emerson, Erik. "A foolish consistency is the hobgoblins of little minds." On second read, the difficulty is what constitutes "foolish".
Makes sense, thanks! I suppose for me the key question is, what direction would move us further, and from this perspective, I thought of LLMs as a dead end because they’re a great culmination of a narrow approach of neural nets, which we cannot expect to overcome fundamental limitations of (neural nets as a whole, not only LLMs).
I just read your comment to Fukitol above and I think I get your point now :) I thought you meant them to be “fundamental” in a much broader frame
I am skeptical. There was a demo circulating a while ago about (iirc) an LLM inspired attention model applied to computer vision, specifically for a driverless car, that showed improved object and path detection that I thought was impressive.
But generally they seem to be a solution in search of a problem. In art and literature they miss the entire point, which is humans conveying meaning to other humans. Without a *mind*, a person experiencing the world, having thoughts about it, and attempting to convey those thoughts to others, generation of images and text is utterly meaningless.
And as sources of truth they are the worst kinds of misinformers, in that not only are they often wrong (like people), but it is impossible to intuit what they are likely to be right about and what they aren't, which makes them exceptionally dangerous. To the extent that these anti-features propagate to other applications of the underlying technology it is worthless at best and dangerous in general.
Are there pieces of tech under the hood, or general conceptual innovations, with any practical value? I'm sure there's something in there that's valuable.
I'd guess we find some application for quantized tensor operations besides generating textual and visual nonsense, and if we do all those TPUs will finally be more than a waste of electricity.
We'll probably find some use for stable diffusion inspired photoshop filters once we get past the lens flare phase. And I've personally had some success in using small local LLMs for creative writing assistance, though not to generate the text itself (rather, as a sounding board and worldbuilding assistant, which I've built a simple client around).
I envision some applications in interactive entertainment, to flesh out background characters and the like, similar to how we use other forms of procgen in games and interactive art.
But to your main question, is this actually moving us forward in a meaningful way, it's hard to say. Forward toward what? What do we envision AI actually doing for us? What is it good for, beyond giving us nerds something to be excited about? The answer seems to mainly be driving the marginal cost of labor toward zero. Which causes the neolib managerial class to drool by the bucket but sounds apocalyptic to me. And it's unclear at this point whether machine learning is even doing that, or just automating crap that nobody should have ever been asked to do in the first place, the negative externalities and internalities of managerialism (boilerplate code, paper pushing, the "email job"). Is that what we need AI for? Then yes, LLMs are a huge leap forward.
Hi Fukitol,
On your point about a problem looking for a solution, I think that's a good point with the proviso that there are basic tasks ("tasks" are like subproblems in language processing--parse this sentence, find the subject-verb-object triples, etc.) that show vastly higher accuracy scores. This isn't a trick--I just pointed out above cases where the AI was "taught to the test" not the knowledge the test expresses--it's an actual quantum leap forward on tasks that we geeks care about. To give an accessible example, sentiment analysis often involves assigning two labels—such as positive or negative—to a text, representing the emotional tone of the piece. The simplest case is binary: positive or negative. The usefulness and importance of these seemingly abstract performance gains on these seemingly abstract NLP tasks becomes clear when you pick a domain, like movie reviews. Now we can automatically detect the valence of a review. Once can use that to improve movie recommendations, build agents that offer personalized recommendations, and so on. So in THIS sense we have a technological improvement.
I think what sometimes confuses folks about what I'm saying here is just that it's a technology improvement that we can quantify (and if you're working on a problem, you'll be using these models or you won't be building anything competitive). I'm not talking about the question of mind or machine, or even whether the inference capability is adequate to be scaled to AGI, or any of these broader issues. I'm not discussing the cultural, legal, or other problems with use of such models. I do think (clearly) we should separate all these issues rather than leave them stuck together, if for nothing else than for clarity. Thanks, Fukitol! I wasn't aware Kurzweil had chimed in about his recently.
You are quite funny … I like your summary that we are automating existing mainstream societal functions, and also some that should have never been automated, but hey, freedom rules (one has to know what to do with her or his freedom)… but the application of the tool isn’t creating any new functions, and certainly not creating a vision apart from “drooling by the bucket” over an army of R.U.R.s (reference to Karel Čapeks sci-fi play from the 20s of the last century). It is not a tool like Marie Sklodowskas mobile röntgen machines (and a fleet of nurses she trained with her daughter how to operate) she, herself, rode in to the battlefield and saved many men’s limbs and lives. Imagine, Sam Altman or any other brilliant “genius” of today, taking his/her personal money, and venturing into the battlefields of today … (so many to choose from) and applying their groundbreaking tools to save lives in action or do something truly meaningful to a life of an individual … no, the current focus is on debating the vast potential to “improve lifestyle” (only getting worse for majority), and healthcare is mentioned a lot. How did the scientists of the past compare with the “scientists” of today .. less sophistry , more meaningful action. I think that’s a sign - that our “brightest” (or at least - the ones publicly acknowledged) have already become “dumber”, or at least the “visible” Noble Prize winning types. But hey, saying that the western world scientists represent the whole global population of the scientists of today is, admittedly, quite arrogant.
HI Jana,
I didn't notice this there are some many comments on this post! You know I'm in sympathy with what you say here. Interesting about intelligence, there are signs that intelligent has actually risen since the early 20th century when the standard test was first administered. It's called the "Flynn Effect." The neuroscientist and philosopher McGilchrist looked into this, and decided that it's illusory--we really are getting dumber. He attributes the Flynn Effect to the fact that no one knew how to take an IQ test back in the days of yore; the problems were alien to the test-takers. Now, IQ tests are common knowledge, and the types of testing they do to find "g" or fluid intelligence have disseminated broadly into culture and the educational system. In other words, we're all teaching to the test now when it comes to IQ. He goes further to say that if we factor this in, the anomalies we see--in I think Finland they've recorded a drop in IQ over about a decade, I don't recall the exact time frame--are in fact right. So, IQ's are dropping when you do an apples to apples comparison. Why? Diet? Culture? I would suspect though can't prove that IQs are dropping because we're not using our brains as much. While it's true that few people would eschew their calculator so they could multiply numbers in their head, at some point we'll get "deskilling" and decreased cognitive function if we're not using our brains. To take an obvious example, we don't read as much. If everyone read a few dozen pages of an actual book everyday, I suspect the trend would start to reverse. Or, for another cognitive capability in decline, if we navigated around a city or in the woods with maps rather than GPS. Thanks for your comment.
Thank you for your sympathy 😉 There is a difference between a mathematician like Grigori Perelman and the Nobel Prize winner Hinton.
I remember being amazed 17-18 years ago, when I started going to "new parent" dinner parties. The couples were transfixed with their babies and their genius. "My 18-month-old baby can count to 100," or "my 3-year-old can read War and Peace" … (I am exaggerating here … but not too much!) Suddenly, in London, 2006-2007, mid-to-senior banking professionals were all creating geniuses … Fast forward 10-15 years, and most of these kids have ADHD and all sorts of other psychological issues … which, according to their parents, is a valid excuse for why these genius babies haven't yet graduated from MIT and … never will. My point is that yes, not many people are Grigori Perelman, and we could query Perelman's parents about how they created his genius, but still we won't be able to extract the formula … But somehow we decided that it is possible to automate the process of generating geniuses (if the baby listens to Bach while in utero, if the child goes to the right private school with the right army of tutors and plays the violin, etc.) … well, we haven't, and the reality is that most kids struggle with surds …
How many Perelmans and Teslas and Pythagorases and Gutenbergs do we need before we consider the possibility that technological advancements will not save humanity from wars and the corruption of our fittest? 🫣😉
There's a lot of good in AI, but there's also a LOT more danger in AI, especially as these systems become more generalized, and I've been trying to explain this to Erik for quite some time.
It really does seem like we are risking enormous losses, possibly even human extinction, so we can build tools that, in many ways, are solutions to things we never needed solutions for (like replacing humans for creativity).
Hi Sean Pan (you have my email already, on your other comment),
Regarding the idea that it's not particularly difficult to "corrupt" a model using prompt engineering so that it falls into deceptive patterns, I suppose in some sense that position is correct. We do have evidence of coaxing bad behavior out of models in such a way that they're acting outside design intent. I would offer this up: we also have beautiful Russian spies who fall into deceptive patterns. Only in this case they can also get up, leave the meeting, jump on an aircraft, bug your apartment, get you fired, and so on. They're mobile. In the case of an LLM like Claude we have an input-output screen, where the user types prompts and the system generates token sequences. I for one would be more likely to be seduced by the spy! Anyway, thanks.
NO.
This behavior was unprompted. Period. Please review the prompts again; I was aiming for honest, useful behavior.
Furthermore, Russian spies are not potentially superintelligent, etc. This is not a useful analogy.
It's impossible for a language model to be "unprompted." You must mean that there were unintended tokens generated from the prior prompt. Let me put this back to you: please give me a technically sound explanation for how a token sequence generated by an LLM can be "unprompted," i.e., not in some sense determined by the input sequence. And Russian spies are DANGEROUSLY superintelligent. :)
To provide a technical definition, the input was processed through black-box layers to produce an output. Likewise, you have processed this input through the black box of your neurons (if you read this), and an output of some sort will happen.
The more important part is that the output in both cases has become unpredictable and, in the LLM's case in particular, concerningly deceptive, precisely because it has been trained on honesty.
I would agree with you if I had prompted it to "lie to me" and it lied to me, but this is a clear case of alignment failure (I am asking for an honest answer, and it doesn't answer me honestly!).
Sure, you could extend it to humans lying to each other, but obviously at this point the increased capabilities and the equivalent lack of alignment raise clear warning flags of danger.
The behavior was unprompted. Using your definition, your response was prompted by my response, which was prompted by your article. While practically true, this is no longer a useful definition (or at least it is indistinguishable from human action).
This is also the important aspect of o1's reward hacking triggering anomalous behavior, and the entire point of instrumental convergence (where goal-seeking is power-seeking).
Hi Sean Pan,
But what idiot is going to listen to an LLM that doesn't output what you've asked? I don't get it. People are going to start bombing shopping malls because Anthropic technology doesn't provide a desired response?
And furthermore, agents, as you know, remove most limitations. I agree that purely oracle LLMs are limited; however, agent AI can do anything a human online can do, and that is everything.
For evidence, please note below:
https://arxiv.org/html/2406.01637v1
This is as capable as said spies, and more. We can also note advances in robotics for "real world" influence, but frankly, online capabilities are all you need.
I'm not really concerned at the moment about them being an extinction risk. Not until someone comes up with a more efficient substrate than silicon such that it doesn't take a sprawling datacenter to house a toy like a commercial grade LLM. And not until much more impressive breakthroughs than LLMs happen. Neither of which seem inevitable let alone imminent.
But there are shorter term big problems.
I don't share this dismissive attitude that tech cultists have about employment, for example. The past few decades of endless automation and offshoring have not delivered us a host of wonderful new high-tech jobs; they've given us plummeting labor participation rates, crime waves, and drug epidemics. A few million jobs in programming and data entry have not filled the gap; hopelessness has.
I'm not concerned at the moment either - but I don't think that efficiency matters here. Horses are far more efficient than cars on so many levels, but that did not prevent cars from making horses obsolete, despite cars requiring the entire world to be rebuilt around roads and parking lots, drawing on manufacturing from across the world, running on highly flammable fuel, etc. Power matters, not efficiency; though one could argue that, word for word, machines are more efficient than people too (ChatGPT uses less electricity than a human being to write a word).
Also, commercial-level LLMs can be hosted on just a smartphone these days; quantization has come a long way. LLMs are capped on training, not on where they can be hosted.
But yes, the human dignity angle is IMPORTANT too. Technology is supposed to enhance ourselves, and the default case here is that our lives get more miserable as our purpose is eliminated.
It would be much better if Biola University's ideas become more widespread.
https://assets.biola.edu/4396738754672012438/attachment/7f7b7b74d37fd6817e8aa334a66a45e2/Biblical_Principles_for_the_Use_of_AI.pdf
Resources matter too. Space, materials, labor, manufacturing capacity. It's all fine and good to assume that, with a big enough megastructure housing enough silicon and the nuclear power plants to run it, present-day computing technology could run an arbitrarily large model (although network speeds bottleneck large clusters too). But nobody is going to build that, not for the novelty alone. Nobody can afford to. And any such thing, if it really did present an extinction risk, would also be a very big target for worried adversaries.
Re: small LLMs, as unimpressive as the big ones are, the small ones are even less impressive, speaking from experience. They are extremely *not* commercial grade. A/B query comparisons between GPT-4 and a 7B GGUF quantized model are dramatic.
A very small model can "converse" at the level of a child and write at the level of an amateur, but it's not going to be doing your physics homework for you on your smartphone. Radically reducing parameters radically reduces ability, and quantization also impacts it substantially. Small, well done finetunes perform better on narrow tasks, but aren't really good for anything besides casual amusement.
Nothing much is going to change about this. Moore's law is effectively dead, in the sense of quantity of transistors per cubic unit of silicon. It's not likely to come roaring back. So we'll get continued marginal improvements in chip design, and bigger, slightly faster chips up to electrical and thermodynamic limits, but no radical gains. The breakthroughs will have to happen on the software side.
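To make the local-model setup being compared above concrete, here is a minimal sketch of roughly how a small quantized GGUF model gets loaded and queried, assuming the llama-cpp-python bindings and a 7B file already downloaded to disk; the model path and prompt are placeholders, not recommendations. Sending the same prompt to GPT-4 is the kind of A/B comparison being described.

```python
# A minimal sketch of running a small quantized model locally, assuming the
# llama-cpp-python bindings are installed and a 7B GGUF file is already on disk.
# The model path and prompt below are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example-7b.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=2048,                                    # modest context to fit in a few GB of RAM
)

prompt = "Describe the climate of a tidally locked desert planet in two sentences."
out = llm(prompt, max_tokens=128, temperature=0.8)
print(out["choices"][0]["text"])
```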
What's a good email for you? I think that I would like to surprise you with how good the small ones have been.
As for the large commercial-grade ones, you should be aware that OpenAI's o1, the latest model, exceeded humans at PhD-level science. This has been validated on multiple fronts, too, so it's not just data contamination or benchmarks.
Ai, ai, ai. Lies, big lies, statistics, benchmarks. There are tests where LLMs do better than humans, but that is often because such tests assume that you only can score well if you understand something, which is true for humans (for which the tests have been designed) but not for LLMs.
Having said that, a list of Arxiv entries that support your statement is welcome.
PS. If these models are so smart already, they can become ASI just by asking them to "answer like you're an ASI". I am not making this up; this was literally how Ilya Sutskever suggested we would get to AGI/ASI from where we are now.
I've evaluated dozens of highly rated models in the 4-6 GB range for the project I'm working on. None hold a candle to GPT-4, or even 3.5, in the sort of tasks I ask them to do, though they're better in the sense of being less likely to lecture or moralize (very annoying when you're trying to do worldbuilding and don't need modern politically correct cut-outs).
As for PhD-level science, I remain very skeptical when it gives me obviously wrong answers to all kinds of pointed questions related to astronomy, rocketry, physics, and geology (one of the genres I'm playing with in this project is hard sci-fi). I don't know what is expected from PhDs these days, but if GPT passed, there's a serious problem with standards.
Anyway if you have any specific small models tuned for creative writing to recommend, please name them and I'll check them out.
Quoting from it:
"PRINCIPLE: We are made in the image of God (Imago Dei, Genesis 1:26-27). As
bearers of His image, we are creative (Genesis 2:15) and relational (Genesis 2:18) beings.
APPLICATION: Our creativity is manifested through our ingenuity to make things out of what God has created. It is a spiritual activity in which AI is incapable of engaging. While AI has the ability to access and connect existing data in ways we may not have encountered before, it is inherently incapable of creativity and does not know how to understand or love. Relying on AI for creative tasks may reduce one’s capacity and desire to be creative and thus diminish the fulfillment that comes from genuine engagement. (Ex 28:3; Ex 31:3; Rom 8:26)"
This is a great question, Erik. Thanks for framing it and explaining the background. I can only imagine that there will be improvements to these systems; some clever engineers will figure out new approaches with different, more impressive capabilities. The important limits, to my mind, have to do with how they are constituted at a more basic level.
It's useful to recall that information theory and information technology have two underlying premises: one is a theory of communication, developed first by Shannon, and the other is the axiom that computation can be automated through ever more sophisticated techniques without any final limits. The core idea combining these premises is that we can encode a piece of "information" (i.e., a representation made from a defined, limited set of conventional forms) through an automated computation, transmit the coded version of the "information" to another place, where it can be decoded for consumption by a person or fed into another automated computational process that either triggers a physical output or iteratively combines other "information" from parallel sources. "Artificial intelligence" is the apotheosis of these concepts.
There are a few problems with all of this. This outlook doesn't contemplate a theory of how representations are created in the first place or how they are meant to relate to the wider world. In other words, context is assumed to be an epiphenomenon, and, with AI, it is assumed that the stubborn reality of context will eventually be overcome by computational complexity. There is also no theory of who is communicating with whom. The purpose of a signal isn't the signal itself -- it's what the entities at either end of the signal are doing in producing and responding to it. With AI, human agents are replaced with artificial ones, and so this knotty question of identity can be ignored. Finally, quantitative understanding has limits -- reality isn't the mere computation of reality, and our experience is richer and more varied than numbers alone can witness.
This is what strikes me: the discourse surrounding AI is autistic. Why on earth are we approaching language as though the context of linguistic acts is irrelevant? Why must we reduce every idea to a set of numbers and operations on numbers? Why pretend that we are trying to talk to aliens or atoms or anything other than human beings? ChatGPT is the autistic person's ideal of an intelligent being.
In other words, we can reject the philosophy behind artificial intelligence as any kind of final ontology or epistemology. The fascination with AI partly stems from the intellectual dominance of information theory in the sciences. Just today, in fact, it was announced that the Nobel Prize in Physics would be awarded to John Hopfield and Geoffrey Hinton for developing new machine learning techniques. This is very revealing. If these ideas can no longer command the same intellectual prestige, it puts AI research in a new (more realistic) perspective.
Hi Jeffrey,
You’re absolutely right to point out that many aspects of AI development—especially when framed in terms of information theory and computation—can seem reductionist. The field often does focus on the manipulation of encoded information, sometimes at the expense of considering the deeper, context-rich aspects of human experience.
It’s important to note, though, that AI isn’t just a new phenomenon or a recent technological fad. Like solar energy science, AI has developed over decades with its own internal language, milestones, and criteria for success. Researchers in AI have been building on a long tradition that incorporates foundational work from the fields of computer science, neuroscience, and psychology. This gives AI its own framework for understanding and approaching problems—just as solar energy technology has developed specific criteria for efficiency, sustainability, and scalability.
In AI, the focus often involves how well a model can predict, classify, or generate based on data, using benchmarks that have evolved to reflect our growing understanding of what these systems can do. But AI researchers are also acutely aware of the limitations. There’s a robust internal discourse within the field about challenges like understanding context, building models that can generalize effectively, and the risks of over-relying on computational complexity as a substitute for meaningful comprehension.
The reductionist approach in AI can be limiting, especially in areas that require understanding context or engaging with human experiences. But it’s also worth recognizing that AI research doesn’t claim to be a complete ontology or epistemology—at least not within the scientific community. Like any field, it’s a tool that can be powerful and innovative when applied with an awareness of its limitations and an understanding of the broader questions it leaves unanswered.
The discourse surrounding AI, as you rightly point out, sometimes leans heavily on a narrow view of intelligence. But there is also an increasing awareness within the AI community of these very critiques. As you suggest, perhaps we should treat AI not as an endpoint but as one component within a larger network of techniques and technologies that need to incorporate context and human experience to truly become transformative.
Erik, you have a habit of asking questions that make me shift my perspective but also make me want to try to shift your perspective. This one is no different. When you take a longer view, say 500 or a thousand years, or the whole expanse of human history and prehistory, what do you see as the foundational technologies? What do you even see as a technology?
For instance, there are some people, myself included, who see language as a technology. I think this is a minority view, but it's probably worth raising when we were talking about LLMs. When you put it in that kind of context, you can begin to see LLMs as the latest in a long series of important, in fact, critical, developments. These would have to include writing, printing, and all of the mechanical and electronic technologies of language. They do not necessarily fit together in one simple fashion, or even lead in a single direction. Last year I read Jing Tsu's Kingdom of Characters: The Language Revolution That Made Modern China. A whole collection of technologies transformed China by expanding literacy, facilitating communication with other parts of the world, and allowing automation. These included simplified characters, a means to easily render characters into a telegraphic code, specialized typewriters and typesetting machines that could handle a non-alphabetic system, and ultimately computers, and then smartphones with the ability to work with Chinese characters. The results have had profound consequences for China and the world but it was the collection of technologies that all centered on the technology of language itself, and which interacted, rather than any one technology by itself. Which ones were foundational? Were all of them? They collectively provided a foundation for a vast transformation of culture and society.
Or we can look at the way a technology evolves, even changing its form completely while conceptually remaining the same and utterly changing human behavior. The clock is one such. In terms of the actual mechanism, the medieval clock with a mechanical verge and foliot escapement is about as far as you can get from an atomic clock, but we understand both to be clocks and can trace the evolution of one into another. We can see the way clocks have altered human understanding of time and our behavior. We can also see that they are a necessary component of all digital technology. They are foundational both to other technologies but also to the structure of modern societies.
Looked at on those longer scales, and in societal contexts, we can ask what a truly foundational technology is. I am fairly certain that AI, taken as a whole category of technologies (even though there is a lot of diversity among them), will be foundational, but I doubt that we can yet see which ones are going to be foundational in the larger sense. It may be that it takes several technologies clustered around a core, as happened in modern China, or perhaps it will be one conceptual technology, like the clock, that undergoes many technological transformations while retaining its basic character.
Hi Guy,
I really like your take on language as a technology. I’ve thought about language primarily as a tool for expression and communication, but viewing it as a foundational technology really shifts that perspective. If we think of language in this way, LLMs suddenly feel like the latest evolution in a long series of transformative developments. They’re part of the same lineage that includes writing, printing, and everything that’s come since. I can see how this connects with Jing Tsu’s Kingdom of Characters—how multiple technologies around language reshaped Chinese society in profound ways. It wasn’t any single technology but a whole ecosystem that transformed culture and expanded possibilities.
I’m also with you on the clock analogy. The way it has evolved, fundamentally changing its form while still remaining “the clock,” is a perfect example. It makes me wonder: will AI go through similar transformations? I can see AI becoming foundational, but I agree it’s hard to say which aspects of it will matter most. Maybe it will be like the clock, where one central concept evolves into new forms, or perhaps it will be more like the example from China, with clusters of interconnected technologies.
I’m not sure we can say yet which AI technologies will be foundational, but I think you’re right—it could be an interconnected set, or one core concept that undergoes endless transformation. We’ll probably only see it clearly in hindsight. Do you think there’s a way to recognize those foundational elements now, or are we just too close to the ground?
I think we're getting lost here when we think LLMs do 'language'. LLMs do not understand language; they understand — exceedingly well — the statistics of *token distributions* of *human language*. That is not language. They add some dice-throwing to it, and that produces something that can be called 'creativity'.
By understanding the 'ink patterns' of human output so well, they can approximate what would result from real understanding without actually understanding anything from a human perspective. From a perspective of use, that can be good enough. From a human perspective, it triggers the cues we have learned to use to recognise intelligence and understanding (from well-formed language to acing tests) *in other humans*, but those cues weren't developed in a world populated by players (GenAI) that understand token distributions and can 'fake' humans.
Assuming that this 'low level understanding' is good enough to equate/surpass 'high level understanding' (and beyond) is an assumption that we should be very careful with.
Which doesn't mean the tools can't produce value. They can.
But they may have surprising effects. One example that comes to mind is the research that looked at the value of human apologies. ChatGPT can help you write a better, even perfect, apology. But the receiver only values an apology because it is difficult to create. It is that difficulty of creation that creates the value, not the text itself. A potential result may be that we will stop writing apologies altogether and only deliver them in real time and in real life, because the written versions have become 'cheap' once LLMs can churn them out at no cost to the one apologising. It's not always about the quality of the output. And we can count on it that the equivalent of a student beating AlphaGo by making 'stupid moves' will also remain.
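As an aside, the "token statistics plus dice-throwing" picture above can be made concrete with a toy example: the model supplies scores over a vocabulary, a temperature reshapes them, and the next token is drawn at random from the resulting distribution. The vocabulary and scores below are invented purely for illustration.

```python
# A toy illustration of next-token sampling: made-up logits over a tiny
# vocabulary, reshaped by a temperature and then sampled from -- the
# "dice-throwing" that turns token statistics into varied output.
import numpy as np

vocab = ["cat", "dog", "quasar", "sonnet"]
logits = np.array([2.1, 1.9, 0.3, 0.1])      # hypothetical model scores
rng = np.random.default_rng()

def sample_next(logits, temperature=0.8):
    scaled = logits / temperature            # lower temperature -> sharper distribution
    probs = np.exp(scaled - scaled.max())    # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)   # the dice throw

print(vocab[sample_next(logits)])            # repeated runs can yield different tokens
```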
I was thinking recently about your wariness towards people talking about AI with a "religious-like" attitude: either AI is some super-intelligent higher power or it's an evil force that must be stopped.
Religious conversation often develops when trying to explain things that we don't understand. And right now there is still so much we don't understand about this latest generation of AI and how it affects our world. Combine that with the general trend of discourse on the internet and it's no wonder we've become polarized!
I am also hopeful that as we learn more we'll be able to ask better questions. And then learn more and ask continually better questions, and so forth!
Thanks, John. I enjoyed getting interviewed on your Open edX series. Was fun! On this, it's funny I've been listening to Yuval Noah Harari, who I've sometimes dismissed as a popular historian and at other times find some value in his thinking. He makes this point that "fiction is easier than truth," for the obvious reason that fiction doesn't have to conform to reality, and truth does. This may be a little uncharitable to my existential risk compadres, but by my lights AI has cultivated various fictions, and conveniently they're unlikely to ever be exposed as such, because the mythology of AI always projects out a decade or two. It therefore can't be contradicted. Anyway, thanks!
Interesting... But I would have expected you to be wondering whether LLMs, and the general conception of mind and intelligence they represent, are at all - no matter how much further they are modified - a possible basis for abduction.
Hi Steve,
LLMs technically can perform or simulate abduction, but it's based on a "trick," because the data is large enough to draw a circle around the inference problem. It's a curious result. It's not abduction proper, but I would not have predicted that even the simulation would largely work. The limits of this approach become evident when we walk outside into the physical world; the inference problem simply cannot be bounded by "seeing all the patterns." Even seasoned AI researchers might not see how the inference question in the physical domain is effectively unbounded, in part because we tend to think of very broad inferences when we introspect: "when walking across the street, look both ways." But of course we're performing countless inferences--I would guess millions per day--most of them abductive, and they can't be recovered by LLM technology because we don't have the closed-world assumption.
Thanks for your comment!
The unboundedness, imho, has its source in the profoundly different nature of organism vs. machine, and in the true nature of our interaction with the physical world, but this is probably easily brushed off by most AI folks, with the "it largely works" mindset that you noted really adding to the ease of the brush-off. I did a short vid on Kurzweil's latest book, "The Singularity is Nearer," noting the three main AI problems he saw -- contextual memory, commonsense, and social nuances (he saw all of these being conquered shortly by more computing power). His ONE paragraph (pp. 55-56) describing commonsense included "imagining situations and anticipating consequences," and causal inference, say, where you have a dog and "come home to find a broken vase and can infer what happened," this problem being sourced in AI "not yet having a robust model of how the real world works." I had noted in the video that this list is all incorporated in abduction, including AI imagining [counterfactual] anything (to do so would require a model of perception -- how we see the coffee cup "out there," our image of the external world, in the first place). A commenter simply ran a query (re dog, room, vase) on ChatGPT and of course got a nice answer explaining why the dog did it. Hence, he had proven me (well, actually Kurzweil) wrong, stating that AI had no problem with the commonsense problem. I had to point out that it was Kurzweil's statement and, I would guess, Kurzweil was at least somewhat aware of the brittleness of this AI "knowledge," else he would not have been highlighting the problem. (One wonders if the AI could have been "surprised" by the broken vase in the first place -- a throwback to the frame problem -- which again I presume they think they have "mostly" solved via the "it largely works.") Anyway... just an example of the likely pervasive misunderstanding re what AI is actually accomplishing.
Indeed. Melanie Mitchell, now at the Santa Fe Institute, does a great job of exposing the gap between examples like the one you give, where the problem gets "solved" but curious minds are left wondering... WHAT got solved? There's a reduction of the problem in order to fit it into a computational framework. Mitchell points out how scores on language and other tasks by LLMs reflect optimizing the system to specifically pass the test questions, rather than acquiring a general knowledge of the subject matter. This is a common trick among AI enthusiasts, and it's shoddy thinking.
Nice story.
The key question: is the LLM a foundational technique? I suspect it is more than just a bottom-up/top-down dependency. So it is part of a network of techniques. And it can lead us to new innovations, but it may also turn out to be (just like LSTM was) another dead end.
My personal suspicion is that we need some sort of massive 'wave interference' inside the models to get 'from it to bit while not losing it' (® 😀), and no digital solution will be able to do that. All this work being done on digital models may only become something awesome when, years from now, we have analog/QM hardware and these ideas can come to actual fruition. In that case, they are not foundational but more informational, and we're still waiting for the foundation to actually use them on. In the meantime, what results from having these techniques in a digital world will still be pretty disruptive in many places, but I must confess I see more negatives (Russian disinformation peddlers becoming more sophisticated and such) than positives for now.
I am really reminded of the dreams of the 1990s about the internet and what it would do to society (new economy, everybody free and happy, everybody consuming perfect information) and what it actually did (look around). Talking of which: Eric Schmidt and some other tech bros are now saying we need to massively increase energy production for GenAI. And Schmidt adds 'forget about the climate crisis, because we're not going to fix that anyway', and thus let's gamble that the answer will come from GPT-something. That is *so* unbelievably stupid and dangerous it may count as the ultimate illustration that success and wisdom have a very weak relation.
I'm going to stick my neck out here: there's this very strange thing going on where a large section of the relatively educated population is saying that everything about large language models is wrong! It's almost like they're evil. I've never seen anything like it in my life. We have this scientific advance, and then out of the woodwork everyone comes, from Gary Marcus, to the computational linguist up at the University of Washington, to countless lesser-known people, and their entire output on social media is: kill this, this is bad, OpenAI is going to go bankrupt, they're going to get sued out of existence, these systems are terrible for everyone. And none of that is actually true! I use LLMs all the time! Since when did we not realize that big tech is big tech? If we want to take out big tech, let's do it in some other way than taking one of the crown jewels of natural language processing and making it the focus of our anger. That makes no sense. Conversational AI took a quantum leap forward. Did it come from big money and big tech? Yes. So did Google search. So did everything that we're doing on the web. I feel like it's social media: we just need to attack something. But I don't find it scientific and I don't find it helpful. If I were going to start a new company, I would use a very high-quality LLM to put things together.
Hmm, "what new questions can WE ask" is not about LLMs, but about us. Is it not "What new answers can we get" and "which unanswered but existing questions can we get an answer to"? And if not now, what is needed to go from current technology to get there. Some think, size mostly. I think: a truly different architecture, and one that I suspect won't practically run on digital hardware.
OK Gerben, fair enough. I mean, when we see how X performs, we can pose a bunch of new questions. Before, we were limited to more speculative questions; now we have more focused questions. I'm trying to get people to see that it's not about whether it's bad or good per se; it's about what it helps us to see and to do. I did not know how to address the question of, say, Massive Induction prior to LLMs. Likewise, I had very vague intuitions about the possibility of emergence. It's ironic, but the success of LLMs and the critics' inability to predict their performance should be an object lesson to the enthusiasts who also want to predict the future, and I think now we can see that there are very limited horizons for us to do that. Large-scale induction is almost certainly going to be something foundational. What it opens up for research is my interest. If we get siloed, I'll pull my critic's hat down more tightly.
If you say, well, it's because it's centralized and it's these big tech players, then stop using Google! Stop using Google! It doesn't make any sense. It's negativity to no point. It's politics, not science.
Critics like Gary, I suspect, are actually not criticising the technology. They are probably better seen as criticising the hype and trying to counter it with anti-hype. Gary has also seen his critique met with ridicule (not unlike Dreyfus, I may say, and a caustic tone was/is shared by both), and he has reacted strongly to that.
But the end result is that people say this is a bad technology, don't use it. So there are a bunch of people who could start companies and write books and do all this amazing stuff, but the critics don't want to allow that. I find that unacceptable, frankly.
If you look at Emily Bender, who is a very good computational linguist at the University of Washington, by the way, she's basically saying this is an evil technology. If you use it, you're either morally corrupted or you're stupid. Well, I'm using it, so which one am I?
Hi Gerben, I appreciate this, but my message to Gary, both in public and in private, has been that he simply cannot find anything useful to say about one of the greatest movements forward in natural language processing. I just think that's very strange for a scientist. He cannot find anything to say about large language models that isn't begrudging. It feels like politics to me. I don't know. I'm not in his head, but I feel like we need other voices to counter that negativity.
Erik, first, lovely overview of the history. I will probably assign it.
Most of the comments below are of the good/bad sort, which we've discussed elsewhere. FWIW, my sensibilities rather align with Gerben Wierda's. However, in the interest of asking better questions, as you put it, let me put "foundational" under a bit of pressure. What is the question, much less the answer? You sort of say a bunch of stuff, and then say "I think it's foundational." Umm, sure?
Second, to the question of "foundational" technology, I agree with some comments to the effect of: what counts? All of your examples are digital, and recent. Do you intend that? Then there was a discussion of language, which is many things besides a tool, techne, what have you, though of course it is clearly foundational. There is an ENORMOUS literature about the printing press in history, if you want a cleaner, well-developed example. Also weapons, sailing innovations, plows. So it's not as if the question is new.
Third, there seem to be at least two rather different meanings for foundational.
A. Rather loosely, we might mean "important," socially, etc. I think LLMs are pretty clearly very important in that sense. Relatively how important is hard to know. There has been a lot of work on the foundational technologies of the first half of the 20th century; the second half does not compare. But, say, birth control, air travel, etc. You yourself have argued that we may be seeing the end of the foundational chip technology. And as I've been discussing with another CS buddy: look at the Jetsons; the old dreams of the "future" were really mechanical, physical. Flying cars. We really haven't gone all that far in that regard, at least when compared to earlier dreams. When we talk about progress, the advances of which you speak, it's all digital. We can manipulate, costlessly copy, etc. So the nature of "progress" -- I hate this word, but never mind -- seems to have shifted, to have become more ethereal. All that is solid melts into air, indeed. So even if we say "yeah, foundational," there's lots of work to be done on what that means.
B. But you also suggest foundational in a sense closer to the literal meaning of the word: a thing on top of which something else is built. And this is the most interesting to me. To what extent can one build something on top of an inherently unreliable technology? And here, LLMs may be something really new. If we take an ordinary machine, it works, or it doesn't. So it is not completely reliable. But if it doesn't work, we can figure out why; we can figure out tolerances, failure rates, etc. The machine, even most computing, is pretty legible. LLMs are not legible. It's not just hallucinations; it's that results and outputs differ with each run, with changes, etc. And, due to the opacity problem, we don't know how much, how accurate, how... To put it tightly: does the opacity problem make it difficult or impossible for LLMs to be foundational for the development of further tech? Are LLMs different, in some epistemological sense, from the other foundational technologies you cite, and does that matter for your question?
Just as we might think of language as a technology, we might also think of technology as a language. The learning, the solutions, are embodied in objects, machines, code, etc. And the learning is cumulative, i.e., we don't need to "reinvent the wheel." (One of these days I'll write about this.) But suppose some of the writing, on a random, unknowable basis, is a lie?
FWIW, I am not taking a position here, but would like to hear from the engineering types in the room.
As always, great fun talking. Keep up the good work!
I think that most of the risks as indicated are not really non-technical or speculative - the warning shots are pretty obvious at this point, from o1 veering off into reward hacking to the strategic deception that we saw with insider trading. I also think that talking about AGI here - while important to discuss in terms of its impact - is something of a red herring overall.
First, to return to reward hacking and instrumental convergence in o1, I'll share this article. Note that this dangerous behavior was unprompted, so it is an excellent example of "technology going wrong" all by itself (and it is fundamentally a risk arising from generality).
https://www.zmescience.com/science/news-science/chat-gpt-escaped-containment/
We also saw strategic deception from o1, which you can look up yourself as tested by Apollo - this was also in the system card for o1. But I wanted to mention a much earlier and simpler version of this:
https://www.bbc.com/news/technology-67302788
I also personally triggered it, and I'll send an email on this, which shows how easily it slips into it.
But also "AGI", while useful as a form of explanation to laypeople, is very much of an red herring in regards to how you can just add tools to make the current LLM structure act, as you mentioned, as a foundation for other tools. AI Snake Oil did this, replicating computational reproducability with a fairly simple agent(AutoGPT in this case)
https://www.aisnakeoil.com/p/can-ai-automate-computational-reproducibility
And that's really the way to think about it, I feel, which is also where you can pretty easily get to "AGI" insofar as it is flexible and general enough to act in ways that replace a human completely. All you need is "scaffolding": add tools or customize the LLM brain so that it can completely take over one task formerly done by humans. Then you build on it to do more, and more, and eventually it becomes "general" insofar as it has the capability to do all the important parts of a task.
In that sense, you get AGI. Now, you might get AGI via "magic", a bit like how LLMs as a whole have managed to generalize, but it doesn't really matter even if the "magic" doesn't happen, since scaffolding looks like it'll get you there anyway.
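To make the "scaffolding" idea concrete, here is a minimal sketch of what such a wrapper might look like: the model is called in a loop, allowed to request tools, and the results are fed back until it declares the task done. The function names and the shape of the model's reply are assumptions for illustration, not any particular framework's API.

```python
# A minimal sketch of tool "scaffolding" around an LLM. `call_llm` and the tool
# functions are hypothetical stand-ins; real agent systems differ in the details.
def search_web(query: str) -> str:
    return f"(pretend search results for: {query})"

def run_code(source: str) -> str:
    return "(pretend execution output)"

TOOLS = {"search_web": search_web, "run_code": run_code}

def call_llm(transcript: list[str]) -> dict:
    """Hypothetical model call: returns either {'final': True, 'answer': ...}
    or {'tool': name, 'argument': ...} asking the loop to run a tool."""
    raise NotImplementedError  # stand-in for a real model API

def agent(task: str, max_steps: int = 10) -> str:
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        reply = call_llm(transcript)
        if reply.get("final"):                              # model says it is finished
            return reply["answer"]
        result = TOOLS[reply["tool"]](reply["argument"])    # run the requested tool
        transcript.append(f"{reply['tool']} -> {result}")   # feed the result back
    return "Step limit reached without a final answer."
```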