The Top 5 Misconceptions About AI Right Now
Bad ideas, false assumptions, and where the field is going wrong
All models converge on the center of the distribution. That takes a lot of power, but not as much thought…
1. The grounding problem isn’t solved. We’ve mostly routed around it.
The grounding problem has been a central issue in cognitive science and philosophy for decades. How do symbols get their meaning? How does a system connect language to the world around it? More trenchantly: how does my thought, expressed as the word “cup,” refer to THAT cup sitting there on my kitchen table?
We don’t have a general solution to this class of problem. It turns out the philosophers still have something to say here, because AI engineers have no good solution of their own, which is part of why our robots and self-driving cars don’t work.
And philosophers have known and discussed the problem for centuries. The key question, and one that bears directly on the failures of modern AI, is this: how does a token come to refer? How does a system bind an internal symbol, which by itself refers to nothing, to an external object, property, or event in a way that is stable, usable for further inference, and revisable through the system’s future interactions with its environment? We don’t know.
What’s changed is not that we’ve solved the grounding problem, but that we’ve stopped treating it as central. We’ve taken data—large, static corpora of text and images—as an adequate answer, ignoring the fact that “data” is also internal to a cognitive system and hardly a good candidate for grounding anything except in a spreadsheet. This is a colossal lacuna in modern AI thinking and research, and somewhat inexcusable given the obvious need for such a capability and for a theory that explains it. Welcome to AI research.
Data is a record of prior human activity. It is already interpreted, already structured, already grounded by someone else. A system trained on that data is learning statistical regularities over representations, not learning how those representations connect to the world. Data, in other words, does not solve the grounding problem—it ignores it.
A language model can produce language about physical situations—say, that objects fall, that collisions happen, or that liquids pour. But it does not learn what fixes the reference of those terms. It does not know what makes an instance of “falling” an instance of falling, or which features of a situation are causally relevant versus incidental.
It has no mechanism for what we might call reference stabilization—the capacity to fix what a symbol refers to across changing contexts, and to maintain that reference through perceptions, state transformations, and actions.
The upshot is that when we ask an “AI” about a slightly novel physical scenario—an object balanced in an unusual way, a container with a nonstandard opening, a change in support—we should expect performance to degrade. The system isn’t “confused”; it simply has no underlying model useful for making progress on grounding. Spreadsheets don’t know about the world.
That’s why self-driving cars don’t yet “work” either.
Neuroscience could come to the rescue, if only we understood the brain better. Here, too, researchers face a quagmire, one that should make us cautious about easy analogies between brains and current machine learning systems.
Yes, neuroscience has uncovered important regularities in early sensory processing. We know a good deal about retinotopic organization in vision, orientation-selective neurons in primary visual cortex, hierarchical feature extraction along the ventral stream, and population coding in sensory areas. These are real achievements. They also helped inspire early neural architectures, including convolutional models and other hierarchical systems for pattern recognition.
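As a rough illustration of what that inspiration produced on the engineering side, here is a minimal sketch, assuming PyTorch is available. The three-stage depth, the channel counts, and the comments mapping stages to “edges” and “parts” are my own illustrative assumptions, not a description of any particular model or brain area.

```python
# A toy hierarchical feature extractor: each stage pools over the previous one,
# building larger, more abstract receptive fields, loosely echoing the
# low-to-mid-level processing described for the ventral stream.
import torch
import torch.nn as nn

hierarchy = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # stage 1: local edge/contour-like filters
    nn.ReLU(),
    nn.MaxPool2d(2),                               # some invariance over position
    nn.Conv2d(16, 32, kernel_size=3, padding=1),   # stage 2: combinations of simpler features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),   # stage 3: larger, part-like patterns
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                       # a global summary vector for the image
)

image = torch.randn(1, 3, 64, 64)                  # a fake 64x64 RGB input
features = hierarchy(image)
print(features.shape)                              # torch.Size([1, 64, 1, 1])
```

Note what the sketch contains: filters, pooling, and a summary vector. Nothing in it says what any of those features are about.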
But none of that gives us a theory of grounding.
At most, it gives us partial constraints on implementation. It tells us something about how biological systems process sensory input at low and mid levels—edges, contours, motion, invariances over position and scale. It does not tell us how a system comes to represent this cup as that enduring object there on the table, how it binds a variable to that object across changing viewpoints, or how it updates its representation when it acts on the world and the world pushes back.
In other words, neuroscience may illuminate parts of the pipeline from sensation to representation, but it does not yet explain how reference is fixed, stabilized, and revised through embodied interaction. It gives us clues. It does not solve the problem.
And if anything, the biological comparison cuts against the dominant engineering strategy.
Humans are low-data learners. Infants acquire a basic understanding of objects, persistence, containment, support, and causality through relatively sparse but richly structured interaction with the environment. They are not ingesting terabytes of text. They are embedded in the world, acting in it, failing in it, and updating on the basis of feedback. Their concepts emerge not from passively absorbing records of prior linguistic behavior but from tightly coupled perception-action loops. That matters.
If intelligence in biological systems depends on embodied, intervention-rich learning, then treating ever-larger datasets as a substitute for experience is not merely incomplete. It may be fundamentally misguided. We are taking the residue of human cognition—texts, images, labels, annotations—and mistaking it for cognition itself.
This is the deeper problem with the current paradigm. It assumes that grounding can be deferred, approximated, or eventually washed out by scale. Train on enough data, and perhaps the problem disappears. But there is no good reason to think that. More of what is ungrounded does not become grounded simply by accumulation. Correlation does not turn into reference by getting bigger.
A system that lacks the ability to intervene in the world, to bind symbols to stable objects through action, and to revise those bindings in light of consequences, is not solving the grounding problem. It is operating upstream of it.
That is why these systems can appear uncannily capable and yet fail in ways that remain structurally familiar. They can generate language about the world without possessing a workable relation to the world. They can mimic the surface forms of understanding while lacking the conditions that would make understanding possible.
So the issue is not just that current AI systems lack grounding.
It is that the field has largely reorganized itself around methods that make grounding easy to ignore. Data gives the appearance of contact with reality because it is full of the traces of human contact with reality. But the contact is inherited, not achieved. The machine receives the representation after the fact. It does not earn it through its own encounter with the world.
Until that changes, the problem remains exactly where philosophers and cognitive scientists said it was: at the point where symbols are supposed to become about something.
We still do not know how that happens. And until we do, talk of machine understanding should be treated with far more caution than the field now permits.
2. Correlation versus causation used to be a slogan. Now we’re building it into our most advanced systems.
Pundits and seemingly everyone else today produce lots of loose talk about “reasoning” in modern AI, but when you press on causation, the story collapses.
Turing Award winner Judea Pearl has done serious work here with his directed acyclic graphs (DAGs), but the approach is still constrained to the kinds of problems that allow a determinate graph to spell out the variables and their dependencies in advance. That is already a significant limitation.
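To make that constraint concrete, here is a minimal, hypothetical sketch in Python (my own toy example, not Pearl’s code or any particular causal-inference library). The variables, the edges, and even the evaluation order all have to be written down by hand before the system can answer a single interventional query; the rain/sprinkler/wet setup and its mechanisms are illustrative assumptions only.

```python
# A hand-specified causal DAG: each variable lists its parents and a mechanism
# mapping parent values to its own value. None of this structure is discovered
# by the system; a human supplies it in advance.
import random

DAG = {
    "rain":      {"parents": [],                    "mechanism": lambda: random.random() < 0.3},
    "sprinkler": {"parents": ["rain"],              "mechanism": lambda rain: (not rain) and random.random() < 0.5},
    "wet":       {"parents": ["rain", "sprinkler"], "mechanism": lambda rain, sprinkler: rain or sprinkler},
}

TOPO_ORDER = ["rain", "sprinkler", "wet"]  # also known and fixed in advance


def sample(do=None):
    """Draw one sample, optionally forcing a variable via an intervention do(X=x)."""
    do = do or {}
    values = {}
    for var in TOPO_ORDER:
        if var in do:
            values[var] = do[var]  # intervention: ignore incoming edges, set the value
        else:
            spec = DAG[var]
            args = [values[p] for p in spec["parents"]]
            values[var] = spec["mechanism"](*args)
    return values


# Observational vs. interventional quantities: P(wet) vs. P(wet | do(sprinkler=True)).
n = 10_000
obs = sum(sample()["wet"] for _ in range(n)) / n
intv = sum(sample(do={"sprinkler": True})["wet"] for _ in range(n)) / n
print(f"P(wet) = {obs:.2f}   P(wet | do(sprinkler=True)) = {intv:.2f}")
```

Nothing in the sketch learns the graph from data; it only consumes a graph that someone already specified, which is exactly the limitation at issue.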




