Cybernetics and Inference
Notes on Ashby and others' understanding of intelligence
Hi everyone,
I’m reading The Cybernetic Brain by Andrew Pickering. What I’m doing here is a sort of spelunking: I’m looking to reconstruct the theory of cognitive intelligence—if there ever was one—from the historical records of Cybernetics, the theory mostly attributed to Norbert Wiener (he coined the term) but which was also the child of Warren McCulloch, of early neural network fame, and the lesser-known but brilliant doctor of medicine W. Ross Ashby, among a few others.
What did the early cybernetics folks think about the phenomenon of “intelligence”? Oddly, I found light shed on my own exposition of abduction in my earlier book, The Myth of Artificial Intelligence. Here are selections from the full passage, from Pickering quoting Ashby:
To illustrate, suppose that Michelangelo made one million brush strokes in painting the Sistine Chapel. Suppose also that, being highly skilled, at each brush stroke he selected one of the two best, so that where the average painter would have ranged over ten, Michelangelo would have regarded eight as inferior….
Ashby goes on to argue in this manner until he arrives at the obvious information-theoretic conclusion, which is that Michelangelo “picked” a painting from an enormous space of possibilities, “…one painting from five-raised-to-the-one-millionth-power…”.
This sort of reasoning would make Ashby’s contemporary, Claude Shannon, proud.
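Ashby’s arithmetic is easy to check in Shannon’s own terms. Treating each stroke as a factor-of-five narrowing (ten options for the average painter, cut down to the two best), the finished ceiling is one selection out of 5^1,000,000—roughly 2.3 million bits. A quick sketch, using only the numbers from Ashby’s passage:

```python
import math

strokes = 1_000_000   # Ashby's hypothetical stroke count
options = 10          # choices the average painter ranges over
kept = 2              # the "two best" Michelangelo selects among

# Each stroke narrows the field by a factor of options/kept = 5,
# so the finished painting is one of 5**1_000_000 possibilities.
reduction_per_stroke = options // kept            # 5
bits_per_stroke = math.log2(reduction_per_stroke) # information per stroke
total_bits = strokes * bits_per_stroke            # information in Shannon's sense

print(f"{bits_per_stroke:.3f} bits per stroke")   # ≈ 2.322
print(f"{total_bits:,.0f} bits overall")          # ≈ 2,321,928
```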
This struck me almost instantly—I hadn’t known that any of the pioneers, including Wiener, had bothered with concepts like “genius” or intelligence directly—because it’s essentially the same logic I used to motivate abductive inference as more plausible and ubiquitous than any type of data-driven or inductive inference at the foundations of thinking, and more specifically of inferring.
But I meant abduction to open up to the universe, as it were, insofar as the selection we make is typically among an effectively infinite number of logical possibilities, though fewer would be salient given the circumstances. Infinite possibilities reduced to a finite selection procedure is a bit of a miracle in itself.
Yet Ashby’s account is essentially deterministic, in the sense that the goal (say, of creating a great work of art like the Sistine Chapel) is already given in advance. And Pickering points out in his critique of Ashby, and I think rightly, that scientific discovery is best thought of as the imaginative extension of possibilities in order to see how they play out. It’s worth quoting Pickering here:
What I have found instead are many instances of open-ended, trial-and-error extensions of scientific culture. Rather than selecting between existing possibilities, scientists (and artists, and everyone else, I think) continually construct new ones and see how they play out.
Hmm. This is food for thought. A few more thoughts here:
The data-driven statistical model (Big Data AI) is certainly not in the game of imagining possibilities and seeing “how they play out.” Selection from among a very large space of possibilities fits hand in glove with the way large language models actually work—that’s what it means to have a statistical selection criterion for the next token based on a sequence of prior tokens, all projected into a very high-dimensional space. Token embeddings provide the bucket out of which to pick—statistically—the next token, a process which, as we have seen, essentially recreates natural language conversation, or at any rate responses to natural language prompts, seen naturally as playing our “language game.”
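The “pick from the bucket” step can be made concrete with a deliberately toy sketch—this is not an actual LLM, and the candidate tokens and scores are made up for illustration. The model assigns each candidate a score (logit), converts scores into a probability distribution via softmax, and samples:

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate continuations of "The cat sat on the"
# with made-up scores; a real model ranks tens of thousands of tokens.
candidates = ["mat", "roof", "keyboard", "moon"]
logits = [3.0, 1.5, 1.0, -1.0]

probs = softmax(logits)

# Statistical selection: sample the next token in proportion to its probability.
next_token = random.choices(candidates, weights=probs, k=1)[0]
```

The point of the sketch is only that the entire operation is a selection among pre-enumerated options—nothing in it constructs a new possibility.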
To introduce the tripartite inference scheme—deduction, induction, abduction—into the present discussion, the critique I offer is that induction cannot possibly suffice, not necessarily for the grander reason that we expand the space of possibilities with imagination and then let these expansive possibilities “play out,” perhaps like a story,1 but because no cognitive system can possibly have access to the relevant facts in a dynamically changing environment.
On the Internet, we may have a closed world assumption so that the number of possibilities in a given selection is bounded and searchable. In the real world “outside” cyberspace, this will not be true. This is one reason why self-driving cars don’t “work” very well—and good luck fixing these problems until we understand these core issues better.
C.S. Peirce said roughly the same thing as Ashby, a half-century earlier, by the way, something I discussed in The Myth of AI. In his much-read “The Fixation of Belief” (1877), he discussed Johannes Kepler fixing on the right geometry to describe the motions of planets around the sun, and his point was that the number of possible shapes was quite large indeed:
The early scientists, Copernicus, Tycho Brahe, Kepler, Galileo, Harvey, and Gilbert, had methods more like those of their modern brethren. Kepler undertook to draw a curve through the places of Mars; and to state the times occupied by the planet in describing the different parts of that curve; but perhaps his greatest service to science was in impressing on men's minds that this was the thing to be done if they wished to improve astronomy; that they were not to content themselves with inquiring whether one system of epicycles was better than another but that they were to sit down to the figures and find out what the curve, in truth, was. (italics mine)
And further, in his Cambridge Conference Lectures in 1898:
Kepler... imagined that he had 1000 hypotheses and that all but one were wrong. He tried a number of forms of orbits... and found that the ellipse fits the observations... It was a hundred to one that the ellipse was not the true orbit, and yet it was.
What should we say here? Kepler was a good guesser? But he was, in a sense. The problem is that we don’t have an adequate theory of intelligence, or of how we discover or solve problems, for that matter.
We can say this. To Peirce, and to Ashby the next century, the problem was selection among a number of possibilities so large that it may as well be infinite. This ought to strike us as a mystery, and one that modern AI has, at best, hardly even scratched the surface of—or, at worst, has simply turned into another distraction from Silicon Valley.
I used this very logic of the “selection problem” to formulate the problem of abduction, following Peirce. Whereas induction can use arbitrary amounts of prior data—the selection space to be searched can be quite large indeed—for the purposes of fixing probabilities given some inference task (completing the next token in sequential problems like language understanding and generation), for abduction the task becomes not the fixing of probabilities but the ignoring of options. It’s more powerful, because very rare occurrences can still function as proper explanations given some problem or other. Abduction is looking for clues, not optimizing for outcomes.
Yet the fixed set of elements is essentially the same—my logic, along with Peirce’s, is that it makes no practical difference whether the set is merely “effectively infinite,” since even computers the size of Earth will eventually fail at these types of search tasks, and eventually the right bit of information will not have been scraped from the Internet.
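The contrast between fixing probabilities and ignoring options can be made concrete with a toy sketch. This is my own illustration, with entirely made-up hypotheses, priors, and clues—not Peirce’s notation: induction ranks candidates by prior frequency, while abduction discards whatever fails to explain the clues, so a rare hypothesis can still win.

```python
# Hypothetical hypotheses with made-up priors and the clues each one explains.
hypotheses = {
    "burglary": {"prior": 0.60, "explains": {"open window"}},
    "wind":     {"prior": 0.35, "explains": {"open window", "scattered papers"}},
    "raccoon":  {"prior": 0.05, "explains": {"open window", "scattered papers",
                                             "muddy paw prints"}},
}

clues = {"open window", "scattered papers", "muddy paw prints"}

# Induction: pick the hypothesis that is most probable a priori.
inductive_pick = max(hypotheses, key=lambda h: hypotheses[h]["prior"])

# Abduction: ignore every option that leaves a clue unexplained,
# then choose among whatever survives -- rarity is no objection.
survivors = [h for h, v in hypotheses.items() if clues <= v["explains"]]
abductive_pick = survivors[0] if survivors else None

print(inductive_pick)   # burglary  (frequent, but leaves clues unexplained)
print(abductive_pick)   # raccoon   (rare, yet it accounts for every clue)
```

The design choice is the whole point: the abductive step is a filter on explanatory adequacy, not an optimization over frequencies, which is why a 5% hypothesis can properly beat a 60% one.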
Yet perhaps there’s a bolder stroke here. Perhaps we don’t select at all, but create the possibilities, and then see how they play out.
Now how the heck would we model—let alone think about—this in cognitive science or its commercial glitz and glam cousin, “AI”?
Generative AI and LLMs are the far outpost of induction using lots of compute power and lots of data. But our question about the mind, and the mystery of intelligence, remains.
Erik J. Larson
An interesting take on this is from a recent book I plan to review one day, Primal Intelligence, by Angus Fletcher. Fletcher’s point, in the book and in many private discussions with me, is that “intelligence” is mostly a function of us trying out different stories and seeing which “make sense” given the context, what’s known, and so on.

The intelligent design guys make this very argument when asserting that Darwin’s theory cannot explain the spontaneous emergence of new forms. Stephen Meyer wrote a paper on this topic in 2004, published in a Smithsonian-affiliated journal and then disavowed by it.
“… that they were not to content themselves with inquiring whether one system of epicycles was better than another but that they were to sit down to the figures and find out what the curve, in truth, was.”
Interesting. However, none of this would have been possible without Tycho Brahe’s highly accurate data dump from the spanking-new Uraniborg observatory. Copernicus was doing new math on increasingly old, corrupt data tables. He had especially messed up the orbit of Mars (which Brahe "hired" Kepler to fix). As we now know, this is down to the Martian orbit having an eccentricity more than 5.6 times that of Earth’s. To figure out what the curve in truth was, you had to figure out what it was actually doing, within a certain margin of resolution and error. And you had to believe what you were seeing even if it went against the scientific consensus of the day. Kepler’s big moment came when he wondered if the reason Mars appears to speed up and slow down is that it actually is speeding up and slowing down. This is of course against the principle of uniformity of heavenly motions inherited from the past consensus.
An intriguing corollary: once the data within that margin of resolution became available, a European astronomer figured it out. This suggests to me that a little bit of historical coincidence goes a long way in narrowing those ranges of possibility, but once everything needed was in place, someone would get there.
Another interesting bit from that time: it was Galileo’s training as an artist that enabled him to recognize that the moon had mountains. He knew chiaroscuro. When he looked through what today would be a very primitive telescope, he nonetheless saw enough to recognize that he was looking at uneven terrain on a three-dimensional surface.