Hybrid Systems Aren't a Promising Path to AGI
Plus: Why Kahneman's "System 1 and System 2" is artificial, and the easy and hard problems of intelligence.
Say Hello to my System One…
Hi everyone,
My jumping-off point for this post is something Gary Marcus wrote last week: an interesting piece making a good case against Gen AI as a path to AGI, and also making a case for neurosymbolic approaches to AI that marry neural networks with symbolic or rule-based methods. Marcus (not alone) references the late Daniel Kahneman's 2011 bestseller Thinking, Fast and Slow. In it, Kahneman draws a distinction (though he makes clear it's not intended to be an accurate neuroscientific account) between a "System 1" that's fast, reflexive, but possibly wrong, and a "System 2" that's slower, more deliberative, and sensitive to accuracy and truth. The idea is to try to take the best of two worlds, combining (akin to Kahneman's System I and System II) neural networks, which are good at quick intuition from familiar examples (a la Kahneman's System I), with explicit symbolic systems that use formal logic and other reasoning tools (a la Kahneman's System II).
In this post I'll explain why I think Kahneman's distinction is artificial and doesn't provide a particularly useful framework for understanding our inferences. Marcus seems to take the "System 2" idea as an explicitly computational and (more to the point) deductive type of inference mechanism, the stuff of Classical AI. Ultimately this leads to the idea of hybrid "neuro" and "symbolic" systems that may go further together toward AGI than separately. I'm skeptical of the "hybrid" claim, and I'll explain why. This is the first of a two-part series.
The Reliability Problem and Truthier AI
Marcus is worried about Gen AI and reliability, which is what I'm worried about (and you should be too). It makes sense to consider logic-based approaches to AI when confronted with reliability issues, because unlike neural networks they offer the promise of certainty: in deduction, if the premises are true, the conclusion has to be true. Knowing whether your AI is telling it to you straight would be nice. It would certainly address reliability. But there are, as you might expect, some pretty hairy problems wrapped up with all of this. There's a reason these approaches are now mostly obsolete in serious work on AI.
To date, all the deductive approaches to achieving AI have roundly failed. It's a shame, because the theoretical framework for deduction is beautiful, and offers a "roadmap" for establishing the veracity of a system's conclusions.1 Deduction and its variants were tried for decades, though, to the tune of hundreds of millions of dollars of (mostly) government R&D, and in the end it all… failed. No one takes seriously anymore the possibility of "scaling up" or otherwise using symbolic systems to get to AGI, let alone using only deduction (a small but well-developed subset of the "symbolic"). Younger readers may wonder what everyone was smoking, but rewind a few decades and you'll discover that yes, the field took the symbolic, and in particular the deductive, approach seriously. What happened?
Man Takes Birth Control Pills, Doesn't Get Pregnant
In a word: relevance. It proved impossible to use deductive approaches in dynamically changing domains requiring commonsense and other types of real-world knowledge, like an appreciation of causes and effects. In other words, it proved impossible wherever inferences are context-dependent and what's true depends on other relevant considerations and can't simply be deduced. A deductive system might happily conclude that Joe didn't get pregnant because he took his wife's birth control pills; birth control pills, after all, prevent pregnancy. It's a silly conclusion because it's entirely irrelevant: men don't get pregnant anyway. "Relevance" is one of those seemingly tractable issues that turn out to be the thread that, when pulled, unravels the whole sweater. Engineers grappled with it for decades before essentially walking away from it, or at least killing the feverishly hyped announcements.
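To make the failure mode concrete, here is a minimal sketch of the kind of forward-chaining rule engine Classical AI leaned on (the rules, facts, and predicate names are invented purely for illustration, not taken from any real system). The deduction it performs is perfectly valid, and perfectly beside the point: nothing in the formalism says which premises are the relevant ones.

```python
# Toy forward-chaining rule engine (illustrative only; not any real system).
# Rules are (premises, conclusion) pairs over simple string "facts".
rules = [
    ({"takes_birth_control(joe)"}, "wont_get_pregnant(joe)"),  # pills prevent pregnancy
    ({"is_male(joe)"}, "wont_get_pregnant(joe)"),              # the relevant reason
]

facts = {"takes_birth_control(joe)", "is_male(joe)"}

def forward_chain(facts, rules):
    """Apply rules until nothing new follows, remembering what justified each conclusion."""
    derived = {f: None for f in facts}   # fact -> justifying premises (None = given)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived.keys() and conclusion not in derived:
                derived[conclusion] = premises
                changed = True
    return derived

for fact, why in forward_chain(facts, rules).items():
    print(fact, "<-", why)
# With the rules in this order, the engine justifies the conclusion by the pills;
# Joe's being male never enters into it. Validity is cheap; relevance isn't.
```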
Relevance-based problems closed off any route to general intelligence, or AGI, for Classic AI, which showed promise only on problems that could avoid the curse of the real world, like the formal verification of computer chip design (still a killer app). The "relevance problem," in retrospect, seems quite obvious, but true believers still seek ways to inject the old methods into the newer paradigm (understandably, since hallucinating neural networks set the truth and reliability bar pretty low). What I'll say presently is this: if anything has been tried exhaustively, and in every conceivable combination and permutation in AI, it's the symbolic approach, whose summum bonum is deduction. The problems were never resolved. The approach was simply abandoned.
Marcus's post snapped me out of my slumber, as it were (and to paraphrase Kant), and reminded me that discussion of neurosymbolic systems is, inter alia, a suggestion that we reopen closed cases and try to resuscitate some part of Classical AI.2 I think this is a profound mistake, and I'll try to explain why. In truth, I might be better served by not explaining why and leaving all this alone, as someone is bound to take umbrage. AI promotes itself as a practical field driven by effectiveness, but deep divisions persist among its various factions. Fervent and dogmatic but essentially pointless disputes simmer in the background; in the past they raged on openly in university labs, think tanks, and (to a lesser extent) corporations. I don't want to get tangled in that religious-minded mess. Still, to understand what AI is and where it's going, I think we need to go for it. So, let us go for it.
Turn now to the hybrid question: the question of combining deduction (or symbolic AI) with induction (or "neuro" AI, aka machine learning, aka empirical AI). For this new discussion, nothing much turns on whether we're talking about deduction proper or just any symbolic approach.
The Easy Problem of Intelligence (Or: what works in AI is already hybrid)
"Neurosymbolic" is self-explanatory: it involves the fusion of two distinct approaches to AI, neural networks and symbolic systems. Neurosymbolic systems are, in other words, hybrid systems. The claim that getting to AGI requires building hybrid systems, rather than just, say, a super-scaled-up LLM, is just the claim that AI, to succeed, must use more than one approach: it must be hybrid.
And here I make my first Big Claim: all successful AI, with rare exception, is hybrid.
AI has never succeeded at solving an interesting problem by applying one method to the exclusion of the others, and systems that do cool things, like drive cars or play Jeopardy!, are invariably engineered with all sorts of tricks and algorithms and approaches, from the ever-growing grab bag available to AI practitioners.
Let me give a few examples, and then I'll explain what I mean by "the easy problem of intelligence." First, Marcus himself makes the point about the ubiquity of hybridization in AI in his book Rebooting AI, written with Ernest Davis (it's a good book; I recommend it). I'll quote from my own book here:
When the developers of DeepMind claimed, in a much-read article in the prestigious journal Nature, that it had mastered Go "without human knowledge," they misunderstood the nature of inference, mechanical or otherwise. The article clearly "overstated the case," as Marcus and Davis put it. In fact, DeepMind's scientists engineered into AlphaGo a rich model of the game of Go, and went to the trouble of finding the best algorithms to solve various aspects of the game, all before the system ever played in a real competition. As Marcus and Davis explain, "the system relied heavily on things that human researchers had discovered over the last few decades about how to get machines to play games like Go, most notably Monte Carlo Tree Search . . . random sampling from a tree of different game possibilities, which has nothing intrinsic to do with deep learning. DeepMind also (unlike [the Atari system]) built in rules and some other detailed knowledge about the game. The claim that human knowledge wasn't involved simply wasn't factually accurate."
It's a powerful critique. It's also a great example of how AI at the level of systems development invariably becomes hybrid.
An even better example is IBM's development of Watson, the system that played the popular, long-running game show Jeopardy! Watson beat Ken Jennings, the "GOAT" of Jeopardy! (at least as of 2011; I don't follow Jeopardy!). At the time, it seemed fantastical, a bit like when GPT-3 started chatbotting us in 2022. I was hooked. When IBM "open sourced" the detailed technical papers on how Watson worked, I printed out hundreds of pages and read about the system basically end to end. I found that Watson was a brilliantly designed kluge of clever tricks, using methods from machine learning (which today is essentially neural networks) as well as symbolic or classical AI.
Jeopardy! is played by contestants selecting a category, like "Starts with 'W'." Watson sported special-purpose code modules for analyzing particular types of categories. If the category was "four-letter words," Watson would look for candidate responses that have four letters, and so on. Modules were organized in a huge pipeline that followed parallel tracks down to a statistical guess about the strongest answer (technically a "question," if you know Jeopardy!, since contestants supply the question to the offered answer). Scores of these one-off code modules narrowed down the possibilities for particular categories and responses. The IBM team discovered that most Jeopardy! questions were titles of Wikipedia pages. Fantastic. A subsystem indexed all of Wikipedia and its page titles and did a quick search early in the pipeline: why keep processing when the answer can be retrieved from a hash map?
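Here's a caricature of that shortcut (the clues and answers below are made up, and real Watson did not simply memorize clue-answer pairs; this only illustrates the "check a cheap hash map before running the heavy pipeline" pattern the paragraph describes):

```python
# Illustrative only: a cheap lookup that short-circuits a heavier answer pipeline.
quick_lookup = {
    "largest planet in the solar system": "Jupiter",
    "president on the five dollar bill": "Abraham Lincoln",
}

def expensive_pipeline(clue: str) -> str:
    # Stand-in for scores of category-specific modules, scorers, and rankers.
    return "best guess after full analysis of: " + clue

def answer(clue: str) -> str:
    key = clue.lower().strip()
    if key in quick_lookup:             # O(1) hash-map hit: stop here
        return quick_lookup[key]
    return expensive_pipeline(clue)     # otherwise, fall through to the kluge

print(answer("Largest planet in the solar system"))   # -> "Jupiter"
print(answer("A clue the lookup has never seen"))     # -> full pipeline output
```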
Of course (it's 2011) machine learning was used. Watson debuted just before convolutional neural networks wowed everyone at the 2012 ImageNet competition. Strangely, then: no neural networks. But ML, yes. Logistic regression was used (if I recall correctly) in scoring different candidate answers, and Monte Carlo methods were used when deciding whether to "buzz in" and attempt a response; a toy sketch of that scoring-and-thresholding pattern follows the two definitions below. A powerful system? Yes. But no path to AGI, because the IBM team wasn't even trying to make a general intelligence. This is a hallmark of successful (hybrid) systems: they've been engineered to solve the easy problem of intelligence. They're making no ground against the harder part, which I'll try to explain later. For now:
(1) The Easy Problem of Intelligence: Find some task/problem that if performed by a human would require intelligence. Engineer the hell out of the problem using any available approach, making a perfectly narrow system capable of performing wondrously (and superhumanly) on the problem.
(2) The Hard Problem of Intelligence: Ignore one-off problems and ask big questions about what would amount to generality: what use cases? Make flexible handling of relevance/context/causality/hypothesis/understanding a precondition of success, and strive to build something that can be applied to lots of different tasks/problems, rather than high-visibility but narrow one-offs. (Hint: ChatGPT isn't it.)
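As promised, a rough illustration of that Watson-style scoring component (the features and weights are my own inventions; nothing here comes from IBM's papers): a logistic-regression-style scorer turns a candidate's evidence into a confidence, and the system only "buzzes in" when the top confidence clears a threshold.

```python
import math

# Toy confidence scorer in the logistic-regression style (all weights invented).
WEIGHTS = {"retrieval_score": 2.0, "type_match": 1.5, "source_agreement": 1.0}
BIAS = -2.5
BUZZ_THRESHOLD = 0.6   # only attempt a response when confidence clears this bar

def confidence(features: dict) -> float:
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))    # sigmoid -> probability-like confidence

candidates = {
    "Jupiter": {"retrieval_score": 0.9, "type_match": 1.0, "source_agreement": 0.8},
    "Saturn":  {"retrieval_score": 0.4, "type_match": 1.0, "source_agreement": 0.2},
}

scored = {name: confidence(feats) for name, feats in candidates.items()}
best, best_conf = max(scored.items(), key=lambda kv: kv[1])
print(f"best candidate: {best} ({best_conf:.2f})")
print("buzz in" if best_conf > BUZZ_THRESHOLD else "stay quiet")
```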
More hybrids. One of my favorites: self-driving cars. [I'll omit discussion in the interests of cracking on here.] The lesson is: no one algorithm or approach can handle a big, complicated problem. Put another way: no one algorithm or approach can solve the easy problem of intelligence! Ergo, since we've been engineering hybrid solutions all along, invoking "hybrid" doesn't seem particularly groundbreaking. Something different would seem, well, better.
In other words, AI systems tend to be hybrid because we don't have any computational theory of general intelligence yet. We've got the variegated beasts, the Watsons and so on. To extend my definition a bit, AI has succeeded by zeroing in on problems that (a) have a chance of getting solved by computers, and (b) are interesting, either in the sense of "we need that" (like solving protein folding) or "that would be cool," like playing Jeopardy! or chess or Go or Atari games. The "need" or "cool" factor typically means that there will be follow-on funding, stock prices will rise, media and the public will take note, and so on. Generality is engineered out of the hybridized systems. That's how they solve the problem we've selected. It won't do to turn around and ask why AI isn't general. No one has a clue, and big, engineered, hybrid programming projects are the only stuff that works.
To recap: we first identify a problem that's interesting, and then we go about engineering the hell out of it to get a computational solution.3 This is why we don't see engineers working on a giant Thinking Brain that will solve world hunger, by the way. We see engineers working on something that's always some combination of doable and useful (or cool). Let's just stop for a moment and think about what I just said. It seems we're not really making any progress at all toward AGI. We seem to be picking problems and piling up big hybrid solutions. Watson, as we all now know, didn't do so well when Big Blue ported it over to health care. That's not IBM's fault. It's just the field. It's just AI.
I've made the point I want to make, but this occurred to me: ChatGPT itself is a hybrid system, and not in a trivial way. After the unsupervised "pre-training" (that is, the expensive, "I need millions of dollars" training), the model is fine-tuned using a dataset of human-annotated responses. That's supervised learning; the pre-training is unsupervised learning. Reinforcement learning comes next, to further tweak parameters and skew toward desired output (Reinforcement Learning from Human Feedback, RLHF). And when you ask ChatGPT a math question, or a question requiring math, it might farm the calculation out to a Python code snippet. That's not a neural network "black box," it's rule execution using traditional programming.
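A bare-bones caricature of that kind of routing (the detection rule and the stubbed "model" are my own inventions; real systems decide on tool use very differently): arithmetic gets parsed and evaluated deterministically, and everything else falls through to the statistical model.

```python
import ast
import operator

# Illustrative dispatcher: arithmetic goes to deterministic evaluation,
# everything else goes to a (stubbed) language model. Purely a caricature.

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def eval_arithmetic(expr: str) -> float:
    """Safely evaluate +, -, *, / over numbers by walking the parsed AST."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("not simple arithmetic")
    return walk(ast.parse(expr, mode="eval").body)

def fake_language_model(prompt: str) -> str:
    return f"[statistical guess about: {prompt!r}]"

def respond(prompt: str) -> str:
    try:
        return str(eval_arithmetic(prompt))   # rule execution, not a black box
    except (ValueError, SyntaxError):
        return fake_language_model(prompt)    # fall back to the "neural" path

print(respond("2 + 2 * 10"))           # deterministic: prints 22
print(respond("Who invented floss?"))  # handled by the model stub
```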
The easy problem therefore sounds a bit tautological: it's using whatever you have that'll work to solve a problem that can be solved in one way or the other. Yes, there are surprise performances, and ChatGPT certainly qualifies. But it's also another example of solving the Easy Problem of Intelligence. Hybrids. No fundamental insights.
The Hard Problem of Intelligence
The Hard Problem is somewhat tricky to unpack, but it can be glossed as "and the system has to know something about the problem it's solving" (see above). This is the problem of actually understanding "2 + 2 = 4" versus printing the statement out when asked what "2 + 2" equals. AI folks perennially harp on this problem of the "missing understanding," critics and proponents alike. I harp on it. Let's turn to one reason this pesky understanding requirement won't just go away.
Mistakes Are Only Human. And Machine.
It comes down to error, and error rates. Every cognitive system we know of makes mistakes. We make mistakes. AI makes mistakes. This is such a truism that if a system doesn't make any mistakes, we typically don't think it's intelligent (consider: a calculator). Now, what Marcus is getting at with his suggestion that we use some symbolic system or other (and his examples, DeepMind's AlphaProof and AlphaGeometry 2, use theorem provers, or in other words deductive logic) is that the two systems combined may show promise for getting to AGI, where any one approach probably won't. That's a nice suggestion, but it seems to support the Easy Problem, not the Hard. AlphaProof is not more general because it's hybrid; it's just more effective at solving the identified problem. That's the Easy Problem. Progress on the Hard Problem would mean that somehow hybridization extends a system's generality. We don't see that, and it's not at all clear, given what we know about how easy problems go, that it's a good hypothesis. We see the DeepMind International Math Olympiad systems perform better than (say) an LLM on a selected problem by doing what AI researchers always do: stare at the problem and engineer some workable solution or other to it. Deduction turns out to be pretty useful for doing math and geometry proofs.
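For a sense of what the deductive side buys you, here is a trivial machine-checked proof in Lean 4, a proof assistant of the kind such theorem-proving systems target (this is my own illustration, not output from any DeepMind system): if it compiles, the conclusion follows, full stop.

```lean
-- Both sides reduce to the same numeral, so reflexivity closes the goal.
-- If this file type-checks, "2 + 2 = 4" is proved, not merely printed.
example : 2 + 2 = 4 := rfl
```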
Back to errors. In the Easy Problem, a system is deemed a success, typically, when the errors fall below a certain level; this is minimizing the loss function in machine learning research (think deep neural networks). Note that "error" is highly dependent on context: 1 error in 10 might be okay for an image recognition contest but terrifying for a passenger in a self-driving car. Successful systems typically best humans at some task or other, as we see with games like chess and Go. If the problem is really hairy, like engaging in commonsense dialogue with a person (Conversational AI), we might put up with an occasional error, but here the type of error will matter a great deal. If the system occasionally assures the human that "A" and "not-A" are both true, that's a teeth-shattering error. It means "whoops, we have no understanding in this system." If it gets the name of the mother of the guy who invented floss mixed up with the toothpaste guy's mom, we might not care. (Then again, depending on the situation, we really might.) I think Marcus is spot on in fixing on reliability as the key obstacle for today's AI. Thinking about errors helps make this even clearer.
To put it another way, LLMs don't make THAT many errors, but the errors they do make can sometimes (and more often than you'd expect) be really boneheaded, stupid, subhuman, and possibly plain dangerous. For some theoretical purposes, LLMs "solved" the Conversational AI problem. For practical purposes, the low error rate is hyper-sensitive to context, and so a true evaluation would have a very high bar. The Hard Problem has reared its ugly head.
This takes me to my last points regarding Kahneman's System 1 and System 2.
Kahneman's Mistake
Kahneman introduced two intuitive concepts that apply to cognition: we sometimes make hasty but often correct inferences, and we sometimes "think through" a problem to get to an answer. The former he called "System 1." The latter, "System 2." AI researchers like Marcus (he references Kahneman's "System 1 and 2" in his post4) seem fond of Kahneman's mistake (err, distinction), I guess because it's a handy heuristic for thinking about fast but possibly bullshit black-box inferences with neural networks versus the slower, more deliberate, and transparent reasoning typical of the older, failed approaches. If only we could combine these, it might extend generality, and generality is the Holy Grail for AI. But there's a problem with Kahneman's distinction, and it unfortunately translates into an irritating blind spot for researchers in AI. I hope this last bit about Kahneman will tie back into the larger discussion about hybrid systems and the easy and hard problems of intelligence. Best here if I quote (at length) from my book, The Myth of Artificial Intelligence:
The idea that part of our thinking is driven by hardwired instincts has a long pedigree, and it appears in a modern guise with the work of, for instance, Nobel laureate Daniel Kahneman. In his 2011 best seller, Thinking, Fast and Slow, Kahneman hypothesized that our thinking minds consist of two primary systems, which he labeled Type 1 and Type 2. Type 1 thinking is fast and reflexive, while Type 2 thinking involves more time-consuming and deliberate computations. The perception of a threat, like a man approaching with a knife on a shadowy street, is a case of Type 1 thinking. Reflexive, instinctual thinking takes over in such situations because (presumably) our Type 2 faculties for careful and deliberate reasoning are too slow to save us. We can't start doing math problems; we need a snap judgment to keep us alive. Type 2 thinking involves tasks like adding numbers, or deciding on a wine to pair with dinner for guests. In cases of potential threat, such Type 2 thinking isn't quickly available, or helpful.
Kahneman argued in Thinking, Fast and Slow that many of our mistakes in thinking stem from allowing Type 1 inferences to infect situations where we should be more mindful, cautious, and questioning thinkers. Type 1 has a way of crowding out Type 2, which often leads us into fallacies and biases.
This is all good and true, as far as it goes. But the distinction between Type 1 and Type 2 systems perpetuates an error made by researchers in AI, that conscious intelligent thinking is a kind of deliberate calculation. In fact, considerations of relevance, the selection problem, and the entire apparatus of knowledge-based inference are implicit in Type 1 and Type 2 thinking. Kahneman's distinction is artificial. If I spot a man walking toward me on a shadowy street in Chicago, I might quickly infer a threat. But the inference (ostensibly a Type 2 concern) happens so quickly that in language, we typically say we perceive a threat, or make a snap judgment of a threat. We saw the threat, we say. And indeed, it will kick off a fight-or-flight response, as Kahneman noted. But it's not literally true that we perceive a threat without thinking. Perceived threats are quick inferences, to be sure, but they're still inferences. They're not just reflexes. (Recall Peirce's azalea.)
Abduction [I don't discuss abduction in this post, but readers familiar with my "inference framework" will connect the dots here] again plays a central role in fast thinking: it's Halloween, say, and we understand that the man who approaches wears a costume and brandishes a fake knife. Or it's Frank, the electrician, walking up the street with his tools (which include a knife), in the shadows because of the power outage. These are abductions, but they happen so quickly that we don't notice that background knowledge comes into play. Our expectations will shape what we believe to be a threat or harmless, even when we're thinking fast. We're guessing explanations, in other words, and this guides Type 2 thinking as well. Our brains (our minds, that is) are inference generators.
In other words, all inference (fast or slow) is noetic, or knowledge-based. Our inferential capabilities are enmeshed somehow in relevant facts and bits of knowledge. The question is: How is all this programmed in a machine? As Levesque points out, some field, like classic AI's knowledge representation and reasoning, seems necessary to make progress toward artificial general intelligence. Currently, we know only this: we need a way to perform abductive inference, which requires vast repositories of commonsense knowledge. We don't yet know how to imbue machines with such knowledge, and even if we figure this out someday, we won't know how to implement an abductive inference engine to make use of all the knowledge in real time, in the real world (not, that is, without a major conceptual breakthrough in AI).
Whew. So much for Kahneman, at least by my lights. The broader point here is the question about the Hard Problem. Again, nearly every interesting challenge in AI is already met with a hybrid approach, and what we've learned isn't promising: hybrid systems are good for the Easy Problem, but it's a flat mystery (and not, I think, particularly promising) how combining failed approaches with not-yet-failed approaches (or, more charitably, failed approaches with currently dominant approaches) gets us out of easy problems and into progress on hard ones. Kahneman's distinction plays into the idea of building out modules for big hybrid systems that solve problems by breaking them down into smaller parts: this part goes to the fast system, that part we had better think through more deliberately.
I argued in The Myth that we don't get generality from AI by combining inferences already known to be inadequate individually (induction and deduction). You can read my case for abduction as tantamount to looking for a solution to the Hard Problem by avoiding the hybridization of AI and its allocation of parts of a problem to specific modules or subsystems. Systems have subsystems, yes. But the point is that we can't divide the subsystems in terms of inference. The fundamental theory we're in search of would allow us to reason flexibly, from particular observation to particular observation via rules. We can formalize this type of inference as "abductive inference," but we don't really understand how to implement it computationally, or, for that matter, using anything at all. Brains do it constantly; it's like a condition of being awake. So for now anyway, we're stuck. We'd be better off, in my view, reverse-engineering the brain and looking for a powerful algorithm in the neocortex than continuing to make hybrid systems tailored to specific problems. At the very least, we need some reason to think more hybridization won't give us more of the same: better performance on more and more easy problems, but no hope of traction on the hard problem.5
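For readers who want something to poke at, here is a toy "inference to the best explanation" scorer in Python, in the spirit of the Halloween and Frank-the-electrician examples above (the hypotheses, cues, and numbers are all invented). It is emphatically not an implementation of abduction in the sense I mean; it only illustrates that background knowledge and context, not the inference rule itself, do the heavy lifting.

```python
# Toy "best explanation" scorer (illustrative only; every number is made up).
# Each hypothesis could explain the observation "man approaching with a knife";
# contextual cues change which explanation comes out most plausible.

HYPOTHESES = {
    "threat":            {"base": 0.05, "cues": set()},
    "halloween_costume": {"base": 0.02, "cues": {"it_is_halloween"}},
    "frank_electrician": {"base": 0.02, "cues": {"power_outage", "frank_expected"}},
}

def best_explanation(context: set) -> str:
    def score(name: str) -> float:
        h = HYPOTHESES[name]
        matched = len(h["cues"] & context)
        return h["base"] * (10 ** matched)  # crude: each matching cue multiplies plausibility
    return max(HYPOTHESES, key=score)

print(best_explanation(set()))                               # -> threat
print(best_explanation({"it_is_halloween"}))                 # -> halloween_costume
print(best_explanation({"power_outage", "frank_expected"}))  # -> frank_electrician
```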
I think I've run out of steam on this post, so for now I'll leave it here. I trust anything that's left dangling or unresolved can be taken up in comments. Thanks everyone!
Erik J. Larson
1. I'm trying to keep this informal, but I'll have to bring up some relevant concepts. "Proof theory" is syntactic and is concerned with how a system draws a valid conclusion. "Truth theory" is semantic, or meaning-based, and is concerned with how valid conclusions can be considered true, or in other words how the inference can be sound. These concepts are all part of the study of mathematical logic, and they found their way into AI early on, as the pioneers (like McCarthy, Minsky, and others) were mathematicians and realized that logical deduction could (to some extent) be automated.
2. Deductive systems are technically a subset of "symbolic" systems. There were plenty of ad hoc symbolic approaches, but since they didn't have a semantics, a way of knowing what's true and what's a contradiction and so on, they typically ended in various degrees of spaghetti code. "Spaghetti code" isn't a bad rendition of what happened to symbolic systems everywhere, as we kept trying to extend them to cover new knowledge, new cases, and especially to incorporate some notion of relevance. I'll try to explain this as we go.
3. Credit for the wonderful phrase "engineering the hell out of it" goes to Gerben Wierda. See https://ea.rna.nl/2024/02/07/the-department-of-engineering-the-hell-out-of-ai/.
4. Kahneman can be read as giving a "rough and ready" distinction for cognition, and a charitable way to take Marcus (I take him this way) is that he's appealing to the rough-and-ready idea, with nothing substantive hanging on it. That's fine as far as it goes, but I have a specific reason for rejecting Kahneman's distinction; namely, that it tends to obscure the noetic nature of our inferences writ large.
5. I didn't tease "generality" and "understanding" apart, but it's a bit out of scope, and at any rate they seem to be so closely tied that one is essentially a precondition of the other. We can haggle, sure. It's for another post.