Can Large Language Models Save "Good Old Fashioned AI"?
LLMs haven't replaced traditional AI. They're using it.
Hi everyone,
I want to switch gears here and take up a more technical topic in artificial intelligence. This post has a business/techie feel to it—you’ve been warned! But making sense of LLMs as they’re adopted by companies and institutions seems necessary today if we’re to understand the world, or at least the “tech world.” I hope you enjoy this piece on LLMs and AI, and that it helps you better understand what’s happening in “AI” as it comes to dominate more and more of the twenty-first century.
A sincere thank you to paid subscribers. Your contribution encourages me to keep going, and gives me hope that Colligo can become sustainable.
Wanted: Customized LLMs
Large language models (LLMs) like GPT and conversational systems like ChatGPT are, as the world now knows, a seminal technology that has in many ways re-invigorated the field of AI. Language models are actually an old technology in AI, and “large” is a relative word, but the current euphoria about LLMs traces back only to November 2022, when ChatGPT was released by OpenAI. It used an innovation—as I’ve written before—the transformer architecture and its attention mechanism, published back in 2017 by scientists at Google Brain and Google Research. Since late 2022, ChatGPT has seen exponential growth in its user base. By some estimates, 100 million users signed up for the service in just two months, making it the fastest-growing consumer internet app of all time. A revolution? Perhaps. But the story of LLMs is still being written, and the value of generative models to enterprises and small businesses, as well as the government, legal, and financial sectors, is a moving target. Organizations need custom answers from an LLM. How does that work?
Conversational systems like ChatGPT, as well as LLMs themselves, are generic, out-of-the-box “AI.” The answers such systems generate are characteristically impressive, but the same answers are available to anyone in the world. Organizations with proprietary data—like patient medical histories in health care—generally can’t use generic prompts because they’re looking for information on specific patients. The prompts can’t be generic. The direct-to-consumer approach championed by OpenAI doesn’t work here.
One option is to “DIY” LLMs and build a custom one. But training and retraining LLMs is costly, time-consuming—three months of continual training for very large models—and extremely compute-intensive, requiring thousands of special-purpose processors known as GPUs (graphics processing units). The “DIY” approach also requires expensive in-house expertise, in the form of data scientists. If you’re Amazon, you can train a proprietary model and you’ve already hired the data scientists. If you’re not Big Tech, training and retraining models simply isn’t an option. Instead, you must engineer prompts.
Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation, or “RAG,” is a fancy term for using an external data source—a database or knowledge base—to customize a prompt sent to an LLM. This is a form of prompt engineering, and it’s a popular alternative to building a large model from scratch. Because generative AI is highly sensitive to the specific terms in a prompt (a “prompt” is simply the words you supply to the LLM, like search terms), changing or adding words typically results in vastly different answers.
LLM users can engineer a prompt by playing around with different wordings:
“show me a movie involving a professional photographer” or
“show me all the movies that had a protagonist taking pictures with a camera” or
“what movies highlight the use of a camera by one of the characters?”
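To make that sensitivity concrete, here is a minimal sketch in Python of trying those rewordings side by side. The call_llm function is a hypothetical stand-in for whatever model an organization actually uses (a hosted API or a local model); nothing here is tied to any particular vendor.

```python
# Minimal sketch of probing an LLM's sensitivity to prompt wording.
# call_llm() is a hypothetical placeholder, not a real vendor API.

def call_llm(prompt: str) -> str:
    """Stand-in: send `prompt` to an LLM and return its text response."""
    return f"[model response to: {prompt!r}]"  # replace with a real call

prompts = [
    "show me a movie involving a professional photographer",
    "show me all the movies that had a protagonist taking pictures with a camera",
    "what movies highlight the use of a camera by one of the characters?",
]

# Same underlying question, three wordings; in practice the answers often differ.
for p in prompts:
    print(p)
    print(call_llm(p))
    print("-" * 60)
```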
Businesses and institutions with customers will want to engineer prompts by including customer information, like user profiles and prior purchases for product recommendation. This is retrieval-augmented generation, the process of augmenting a prompt with information retrieved from an external source. For instance, a movie recommendation system might retrieve the profiles of movie viewers and their past movie preferences: “recommend a movie for user-34 given that he likes Die Hard, Blade Runner, and Ex Machina.”
Where does the information about user-34 and any other user come from? From an external data source owned and managed by the organization (“external” means external to the LLM). If the information in the external data source is fully structured (as with a relational database), the retrieval of user profiles and other information is straightforward (using SQL or alternatives). If there are text fields and other unstructured or semistructured information, traditional information extraction techniques and even reasoning and inference can come into play. In other words, LLMs have reshuffled the deck, but we are still playing with the same cards. Knowledge representation and reasoning (KR&R) are “good old fashioned artificial intelligence,” and they’re now used in improving and customizing the output of LLMs, by augmenting LLM prompts with contextual and other information from external sources.
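As a rough illustration of that pipeline, here is a sketch of the movie-recommendation case: the user’s prior preferences are retrieved with SQL from a table the organization owns, then spliced into the prompt before it reaches the model. The table layout, the user id, and the call_llm placeholder are invented for the example; they are not anyone’s actual schema or API.

```python
import sqlite3

# --- Hypothetical external data source owned by the organization ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (user_id TEXT, movie TEXT)")
conn.executemany(
    "INSERT INTO likes VALUES (?, ?)",
    [("user-34", "Die Hard"), ("user-34", "Blade Runner"), ("user-34", "Ex Machina")],
)

def retrieve_likes(user_id: str) -> list[str]:
    """Retrieval step: pull the user's prior preferences out of the database."""
    rows = conn.execute("SELECT movie FROM likes WHERE user_id = ?", (user_id,))
    return [movie for (movie,) in rows]

def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM the organization actually uses."""
    return f"[model response to: {prompt!r}]"

def recommend(user_id: str) -> str:
    """Augmentation step: splice the retrieved context into the prompt."""
    likes = ", ".join(retrieve_likes(user_id))
    prompt = f"Recommend a movie for {user_id} given that they like {likes}."
    return call_llm(prompt)

print(recommend("user-34"))
```

The point of the design is that the model itself never has to be retrained; the organization-specific knowledge travels inside the prompt.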
Ontologies Aren’t Dead Yet
Computational ontologies—I initially thought LLMs would kill them off—might prove useful in the brave new world of LLMs. Ontologies represent words with concepts. An ontology with the concept Cat, for instance, can be used to map lexical instances of “cat” in free text to the structured concept Cat, representing a definition of the word and its meaning in relation to other concepts, like Mammal. There’s a many-to-one relationship between lexical entities (words) and concepts, so the ontology serves to simplify, organize, and clarify the meaning of natural language or free text. Very large projects in this space include Wikidata and DBpedia, along with vocabularies like FOAF and standards like OWL for building ontologies. The medical community has been especially active in developing ontologies for its domain; large efforts include SNOMED CT and the Gene Ontology.
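A toy sketch of the word-to-concept mapping, using the Cat/Mammal example: a small hand-built lexicon maps lexical forms to concepts, and an is-a table records how the concepts relate. The dictionaries below are invented purely for illustration; real ontologies like SNOMED CT are vastly larger and come with dedicated query tooling.

```python
# A toy "ontology": concepts, their relations, and the lexical forms that map to them.
# Invented for illustration -- real ontologies (SNOMED CT, Gene Ontology) are far richer.

is_a = {"Cat": "Mammal", "Dog": "Mammal", "Mammal": "Animal"}            # concept hierarchy
lexicon = {"cat": "Cat", "cats": "Cat", "kitty": "Cat", "dog": "Dog"}    # many-to-one: words -> concepts

def concept_of(word: str) -> str | None:
    """Map a lexical instance found in free text to its concept, if known."""
    return lexicon.get(word.lower())

def ancestors(concept: str) -> list[str]:
    """Walk up the is-a hierarchy: Cat -> Mammal -> Animal."""
    chain = []
    while concept in is_a:
        concept = is_a[concept]
        chain.append(concept)
    return chain

print(concept_of("kitty"))   # Cat
print(ancestors("Cat"))      # ['Mammal', 'Animal']
```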
Structured vocabularies and ontologies are used in information retrieval to match terms that are disparate at the lexical level—“truck” and “lorry”—but are essentially the same concept—“large wheeled motorized transportation vehicle.”
Matching terms to concepts is helpful for getting all instances of a concept, sometimes from separate repositories. For instance, “give me all the occurrences of Vehicles in our repositories” might yield a large collection of vehicles, from trucks to watercraft like submarines. A keyword query might not get them all, and would certainly be cumbersome if every type of vehicle had to be named. Information retrieval, in other words, can benefit from a conceptualization of a domain, and RAG is no exception. So-called semantic search takes into account taxonomies of concepts as well as context, where, say, a search for “football” from the UK should match what Americans call “soccer” (a geographic context). I call these “good old fashioned AI” techniques because we’ve been developing and using them since the inception of the field. Such techniques are not flexible enough for conversational AI or contextual question answering (which is why Amazon’s Alexa never really worked), but they can improve retrieval performance for organizations using RAG.
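Here is a small sketch of what that kind of concept-based retrieval might look like: a query for the concept Vehicle is expanded, via an invented toy taxonomy, into every term filed under it, so documents mentioning “lorry” or “submarine” are retrieved even though neither word appears in the query.

```python
# Sketch of concept-based query expansion for retrieval.
# The taxonomy and documents are invented for illustration.

taxonomy = {
    "Vehicle": ["Truck", "Watercraft"],
    "Truck": ["truck", "lorry"],
    "Watercraft": ["submarine", "ferry"],
}

def expand(concept: str) -> set[str]:
    """Collect every lexical term filed under a concept, recursively."""
    terms = set()
    for child in taxonomy.get(concept, []):
        if child in taxonomy:      # child is itself a concept
            terms |= expand(child)
        else:                      # child is a lexical term
            terms.add(child)
    return terms

documents = [
    "The lorry was parked outside the depot.",
    "A submarine surfaced near the harbour.",
    "The meeting ran long and nothing was decided.",
]

query_terms = expand("Vehicle")    # {'truck', 'lorry', 'submarine', 'ferry'}
hits = [d for d in documents if any(t in d.lower() for t in query_terms)]
print(hits)                        # the first two documents match
```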
Figure: semantically searching the world of the Simpsons (the circles are concepts).
Ontological engineering—building and using ontologies—was adopted by companies like Amazon to organize products and other content into taxonomies, like Computer → Information-Technology. When LLMs first appeared, it seemed all this “back-end” concept-building work would become unnecessary, since the generative models themselves implicitly represent concepts by giving us the right answers (as humans do). But RAG brought this and other work back into the fray. This is (probably) good news. The field of AI has a superstar in the LLM, but it must also use the other tools developed over decades to make the best use of that superstar. When I wrote The Death of Ontology? a few months ago, I was focused on the impressive performance of LLMs in the absence of structured ontologies or knowledge bases. But the popularity and indeed necessity of RAG pulls all this “old fashioned” work back into play, since the LLM by itself will be of limited usefulness to an organization that needs external data sources to augment prompts. Generic—not retrieval-augmented—prompts yield answers that anyone with an internet connection can get, nixing any competitive advantage. Do natural language processing and information extraction/retrieval live on? It seems so.
The upshot? Big companies like Amazon or Google or Facebook train and administer their own LLMs, which simplifies the task of setting up RAG. Amazon has done a lot of work here, building out a RAG infrastructure on its popular Amazon Web Services (AWS) cloud computing platform. But again, companies without deep pockets must turn to pre-trained LLMs like Llama 2 from Meta (Llama 2 is open source) or GPT/ChatGPT. The same goes for “rich” organizations with proprietary data to protect—using pre-trained models means uploading all that proprietary data to “the cloud.” That simply isn’t doable for many companies and industries, like health care (private patient records) or the military (confidential and classified information) or finance (information on investors and stocks). A plethora of laws, concerns, and paranoias prevents it. But without information in the cloud, the LLMs “out there” aren’t usable. There’s more work to do.
Catch-22: It’s expensive to “DIY” an LLM. Organizations without funding for this will turn to pre-trained models like ChatGPT. This requires uploading proprietary data to a cloud service, so the pre-trained model has access to it for use with RAG. Huge industries like health care, government, military, legal, and finance simply can’t do this legally. Their only choice is to spend and recruit to host the models on their own servers, and build out RAG from there. This is expensive in itself. Training a new model is much more expensive still, and requires costly expertise (law firms probably don’t want to hire dozens of data scientists and other tech experts). Companies without deep pockets can upload data and use pre-trained models, but they will likely have proprietary data they want to protect too. So: spend lots of money to “DIY” it, or cross your fingers and store the crown jewels in the cloud using platforms like ChatGPT or Llama 2.
RAG and its retrieval toolbox will reach many more companies and organizations attempting to resolve the “catch-22.” These companies will use “good old fashioned AI” in various and unpredictable ways. They’ll use existing LLMs—at least initially. What they won’t use—for RAG with proprietary data—is the cloud. Watch the Europeans here, who are more worried (or at least more vocal) about privacy and notoriously wary of America’s “Big Tech LLMs.” Software solutions will spread LLM technology far and wide.
Big picture: LLMs are a disruptive technology, but they’re not yet feasible for many companies and organizations. For a number of reasons, even rich companies like OpenAI or Microsoft won’t keep building big generic LLMs (we’re running out of data and GPUs, for one), so we can be sure that existing LLMs will remain popular, and organizations will get their questions answered by using workable and safe RAG. This means software to handle all the security, retrieval and other challenges. Importantly, it means the survival of “good old fashioned AI,” now directed at retrieval-augmented generation. LLMs did not kill off pre-LLM AI—it’s now part of a new LLM-centered workflow. Hooray. But we’re still stuck with LLMs from Big Tech—at least for now. Welcome to the new world, not so different from the old. With tech, at any rate, we can be reasonably assured that more change will come.
Erik J. Larson