Hi everyone,
In this post, I introduce a longstanding thorn in the side of AI scientists working to extend the power of large language models. I haven’t settled on a perfect name for it yet, but we can gloss it as the persistence/adaptability tradeoff: we want models that can fluidly adapt to changing contexts, but we also want them to remember the context we’re in—especially when we return to it—without having to re-prompt or re-explain everything from scratch.
This tradeoff isn’t just an engineering hurdle; it’s a structural consequence of using large-scale inductive data systems. And because I’m increasingly convinced it’s not something we can “scale through” on the way to AGI, I argue that it calls for a different design philosophy altogether—one that makes a distinct epistemological commitment.
I call this approach Augmented Human Intelligence (AHI), and I contrast it with the prevailing AGI paradigm. Have a look.
One of the clearest limitations of today’s large language models is their brittle grasp of context over time. Consider a real-world use case: I regularly translate WhatsApp messages from Spanish to English. From the interaction history, it’s clear that any Spanish input is meant to be translated, not replied to in kind. And yet, even in ChatGPT-4o, the model frequently responds in Spanish—acting as though I were a native speaker carrying on a bilingual conversation. There’s no semantic memory here. The model forgets the implicit rule the moment the token stream ends. And so, we patch the hole.
What techniques exist to fix this? Several, and they’re increasingly common in LLM deployment pipelines:
System prompts (e.g., "Translate Spanish to English"): These are fixed instructions injected into every prompt, meant to guide behavior persistently across the session. But they’re brittle. If I suddenly intend to use Spanish conversationally, the model will still translate it—because the rule overrides the context.
Message tagging and heuristics: Frameworks like LangChain, Semantic Kernel, and CrewAI attempt to impose structure through modular routing and intent classification. Inputs are tagged and sent to specific tools—translators, retrievers, planners. But these systems are fragile.
LangChain, for instance, often fails in edge cases:
Mixed-language inputs (“Translate but preserve names”)
Implicit queries (“Can you help me with this?” without grounding)
Task switching mid-dialogue, where routing logic lags behind user intent.
Semantic Kernel faces similar challenges: its "planner" components depend heavily on tagging quality and break when a user changes direction.
LlamaIndex excels at document context but fails to adapt to fluid user behavior across tasks.
Embedding-based memory systems: These retrieve similar prior messages from a vector database and reinsert them into the prompt. But they don’t understand user goals; they rely on surface-level similarity between embeddings. The retrieved memory might be semantically adjacent but contextually irrelevant (see the sketch after this list).
Session memory (e.g., ChatGPT): OpenAI’s session memory remembers preferences like tone or name, but not functional context. It knows I like formal replies. It doesn’t know I’m in the middle of translating messages from my girlfriend.
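To make the embedding-based approach concrete, here is a minimal sketch of the retrieval pattern these systems rely on. Everything in it is illustrative: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the in-memory list stands in for a vector database. The point is that retrieval is driven by similarity of surface content, not by what you are currently trying to do.

```python
# Toy sketch of embedding-style memory retrieval.
# `embed` is a crude bag-of-words stand-in for a real embedding model,
# and `index` stands in for a vector database.
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Memory" is just prior messages stored with their vectors.
memory = [
    "Translate: ¿Puedes venir a cenar el viernes?",
    "I prefer formal replies in English.",
    "Translate: Te veo mañana en la estación.",
]
index = [(m, embed(m)) for m in memory]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [m for m, _ in ranked[:k]]

# The closest match is a prior *translation request*, even if this new
# message is the user's own question: similarity says nothing about the goal.
print(retrieve("¿Te veo el viernes?"))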
All of these are exogenous scaffolds. They surround the model with rules and retrievals but leave the model itself untouched—still stateless, belief-less, and blind to its own trajectory.
The Persistence/Adaptability Tradeoff
If we hardwire context with aggressive routing and memory, we lose flexibility when user behavior changes. If we allow drift in the name of creativity or adaptability, we lose coherence.
No amount of outer logic can fully resolve this tension because the foundation itself—the LLM—is not designed to track or hold contextual meaning across time.
Not Just Software
The problem here is best viewed as an epistemological constraint rather than a software snafu. Intelligence, in humans, entails memory, situated goals, and the ability to model others’ intentions and revise expectations across time. Language models do none of this. They don’t remember what you want, don’t track what they’ve said, and don’t possess any enduring sense of purpose. Their outputs are shaped not by understanding but by statistical proximity to prior examples. When we impose scaffolds (routing logic, memory systems, tool wrappers), we aren’t making the model smarter. We’re making it look as if it understands, for just long enough to get through the next prompt. The result is a game of Whac-A-Mole: we fix the desired epistemic context, and then a new one comes along. The base model might detect the switch, but the very rule designed to patch it won’t permit the change. No one currently has a solution to this problem, though many are trying.
Sidebar: Who’s Trying to Solve Context Persistence?
Several research projects and platforms are exploring ways to preserve or simulate memory, though none have cracked the core limitation:
MemGPT (UC Berkeley): Simulates human working memory by swapping relevant past content in and out of the model’s context window (a generic sketch of this pattern follows the list).
LlamaIndex + Context Agents: Stores external context and reinserts it via retrieval, but often suffers from irrelevant or noisy matches.
Semantic Kernel (Microsoft): Builds planning and memory abstractions, but struggles with dynamic task-switching and ambiguity.
ChatGPT’s Session Memory: Stores tone and surface-level preferences, not task context or evolving goals.
ReAct + Toolformer: Combine tool use with reasoning chains, but are still brittle under shifting instructions or conflicting objectives.
These are all externalized patches around a central fact: LLMs don’t carry meaning forward—they regenerate it, token by token, every time.
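For readers who want to see the shape of the first idea, here is a generic, toy illustration of budget-constrained context swapping. To be clear, this is my own sketch of the general pattern, not MemGPT’s actual design or API; the tokenizer and the relevance score are crude stand-ins.

```python
# Generic toy sketch of swapping archived content in and out of a fixed-size
# context window. Not MemGPT's implementation; the tokenizer and relevance
# scoring below are deliberately crude stand-ins.

def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer.
    return max(1, len(text) // 4)

def build_window(archive: list[str], current_input: str, budget: int = 32) -> list[str]:
    """Fill the context window with the highest-scoring archived items that fit."""
    def relevance(item: str) -> int:
        # Stand-in scoring: words shared with the current input.
        return len(set(item.lower().split()) & set(current_input.lower().split()))

    window = [current_input]
    used = estimate_tokens(current_input)
    for item in sorted(archive, key=relevance, reverse=True):
        cost = estimate_tokens(item)
        if used + cost > budget:
            continue  # swapped out: stays in the archive, not in the prompt
        window.append(item)
        used += cost
    return window

archive = [
    "User pastes Spanish WhatsApp messages and wants them translated.",
    "User prefers formal English replies.",
    "Last week the user asked about train schedules in Madrid.",
]
print(build_window(archive, "Translate: te veo en la estación de Madrid."))
```

With the toy budget here, the train-schedule memory gets swapped in (it shares the word “Madrid” with the input) while the standing translation rule stays out: relevance by surface overlap is not relevance to the task.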
Introduction to AHI (not AGI)
Most agentive AI systems today—like LangChain agents or tool-using LLM wrappers—are still built on a quiet but powerful assumption: that the model is the mind. These systems wrap a stateless language model in layers of orchestration—task routing, memory buffers, planning heuristics, tool execution—all designed to simulate continuity and autonomy. The underlying belief is that with enough scaffolding, the model can behave like an intelligent agent.
In these architectures, intelligence is still projected into the model. The system behaves as if the model has goals, memory, and persistence, when in fact, it has none. It merely recomputes responses from statistical correlations, without belief, history, or understanding.
Augmented Human Intelligence (AHI) takes a different stance: the model is not the mind—and never will be. The design goal isn’t to create artificial agents but to build environments that support and extend human cognition. AHI assumes the model is a powerful but shallow reasoning engine, not an autonomous thinker. Its job is not to imitate human minds, but to complement them.
This leads to a different architecture entirely:
The locus of agency remains human, not artificial.
Context is externalized—surfaced, manipulated, and preserved outside the model, where it can be inspected, revised, and controlled.
The system does not simulate belief, memory, or intent. It allows the human to define and track them explicitly, on their terms.
Where agentive AI wraps the model in increasingly brittle logic to fake continuity, AHI builds systems that make context a first-class object—something distinct from the model itself, yet essential to how intelligence unfolds over time.
In this framework, the model becomes what it truly is: a high-speed, high-dimensional subroutine—a reasoning prosthetic—embedded in workflows that are designed for humans who have goals, judgment, memory, and perspective. AHI systems can adapt over time, reflect user-defined goals, and support deep integration without pretending the model “understands” what it’s doing.
In short, AHI doesn’t try to make the machine more human. It makes the human more powerful.
AHI Approach: Context as System Object, Not Prompt Hack
Here’s how it would work with AHI:
On first use, the system notices a repeating pattern: you’re pasting incoming Spanish text and want it translated.
It suggests:
“Would you like to treat pasted messages as translatable input, and your own entries as instructions?”
→ You confirm with one click. That rule becomes a first-class, system-level object, not a prompt injection (see the sketch after this list).
It persists across sessions (and is versioned).
You can view, edit, and extend the logic as needed—but you don’t have to re-instruct the system every time.
If you start asking Spanish-language questions, the system flags the ambiguity and offers adaptive options (e.g., switch modes, ask for clarification once, not always).
Crucially:
The LLM isn't pretending to understand your intent.
The system scaffolds it, learns with you, and exposes its assumptions.
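What would it mean, concretely, for the rule to live as a system-level object rather than a prompt string? Here is one minimal way to picture it. The class name, the fields, and the JSON persistence below are my own illustrative assumptions, not a specification of AHI.

```python
# Minimal sketch of a context rule as a first-class, versioned, persistent
# system object, assuming a hypothetical AHI layer that keeps rules outside
# the model entirely.
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path

@dataclass
class ContextRule:
    name: str
    condition: str   # human-readable trigger, e.g. "pasted message is in Spanish"
    action: str      # what the system should do, e.g. "translate to English"
    version: int = 1
    history: list = field(default_factory=list)  # prior versions, kept for inspection

    def amend(self, condition: str | None = None, action: str | None = None) -> None:
        # Editing a rule preserves its previous form and bumps the version.
        self.history.append({"version": self.version,
                             "condition": self.condition,
                             "action": self.action})
        self.condition = condition or self.condition
        self.action = action or self.action
        self.version += 1

def save(rule: ContextRule, path: Path) -> None:
    # Persist across sessions as plain JSON the user can read and edit.
    path.write_text(json.dumps(asdict(rule), indent=2, ensure_ascii=False))

def load(path: Path) -> ContextRule:
    return ContextRule(**json.loads(path.read_text()))

rule = ContextRule(
    name="whatsapp-translation",
    condition="pasted message is in Spanish",
    action="translate to English; treat my own typed entries as instructions",
)
save(rule, Path("context_rules.json"))
```

The details matter less than the properties: the rule exists outside any prompt, it survives sessions, every amendment is versioned, and the user can open the file and read exactly what the system currently assumes about the working context.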
Summary: Where the Epistemic Burden Lives
| | Agentive AI | AHI |
| --- | --- | --- |
| Where’s the intelligence? | Projected into the model (via wrappers) | Centered on the user; the model is a tool |
| Context handling | Heuristics + prompt guessing | Explicit system-level structure |
| Adaptability | Requires manual correction or re-routing | Learns with the user; makes context visible and editable |
| User role | Constant manager of model confusion | Occasional author of durable, evolving context |
The Promise of AHI
The promise of Augmented Human Intelligence (AHI) isn’t to mimic the mind, but to extend it. That requires systems that can preserve and interact with structured, evolving context—not just generate plausible responses in isolated bursts. AHI begins from a foundational recognition: human intelligence is context-rich, temporally situated, and purpose-driven. Any tool that hopes to augment it must engage with that reality—not abstract it away.
This is where AHI makes a decisive break from the prevailing paradigm. Most “agentive AI” systems today are still trying to wrap statistical models in contextual logic. But AHI rejects that premise entirely. It doesn’t try to simulate autonomy. It builds for cooperation between fundamentally different systems: humans with memory and judgment, and machines with speed and breadth.
Importantly, in AHI context is a manipulable part of the system. Systems are built to externalize memory, intent, and task state so that they are inspectable, editable, and persistent, defined by the human rather than assumed by the model. The model is used as a tool, not mistaken for a mind.
This is the core design principle: make context manipulable, not implicit. Build environments where humans define and carry forward their own intent—and where systems are built to respect that. We don’t need models that seem human. We need models that help humans think better in time—across tasks, through ambiguity, and amid the real-world messiness that intelligence actually evolved to handle.
Agentive AI tries to guess your intent based on recent patterns—and breaks when you shift midstream.
AHI lets you define the rules of interaction, surfaces them as system objects, and adapts with you—because you're the agent, not the model.
Final Note: A Natural Plateau
In current systems, the model tries to infer your intent from tokens—guessing context from patterns. In AHI, the system doesn’t infer; it prompts. When ambiguity arises, it asks you to resolve it. That resolution becomes part of a durable context contract—an explicit structure the system consults before every interaction.
This shifts the model’s role: from improvisational mind-reader to disciplined transducer. It narrows interpretation when needed, but also interprets disparate inputs as signals to the human: “Context shift detected—should I adjust?” Over time, the system “learns” by accumulating disambiguations—building a persistent, inspectable record of how you want it to behave.
Concrete Example: Spanish Translation, Made Better
Back to the WhatsApp example. In the AHI approach, after detecting repeated Spanish inputs followed by English outputs, the system might ask:
“Would you like me to treat Spanish messages as content to translate, and anything starting with ‘!’ as your own notes or instructions?”
You confirm. That rule becomes a live context contract—one you can inspect, modify, and carry forward.
Now, if you switch gears and type a Spanish-language question, the system doesn’t get confused. It flags the shift:
“This input looks like a Spanish question from you. Should I still translate it or pause the rule?”
In this way, AHI avoids hallucination and misfire not by simulating memory or intent—but by treating them as editable, shareable structures. The model remains stateless, but the system scaffolds a shared epistemic frame—one that evolves with the user instead of pretending to pre-empt them.
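To show what that shared epistemic frame could look like in code, here is a deliberately small sketch. The contract layout and the toy Spanish heuristic are my own illustrative assumptions; the “!” convention comes from the example above. The only point being demonstrated is that the system consults an explicit, editable structure, asks when it is unsure, and records each resolution instead of guessing.

```python
# Minimal sketch of "consult the contract before every interaction".
# The contract layout and the toy language check are illustrative assumptions.

SPANISH_HINTS = {"¿", "¡", " el ", " la ", " que ", " es "}

def looks_spanish(text: str) -> bool:
    # Stand-in for a real language detector.
    return any(h in f" {text.lower()} " for h in SPANISH_HINTS) or text.startswith("¿")

contract = {
    "rule": "translate pasted Spanish messages to English",
    "own_note_prefix": "!",   # entries starting with '!' are the user's own notes
    "resolutions": [],        # accumulated disambiguations, kept inspectable
}

def route(user_input: str, ask) -> str:
    """Decide what to do with an input by consulting the contract, not by guessing."""
    if user_input.startswith(contract["own_note_prefix"]):
        return f"NOTE (not translated): {user_input[1:].strip()}"
    if looks_spanish(user_input):
        if user_input.rstrip().endswith("?") and "¿" in user_input:
            # Ambiguity: this may be the user's own Spanish question, not pasted content.
            decision = ask("This looks like a Spanish question from you. "
                           "Translate it anyway, or pause the rule? [translate/pause]")
            contract["resolutions"].append({"input": user_input, "decision": decision})
            if decision == "pause":
                return "Rule paused for this message; answering instead of translating."
        return f"TRANSLATE -> (would send to the model): {user_input}"
    return f"PASS THROUGH: {user_input}"

# Example run, with a canned answer standing in for a real prompt to the user.
print(route("¿Puedes venir el viernes?", ask=lambda q: "pause"))
print(route("Te veo mañana en la estación.", ask=lambda q: "translate"))
```

Note that no LLM call appears here: the model would only be invoked at the “TRANSLATE” step, with the routing decision already made by the contract the user authored.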
Erik J. Larson
This is a well-written essay and it’s clear he’s thought deeply about the topic. But I think he’s falling into a familiar trap: expecting too much from LLMs. He critiques them for lacking memory, persistence, and goal orientation. And he can hardly be blamed.
AI has been overpromised by hype men for more than a decade now. We were told to expect autonomous agents and digital second brains. What we got are extremely good text generators. Larson’s disappointment, I think, stems from that mismatch between the promise and the reality.
If we stop expecting LLMs to carry our minds and treat them as modular reasoning tools, things usually work out surprisingly well.
I’m not sure we need a whole new philosophy. Maybe just clearer boundaries and better-designed interfaces.
One more thought: systems that shift too much cognitive load onto users often fail.
Because the average user won’t (or can’t) maintain rules or tweak context contracts. We’ve seen this in educational tech, UI design, even productivity software.
AHI assumes a level of user initiative that, for most people, just isn’t there. And when systems overestimate their users, they get quietly ignored. That’s the real risk here.
Imagine your grandmother trying to manage persistent context settings across shifting epistemic contracts. Now imagine trying to explain that sentence to her.
Sounds very interesting! Do you have a technical implementation in mind, or is it conceptual at this point? (=would current LLMs play any part in the AHI?)