The Five Chaos Problems in Agent Design

April 6, 2026 · by Jia Huang · Originally on Substack

A product manager at a large tech company told his engineering team: “We need to build an agent.” The lead engineer asked what kind. The PM looked confused. “What do you mean, what kind?”

That confusion is not his fault. It is the field’s fault.

It is 2026, and “agent” has become the most overloaded word in software since “cloud.” A ChatGPT wrapper is an agent. A ReAct loop calling three APIs is an agent. A multi-agent system with planning, memory, tool use, and self-reflection is an agent. A glorified if-else chain with an LLM call in the middle is also an agent. They are all agents, which means the word tells you nothing.

This is a design problem, not just a naming problem. Two years of reviewing agent architectures across industries, and of studying every major framework, have surfaced the same five chaos problems again and again. The field has not solved them. They are what motivated a structural framework, and a book about it.

1. “Agent” means everything and nothing

When that PM says “build me an agent,” the engineer has no vocabulary to ask the right clarifying questions. Is this a single-turn tool caller? A multi-step planner with memory? An autonomous system that can reflect on its own failures? There is no shared taxonomy. No equivalent of saying “this is a microservice” versus “this is a monolith” versus “this is an event-driven architecture.”

In object-oriented programming, this exact problem existed before the Gang of Four published Design Patterns in 1994. Two engineers could solve the same structural problem, say decoupling a publisher from its subscribers, in completely different ways, with no language to compare their solutions. The book did not invent new programming techniques. It named existing ones. It gave the field a vocabulary: Observer, Strategy, Factory, Decorator.

Agent design in 2026 needs the same treatment. Not new capabilities. New names.

2. Patterns exist as isolated papers, unaware of each other

ReAct is a paper. Reflexion is a paper. Plan-and-Solve is a paper. LATS is a paper. Tree of Thoughts is a paper. Each introduces a pattern, benchmarks it on a few tasks, and moves on.

Nobody has mapped how these patterns relate to each other. Which is a specialization of which? Which ones compose well? Which are mutually exclusive? An engineer who reads the Reflexion paper knows how Reflexion works, but not when to choose it over simple retry logic, or when to combine it with planning, or what happens when it is used in the wrong context.

The individual patterns exist. The map does not.

It is like a toolbox in which every tool comes with its own instruction manual, but nobody has written a carpentry guide. You know how to use a chisel. You know how to use a saw. Nobody has told you which one to reach for when, in what order, or what the thing you are building should look like.

3. Cognitive capabilities are tangled with design choices

When someone says “my agent needs memory,” they could mean episodic memory (remembering past interactions), semantic memory (retrieving domain knowledge through RAG), or procedural memory (learning reusable heuristics from past successes and failures). Three fundamentally different cognitive capabilities with different architectures, different costs, and different failure modes, all collapsed into a single checkbox on a feature list.
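To make the distinction concrete, here is a minimal Python sketch of the three kinds of memory as separate components. It is an illustration, not the book’s design: the class names, the word-overlap scoring that stands in for embedding-based RAG retrieval, and the success-rate rule for procedural memory are all assumptions chosen for clarity. The point is only that each kind has its own data structure, update rule, and read path.

```python
# Illustrative sketch only: the class names, the word-overlap retrieval, and the
# success-rate rule are assumptions chosen for clarity, not a reference design.
from __future__ import annotations

from collections import deque


class EpisodicMemory:
    """Episodic memory: what happened in past interactions, in order."""

    def __init__(self, capacity: int = 100) -> None:
        # Bounded window: once full, the oldest interaction is dropped.
        self._events: deque[tuple[str, str]] = deque(maxlen=capacity)

    def record(self, role: str, text: str) -> None:
        self._events.append((role, text))

    def recent(self, n: int) -> list[tuple[str, str]]:
        # The last n interactions, oldest first.
        return list(self._events)[-n:] if n > 0 else []


class SemanticMemory:
    """Semantic memory: domain knowledge retrieved by relevance (RAG-style)."""

    def __init__(self, documents: list[str]) -> None:
        self._docs = list(documents)

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        # Toy relevance score: number of shared lowercase words. A real RAG
        # pipeline would embed the query and search a vector index instead.
        terms = set(query.lower().split())
        ranked = sorted(
            self._docs,
            key=lambda doc: len(terms & set(doc.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class ProceduralMemory:
    """Procedural memory: reusable heuristics scored by past outcomes."""

    def __init__(self) -> None:
        # task type -> heuristic -> [successes, attempts]
        self._stats: dict[str, dict[str, list[int]]] = {}

    def update(self, task_type: str, heuristic: str, succeeded: bool) -> None:
        counts = self._stats.setdefault(task_type, {}).setdefault(heuristic, [0, 0])
        counts[0] += int(succeeded)
        counts[1] += 1

    def best(self, task_type: str) -> str | None:
        # The heuristic with the highest observed success rate, if any.
        options = self._stats.get(task_type)
        if not options:
            return None
        return max(options, key=lambda h: options[h][0] / options[h][1])


episodic = EpisodicMemory(capacity=2)
for turn in ["hi", "reset my password", "thanks"]:
    episodic.record("user", turn)
print(episodic.recent(5))  # [('user', 'reset my password'), ('user', 'thanks')]

semantic = SemanticMemory([
    "Password resets require email verification.",
    "Refunds take five business days.",
])
print(semantic.retrieve("how do password resets work"))
# ['Password resets require email verification.']

procedural = ProceduralMemory()
procedural.update("login_issue", "verify email first", succeeded=True)
procedural.update("login_issue", "escalate immediately", succeeded=False)
print(procedural.best("login_issue"))  # verify email first
```

Even in this toy form, the three fail differently: the episodic window silently forgets, the retriever can return irrelevant text, and the procedural scores can lock in a heuristic that got lucky early. A single “memory” checkbox hides all three.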

The same confusion applies to planning, reasoning, perception, and reflection. Engineers treat them as add-ons: “let’s add memory, add planning, add tools.” But these are distinct cognitive dimensions. An agent that needs strong perception but minimal planning requires a very different architecture from one that needs deep multi-step reasoning but no memory. Most teams design both the same way, because no framework separates the cognitive capabilities a problem requires from the design pattern that implements them.

The result is predictable. Over-engineered agents that are slow and expensive because they have capabilities they don’t need. Under-engineered agents that fail at basic tasks because they are missing a cognitive module nobody thought to include.

4. Framework selection happens by GitHub stars

LangChain. LangGraph. AutoGen. CrewAI. Semantic Kernel. Agency Swarm. Claude Agent SDK. OpenAI Agents SDK. The list grows monthly. Frameworks get picked based on what showed up on social media last week, not on systematic analysis of what the problem actually requires.

This is the NoSQL rush of the early 2010s repeating itself. Teams chose MongoDB because it was popular, not because they had thought about whether their data model was document-oriented, graph-oriented, or relational. Years of painful migrations followed when reality collided with hype.

The same pattern is playing out now with agent frameworks. A team adopts a framework, builds for months, then discovers it does not support the coordination model they need, or the memory architecture their use case demands, or the governance layer that production requires. By then, switching costs are enormous.

What is missing is a way to reason about agent architectures from first principles: starting from the problem’s cognitive requirements, not from the framework’s feature list.

5. There is no progression path from simple to complex

Today, the choice is binary: either build a single ReAct agent or jump straight to multi-agent orchestration. No middle ground, no gradual path, no clear criteria for when to escalate.

When should a single agent become two? When should two agents become a supervised team? When does a pipeline need to become an autonomous system? These are among the most consequential architectural decisions in agent design, and they are currently made by gut feeling.

Get the decision wrong in one direction and you under-build: the agent cannot handle the task’s complexity. Get it wrong in the other direction and you over-build: a distributed AI system with all its coordination overhead, failure modes, and debugging nightmares, where a single agent with better prompting would have sufficed.

The root cause

These five problems share a common origin: the field of agent design lacks a structural framework.

Not another software framework: there are too many of those already. Not another paper: there are plenty. A design framework: a shared vocabulary, a map that organizes existing patterns, and a methodology that lets engineers navigate from problem to solution systematically rather than by guesswork.

The framework being built for Designing AI Agents organizes the space along two axes: cognitive capabilities (what the agent needs to think and do) and design patterns (how the agent is structured to deliver those capabilities). Seven cognitive modules. Twenty-seven named patterns. One map.

This newsletter shares the framework as it develops: the thinking behind it, the tradeoffs within it, and the lessons from building agent systems that survive contact with production.

Next week: why “build me an agent” is the most dangerous sentence in software engineering right now, and what to ask instead.

Designing AI Agents (Manning, forthcoming). Subscribe for weekly notes on agent design patterns.

— Jia Huang, AI Researcher, A*STAR Singapore