"Build Me an Agent" — The Most Dangerous Sentence in Software

May 1, 2026 · by Jia Huang · Originally on Substack

Four months building a multi-agent system: a planner, an executor, a critic, a coordinator. Clean codebase. Reasonable architecture. The demo worked on three carefully chosen inputs.

The project began with a single instruction: “we should build an agent for this.” Nobody had asked what kind of agent. Nobody had asked what cognitive capability the task actually required. The team moved straight from agent to multi-agent because that was what the framework tutorials suggested.

Four months and roughly a quarter-million dollars in compute later, the system was stripped back to a single ReAct loop with three tools. Latency dropped from 45 seconds to 6. Cost per query fell by 90%.

It was not a bad build. It was a build for the wrong problem.

This pattern recurs. The names change. The numbers change. The framework changes. The shape repeats.

The six things “build me an agent” could mean

When someone says “build me an agent,” they might mean any of these, and they probably have not thought about which:

A smart API call. The task has a fixed input, a fixed output, and the LLM parses or generates text in the middle. This is not an agent. It is a function call with a language model inside. No tools, no memory, no loops. The right asset is a prompt, not an architecture.

A single-turn tool caller. The LLM receives a query, decides which tool to call, calls it, and returns the result. One decision, one action. Think: “look up this customer’s order status.” The entire “agent” is an if-else that the model handles instead of your code.

A ReAct loop. The model reasons, acts, observes the result, reasons again, repeats until done. This is the first thing that actually deserves the word agent. It has a feedback loop. Most tasks called “agentic” belong here, and most of them need nothing more complex.

A planner-executor. The model first creates a multi-step plan, then executes each step. This adds a planning dimension that a simple ReAct loop does not have. Required when the task has dependencies between steps, i.e. when step 3 depends on the output of step 1 and the model needs to know that before it starts.

A reflective agent. The model executes, then evaluates its own output, then revises. Adds a self-critique layer. Required when the task has quality standards that cannot be verified by a simple tool call, i.e. when the agent needs to ask itself “is this actually good?”

A multi-agent system. Multiple models with different roles coordinate on a shared task. Required when the task genuinely demands different expertise or perspectives that cannot fit in a single context window. Almost nothing does. This is the most over-deployed pattern in the field.
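To make the third pattern concrete, here is a minimal sketch of a ReAct loop in Python. Everything in it is an assumption made for illustration: `fake_model` stands in for a real LLM call, and the `lookup_order` tool, the order data, and the action dictionary format are invented. The point is only how little code the loop itself needs.

```python
from typing import Callable

# Hypothetical tool; a real one would call an API or a database.
def lookup_order(order_id: str) -> str:
    orders = {"A17": "shipped", "B42": "processing"}
    return orders.get(order_id, "unknown order")

TOOLS: dict[str, Callable[[str], str]] = {"lookup_order": lookup_order}

def fake_model(history: list[str]) -> dict:
    """Stand-in for an LLM call: picks the next action from the transcript."""
    if not any(line.startswith("observation:") for line in history):
        return {"type": "tool", "name": "lookup_order", "arg": "A17"}
    last = history[-1].removeprefix("observation: ")
    return {"type": "final", "answer": f"Order A17 is {last}."}

def react_loop(query: str, model=fake_model, max_steps: int = 5) -> str:
    history = [f"query: {query}"]
    for _ in range(max_steps):
        action = model(history)                    # reason
        if action["type"] == "final":
            return action["answer"]
        result = TOOLS[action["name"]](action["arg"])  # act
        history.append(f"observation: {result}")   # observe, then loop again
    return "stopped: step budget exhausted"

print(react_loop("Where is order A17?"))
```

The step budget is a design choice of this sketch, not something every framework enforces, but a loop with no upper bound is one of the easier ways to run up a compute bill.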

Six very different architectures. The same four-word sentence could mean any of them. The cost difference between picking right and picking wrong is not incremental. It is often 10× in compute, 5× in latency, and months in development time.

The real question is not “what can the agent do?” It is “what does the task need?”

Most agent projects start from the technology: “We have this framework, what can we build with it?” That is backwards. It is like choosing a database before understanding the data model.

The right starting point is the task. Specifically, three questions:

How many cognitive steps does the task require? If the answer is one (parse this, classify that, extract this), an agent is not needed. A prompt is. If the answer is “it depends on what happens at each step,” a loop is needed. If the answer is “multiple steps with dependencies,” planning is needed.

Does the task require the system to evaluate its own output? If yes, the system needs reflection. If not, reflection only adds latency and cost. In one project, a critic agent was layered on top of a deterministic validator that already produced the same check. The critic added latency, occasionally hallucinated problems that did not exist, and contributed nothing the validator did not already provide.

Does the task require multiple distinct perspectives or knowledge bases that cannot fit in one context? If genuinely yes (not “it would be nice”), multi-agent is on the table. If no, a single agent with good tools will outperform a multi-agent system every time. Coordination overhead is real, and most teams underestimate it.

These three questions absorb roughly 80% of the architectural mistakes that show up in agent projects.
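The three questions can be written down as a small decision function. This is a rough sketch, not a rule engine: the `TaskProfile` fields, the category labels, and the order in which the questions are checked are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Question 1 answers mapped to the simplest architecture that fits.
BASE = {
    "one": "prompt (no agent)",
    "variable": "ReAct loop",
    "dependent": "planner-executor",
}

@dataclass
class TaskProfile:
    steps: str                     # "one", "variable", or "dependent"
    needs_self_evaluation: bool    # question 2
    needs_separate_contexts: bool  # question 3: genuinely cannot fit in one context

def choose_architecture(task: TaskProfile) -> str:
    if task.steps not in BASE:
        raise ValueError(f"unknown step profile: {task.steps!r}")
    if task.needs_separate_contexts:
        return "multi-agent system"
    base = BASE[task.steps]
    # Reflection is added only when the task demands it.
    return f"{base} + reflection" if task.needs_self_evaluation else base

print(choose_architecture(TaskProfile("variable", False, False)))
```

Note that multi-agent is reachable through exactly one branch, and only when the third answer is a genuine yes.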

Design inversion

Agent systems invert a long-standing relationship between specification and implementation.

In traditional software, vague specs are recoverable mistakes. The code is deterministic: if it works once, it works every time. Sloppy specifications can be refactored, retrofitted with tests, salvaged through iteration. The implementation is forgiving. A decade of enterprise architecture work (SAP transformations where requirements arrive as PDFs and code is written against them) relied quietly on this property.

Agent systems do not have it. Model behavior is non-deterministic. The same input may produce different outputs. A vague specification (what the agent should do, what it should not, what tools it has, what “done” looks like) does not get patched by good implementation. It produces a system that demos well and fails unpredictably in production.

This is the design inversion: in traditional software, implementation is 80% of the work; in agent systems, specification is 80% of the work. Getting the prompt right, defining tool boundaries, constraining the action space, specifying success criteria. That is where the real engineering happens. The code connecting it all is often trivially simple.
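As a rough illustration of “constraining the action space” and “specifying success criteria,” here is a sketch in Python. The tool names, the `ActionRejected` exception, and the output fields are all hypothetical; the idea is that both the boundary and the definition of done live in plain, deterministic code rather than in the model's judgment.

```python
# Hypothetical allowlist: the only tools this agent may ever call.
ALLOWED_TOOLS = frozenset({"search_orders", "get_refund_policy"})

# Hypothetical closed set of valid statuses for the success check.
VALID_STATUSES = {"shipped", "processing", "delivered"}

class ActionRejected(Exception):
    """Raised when the model proposes an action outside the specified boundary."""

def check_action(tool_name: str, allowed: frozenset[str]) -> None:
    # Reject before executing, so an out-of-bounds call never runs.
    if tool_name not in allowed:
        raise ActionRejected(f"tool {tool_name!r} is not in the allowed set")

def is_done(output: dict) -> bool:
    """Deterministic success criterion: known status and a non-empty order id."""
    return output.get("status") in VALID_STATUSES and bool(output.get("order_id"))

check_action("search_orders", ALLOWED_TOOLS)         # passes silently
print(is_done({"order_id": "A17", "status": "shipped"}))
```

When a success criterion can be written this way, it is a deterministic validator, and a model-based critic layered on top of it has little left to add.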

The teams that struggle most with agents are often the ones with the strongest traditional engineering instincts. The instincts to write code, build abstractions, and design class hierarchies are exactly the ones an agent does not reward. The agent cares about its prompt, its tools, its constraints. Engineering effort spent elsewhere is engineering effort wasted.

Five questions that absorb the conversation

Five questions, asked early enough, absorb most of the rework that hits later:

What is the task, exactly? Not “customer support.” That is a domain. What is the specific action the system performs, from what input to what output?
How many cognitive steps does it take? One? A variable number? Does the number depend on intermediate results?
What tools does it need? Listed in full. Lists longer than seven typically signal a bundled scope that should be unbundled.
What does “success” look like? Verifiable programmatically, or only by human judgment?
What must it never do? The boundary defines the agent more than the goal does.

When these five answers are clear, the architecture tends to choose itself. When they are not, the project is not ready to build. It is ready to specify.
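One way to hold a project to the five questions is to make the answers a required artifact. The sketch below is one possible shape, with every field name invented for illustration; the only number taken from the text is the seven-tool threshold.

```python
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    task: str = ""                 # specific action, from what input to what output
    cognitive_steps: str = ""      # e.g. "one", "variable", "depends on results"
    tools: list[str] = field(default_factory=list)
    success_criterion: str = ""    # how "done" is verified
    never_do: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Return the reasons this spec is not yet ready to build against."""
        problems = []
        if not self.task.strip():
            problems.append("task is unspecified")
        if not self.cognitive_steps.strip():
            problems.append("cognitive steps are unspecified")
        if len(self.tools) > 7:
            problems.append("more than seven tools: scope may be bundled")
        if not self.success_criterion.strip():
            problems.append("success criterion is unspecified")
        if not self.never_do:
            problems.append("no prohibited actions listed")
        return problems

spec = AgentSpec(task="Return the shipping status for one order ID")
print(spec.gaps())
```

An empty list from `gaps()` does not prove the spec is good. A non-empty one is a fairly reliable sign that it is too early to start coding.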

The most dangerous sentence in agent engineering is not “build me an agent.” It is “sure, let me start coding.”

Next Tuesday: why context, not compute, is the bottleneck every agent team hits, and the premise underneath the framework being published this May.

Designing AI Agents (Manning). The Chinese edition shipped 9,000+ copies in its first two months and is now in its 4th print run.