More Agents, More Problems
Why multiagent systems are just distributed systems wearing a trench coat, and how to architect them so they survive past the demo.

🗓️ Last updated: June 2026
Multiagent systems get pitched as the next leap in agentic capability. Single agent struggling with a long task? Add a planner. Reasoning is shallow? Add a critic. Output is noisy? Add a verifier. The mental model is that intelligence is additive: more agents, more capability. It's a tidy theory. Production has other ideas.
In production, intelligence isn't additive. Coordination overhead is. Every agent you add multiplies the cost decomposition we built in the last article: same tokens, same calls, but now spread across a larger graph of communicating peers. Every agent you add also multiplies the failure modes: not just "the model hallucinated," but "agent B trusted agent A's hallucination and acted on it."
This is the sixth and final article in Architecting Agents. Five articles built the toolkit: memory, idempotency, observability, cost control. This one is about what happens when you take that toolkit and apply it to a system that has more than one mind in it.
The short version: multiagent systems are distributed systems. The patterns you need are the patterns we already know: orchestration, choreography, leader election, message contracts, termination guarantees. The novelty isn't in the patterns. The novelty is in remembering to use them.
The Case Against "Just Add More Agents"
The standard pitch for multiagent: split a complex task across specialized agents (researcher, writer, critic, executor) and let them coordinate. The hidden cost, and the reason most multiagent demos don't survive contact with production, is that coordination is not free.
Three costs compound. They're at their worst in the most naive shape (every agent talking to every other agent, in series), which happens to be the shape most demos ship. Read this section as the bill you pay if you don't impose structure. The patterns later in this article exist to bring each cost back down.
Token cost. In a fully connected topology, N agents talking gives you on the order of N² message paths and O(N²) token growth per round, because every agent to agent message becomes context for the next call. Route everything through a single coordinator instead and that collapses to O(N). Hold that thought.
Latency cost. Agents in a coordination loop wait on each other. A five agent serial chain stacks those latencies end to end, before retries are even considered. Dispatch workers in parallel and you stop paying the sum and start paying the max.
Failure cost. Each agent is a probabilistic component. Five agents at 95% individual reliability, chained so all five must succeed with no retries, give you a system at roughly 77% end to end (0.95^5), which is a polite way of saying it fails about one run in four. That number is the pessimistic floor: independent, retryable workers claw most of it back, which is rather the point of the patterns below.
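The three bills are easier to argue about with numbers attached. What follows is a back-of-the-envelope sketch, not a benchmark: the function names and the latency figures are made up for illustration, and the retry calculation assumes failures are independent and detectable, which real agents only approximate.

```python
def full_mesh_paths(n_agents: int) -> int:
    """Directed message paths when every agent can message every other agent."""
    return n_agents * (n_agents - 1)


def hub_paths(n_agents: int) -> int:
    """Directed paths when one agent coordinates and the rest only talk to it."""
    return 2 * (n_agents - 1)


def chain_success(p_step: float, steps: int, attempts: int = 1) -> float:
    """Probability that every step in a serial chain succeeds.

    Assumes independent, detectable failures and up to `attempts` tries per step.
    """
    p_with_retries = 1 - (1 - p_step) ** attempts
    return p_with_retries ** steps


# Illustrative per-agent latencies in seconds (invented for the example).
latencies = [2.0, 3.0, 1.5, 4.0, 2.5]

print(full_mesh_paths(5), hub_paths(5))              # 20 8
print(sum(latencies), max(latencies))                # 13.0 4.0 (serial vs parallel)
print(round(chain_success(0.95, 5), 4))              # 0.7738
print(round(chain_success(0.95, 5, attempts=2), 4))  # 0.9876
```

Under those assumptions, a single retry per step moves the five agent chain from roughly 77% to roughly 99%. The structure does the work, not the agents.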
So before you reach for a second agent, ask the question the systems design world has been asking for decades: can this be done with a single component plus a tool? A single agent with three well defined tools is almost always cheaper, faster, and more reliable than three agents with overlapping mandates.
Coordination Is a Distributed Systems Problem
Once you've decided multiagent is genuinely warranted, you inherit the entire catalogue of distributed systems problems:
Consensus: who has the final word when agents disagree?
Leader election: when does coordination route through a single point, and how do you handle that point failing?
Liveness vs safety: does the system always make progress, or can it deadlock waiting for another agent?
Termination: does the conversation actually end, or do you have a budget burning infinite loop?
Message ordering: does agent B always see agent A's message before acting?
Partition tolerance: what happens when one agent times out or returns malformed output?
None of these are new. All of them have known solutions from forty years of distributed systems literature: Lamport on clocks and consensus, the saga pattern from Garcia-Molina and Salem, the choreography vs orchestration debate that's lived inside microservices for a decade. The mistake teams make is treating multiagent as an LLM prompting problem rather than a coordination problem.
Orchestration vs Choreography
This is the most fundamental architectural decision in any multiagent system, and it's borrowed directly from the microservices playbook.
Orchestration: A central coordinator (often itself an agent, sometimes deterministic code) directs which agent runs when, with what inputs. Workers don't know about each other.
Choreography: Agents react to events on a shared bus. No central coordinator. Each agent decides when it should act based on what's on the wire.
Orchestration is easier to reason about, easier to debug, easier to bound. The coordinator is a single place where you can enforce termination, budget, and ordering. The trade off is that the coordinator becomes a single point of failure and a potential bottleneck.
Choreography is more flexible and scales better in principle, but it's far harder to bound. With no central authority, you need every agent to share a consistent understanding of "what's done," and you need a way to stop the whole thing.
Practical default: Start with orchestration. Move to choreography only when you have a concrete bottleneck at the coordinator, and only once you've built enough observability that debugging a decentralized flow isn't pure guesswork.
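To make "harder to bound" concrete, here is a minimal in-process sketch of choreography. Everything in it is hypothetical (the BoundedBus class, the event names, the handlers standing in for agents); a real system would sit on an actual message broker. The point is that even with no coordinator, something has to own the stop condition, and here that something is a global event cap on the bus.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    kind: str
    payload: str


# A handler stands in for an agent: it reacts to one event and may emit more.
Handler = Callable[[Event], list[Event]]


class BoundedBus:
    def __init__(self, max_events: int):
        self.max_events = max_events
        self.handlers: dict[str, list[Handler]] = {}
        self.log: list[Event] = []  # delivered events, in order

    def subscribe(self, kind: str, handler: Handler) -> None:
        self.handlers.setdefault(kind, []).append(handler)

    def run(self, first: Event) -> bool:
        """Deliver events until the queue drains.

        Returns False if the event cap was hit while work was still pending.
        """
        queue = deque([first])
        while queue:
            if len(self.log) >= self.max_events:
                return False
            event = queue.popleft()
            self.log.append(event)
            for handler in self.handlers.get(event.kind, []):
                queue.extend(handler(event))
        return True


# A flow that terminates on its own: task -> research -> draft.
pipeline = BoundedBus(max_events=10)
pipeline.subscribe("task.created", lambda e: [Event("research.done", e.payload)])
pipeline.subscribe("research.done", lambda e: [Event("draft.done", e.payload)])
finished = pipeline.run(Event("task.created", "write the report"))

# Two agents politely answering each other forever; only the cap stops them.
chatter = BoundedBus(max_events=6)
chatter.subscribe("ping", lambda e: [Event("pong", e.payload)])
chatter.subscribe("pong", lambda e: [Event("ping", e.payload)])
chatter_finished = chatter.run(Event("ping", "hello"))
```

Note that neither agent in the second flow is wrong in isolation. The loop only exists at the system level, which is why the bound has to live there too.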
The Orchestrator Worker Pattern
The most common, most defensible multiagent shape. A single planner orchestrator decomposes the task, dispatches subtasks to worker agents, and assembles the final answer.
Key properties:
Workers are stateless and short lived. Each invocation is a single tool call boundary. No long running worker conversations.
Workers don't talk to each other. All communication between workers goes through the orchestrator. This kills the N² message problem (the thought you were holding earlier).
The orchestrator owns termination. Workers can fail or time out; the orchestrator decides whether to retry, fall back, or stop.
The orchestrator owns memory. Workers see only the slice of context relevant to their subtask.
This pattern maps cleanly onto a microservices architecture: the orchestrator is the API gateway, the workers are stateless services, the LLM is the policy engine inside each.
Antipattern to watch for: an orchestrator that's "just another agent in the chat." If the orchestrator is participating in the same freeform conversation as the workers, you've lost the central control property that made orchestration worth choosing, and reinvented a group chat with a token budget.
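A minimal sketch of the shape, assuming workers are plain callables that stand in for model-backed agents. The names (WorkerResult, run_with_retry, orchestrate, flaky_upper) are hypothetical. Workers see only their own subtask, never each other, and the orchestrator alone decides what a failure means.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class WorkerResult:
    subtask: str
    ok: bool
    output: str = ""
    attempts: int = 0


# A worker receives only its own slice of the task and returns an output.
Worker = Callable[[str], str]


def run_with_retry(worker: Worker, subtask: str, max_attempts: int) -> WorkerResult:
    """Retry policy lives with the orchestrator, not inside the worker."""
    for attempt in range(1, max_attempts + 1):
        try:
            return WorkerResult(subtask, True, worker(subtask), attempt)
        except Exception:
            continue  # a real system would log the error and back off here
    return WorkerResult(subtask, False, attempts=max_attempts)


def orchestrate(subtasks: list[str], worker: Worker, max_attempts: int = 2) -> list[WorkerResult]:
    """Dispatch independent subtasks in parallel and collect results in order."""
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        futures = [pool.submit(run_with_retry, worker, s, max_attempts) for s in subtasks]
        return [f.result() for f in futures]


# Stand-in worker: "b" times out once, "c" always fails, everything else works.
_timed_out_once: set[str] = set()


def flaky_upper(subtask: str) -> str:
    if subtask == "b" and subtask not in _timed_out_once:
        _timed_out_once.add(subtask)
        raise TimeoutError("simulated timeout")
    if subtask == "c":
        raise ValueError("simulated permanent failure")
    return subtask.upper()


results = orchestrate(["a", "b", "c"], flaky_upper)
```

Here "b" recovers on its second attempt and "c" comes back with ok set to False. What happens next (fall back, stop, or return a partial answer) is the orchestrator's call, which is the whole reason for choosing this pattern.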
The Planner Executor Split
A variant of orchestrator worker that's worth calling out separately, because it has shown up explicitly in published frameworks (LangGraph, AutoGen, OpenAI's Swarm patterns). It splits cognition from action:
Planner agent: Reads the task, produces a structured plan (typically a DAG of steps). Has no tools. High capability model, possibly with extended reasoning.
Executor agent (or executor loop): Walks the plan, calls tools, handles per step errors. A lower capability, cheaper model is often enough.
Why this works: planning and execution sit on very different cost and quality curves. Planning benefits from the smartest available model and runs once per task. Execution runs many times per task and benefits from being fast and cheap.
This split also makes the system observable in a way a single agent loop isn't. You can inspect the plan before any tool fires. You can replay execution against the same plan. You can A/B test planners independently of executors. That ties directly back to the observability article: the plan becomes a structured artifact you can attribute outcomes to.
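A sketch of the split, assuming the planner has already produced its plan as data. The Step fields and the two toy tools are hypothetical; the ordering comes from Python's standard graphlib module. Because the plan is a plain structure, you can print it, diff it, or reject it before a single tool fires.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable


@dataclass(frozen=True)
class Step:
    id: str
    tool: str
    arg: str
    depends_on: tuple[str, ...] = ()


# A tool receives its argument plus the outputs of the steps it depends on.
Tool = Callable[[str, dict[str, str]], str]


def execute_plan(plan: list[Step], tools: dict[str, Tool]) -> dict[str, str]:
    """Walk the plan in dependency order and return each step's output."""
    by_id = {step.id: step for step in plan}
    graph = {step.id: set(step.depends_on) for step in plan}
    results: dict[str, str] = {}
    # static_order() yields dependencies first and raises CycleError on a cyclic plan.
    for step_id in TopologicalSorter(graph).static_order():
        step = by_id[step_id]
        upstream = {dep: results[dep] for dep in step.depends_on}
        results[step_id] = tools[step.tool](step.arg, upstream)
    return results


# What a planner might emit (deliberately listed out of order).
plan = [
    Step("summarize", "summarize", "", depends_on=("fetch",)),
    Step("fetch", "search", "multiagent termination"),
]

# Deterministic stand-ins for real tools.
tools: dict[str, Tool] = {
    "search": lambda arg, upstream: f"results for {arg}",
    "summarize": lambda arg, upstream: "summary of " + upstream["fetch"],
}

outputs = execute_plan(plan, tools)
```

Replaying execution against the same plan is just calling execute_plan again, and swapping planners means swapping whatever produces the list of steps.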
Message Contracts, Not Free Text
The single biggest reliability win in multiagent systems is also the most underused: agents talk in schemas, not in prose.
If agent A produces free form text and agent B parses it, you've introduced two failure modes, A's output drift and B's parsing brittleness, at every coordination boundary. The cost decomposition from the previous article gets worse fast: more tokens spent on inter agent prose, more retries when parsing fails, more drift over a long session.
The fix is the same one services have used for decades: define the contract. A few rules that consistently pay off:
Structured outputs at every agent boundary. JSON schemas, Pydantic models, function call signatures. Validate on receive; fail fast on schema violation.
No "messages" between agents, only typed artifacts. A worker returns a SearchResult, not "Hey, I found these papers..."
Versioned schemas. When you change a contract, every agent that depends on it needs to be updated together, the same way you'd handle a breaking API change.
One field for confidence or status. Workers should be able to signal "I tried but failed" without the orchestrator having to infer it from prose.
The orchestrator's job becomes routing typed artifacts between deterministic boundaries. That's a much smaller and more debuggable problem than "broker a conversation between three LLMs."
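Here is what validate on receive might look like using only the standard library. The field names, the version number, and the ContractError type are assumptions for illustration; in practice you would likely reach for Pydantic or JSON Schema as mentioned above. The rules are the same either way: versioned, typed, an explicit status field, and a loud failure on anything else.

```python
import json
from dataclasses import dataclass

SCHEMA_VERSION = 1  # hypothetical contract version


class ContractError(ValueError):
    """Raised when a worker's output violates the agreed schema."""


@dataclass(frozen=True)
class SearchResult:
    schema_version: int
    status: str  # "ok" or "failed", so failure never has to be inferred from prose
    titles: tuple[str, ...]
    reason: str = ""


def parse_search_result(raw: str) -> SearchResult:
    """Validate a worker's raw output at the boundary; fail fast on violations."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ContractError(f"not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ContractError("expected a JSON object")
    if data.get("schema_version") != SCHEMA_VERSION:
        raise ContractError(f"unsupported schema_version: {data.get('schema_version')!r}")
    if data.get("status") not in ("ok", "failed"):
        raise ContractError(f"invalid status: {data.get('status')!r}")
    titles = data.get("titles", [])
    if not isinstance(titles, list) or not all(isinstance(t, str) for t in titles):
        raise ContractError("titles must be a list of strings")
    return SearchResult(SCHEMA_VERSION, data["status"], tuple(titles), str(data.get("reason", "")))


good = parse_search_result('{"schema_version": 1, "status": "ok", "titles": ["Sagas"]}')
failed = parse_search_result('{"schema_version": 1, "status": "failed", "reason": "index timeout"}')
```

A worker that replies "Hey, I found these papers..." gets a ContractError at the boundary instead of a confused downstream agent three calls later.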
Termination, Budgets, and Loop Cutoffs
The classic multiagent failure mode: two agents in a polite loop, each waiting for the other to confirm, neither allowed to stop. Or a critic and actor pair where the critic always finds something to revise, so the actor revises forever.
Every multiagent system needs at least three termination conditions, enforced by the orchestrator:
Success oracle. A structured signal that the task is done. Not "the last agent said it was done," but a verifiable condition (schema valid answer, test passes, document conforms to spec).
Turn cap. A hard limit on the number of orchestration rounds. When hit, the system returns the best so far answer with a flag indicating it was bounded out.
Cost cap. The budget mechanism from the previous article, applied to the whole multiagent session, not per agent. One agent in a loop can blow the entire budget if the cap is per agent.
The success oracle is the one teams skip most often. Without it, you're relying on the agents themselves to decide when they're done, which is exactly the kind of call LLMs are worst at, especially under pressure to "be helpful."
The honest caveat: sometimes a clean oracle doesn't exist. Open ended generative tasks (write the essay, draft the strategy, summarize the quarter) have no schema to validate and no test to pass. When you genuinely can't build a success oracle, don't pretend you have one. Use a critic graded against an explicit rubric as your "good enough" signal, and let the turn cap and cost cap be the real backstops. A bounded best effort answer beats an unbounded hunt for a "done" state the agents will cheerfully never reach.
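The three conditions fit in one small loop. This is a sketch under simple assumptions: step is a hypothetical callable that stands in for one orchestration round and reports its own cost, is_done is the success oracle, and the latest draft stands in for "best so far." Note that the cost check runs after each round, so the session can overshoot the cap by at most one round's spend.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    answer: str
    stop_reason: str  # "success", "turn_cap", or "cost_cap"
    turns: int
    cost: float


def run_bounded(
    step: Callable[[str], tuple[str, float]],
    is_done: Callable[[str], bool],
    max_turns: int,
    max_cost: float,
) -> Outcome:
    """Run rounds until the oracle passes or a cap is hit, whichever comes first."""
    answer, cost = "", 0.0
    for turn in range(1, max_turns + 1):
        answer, step_cost = step(answer)
        cost += step_cost  # session-wide total, not per agent
        if is_done(answer):
            return Outcome(answer, "success", turn, cost)
        if cost >= max_cost:
            return Outcome(answer, "cost_cap", turn, cost)
    return Outcome(answer, "turn_cap", max_turns, cost)


# Toy round: each call extends the draft and costs 0.25 units.
def toy_step(draft: str) -> tuple[str, float]:
    return draft + "x", 0.25


# Toy oracle: a verifiable condition, not the agent's own opinion.
def toy_oracle(draft: str) -> bool:
    return len(draft) >= 3


done = run_bounded(toy_step, toy_oracle, max_turns=10, max_cost=5.0)
out_of_turns = run_bounded(toy_step, toy_oracle, max_turns=2, max_cost=5.0)
out_of_budget = run_bounded(toy_step, toy_oracle, max_turns=10, max_cost=0.5)
```

Every exit path returns an answer along with the reason it stopped, so the caller can tell a finished result from one that was bounded out.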
Critic and Verifier Patterns
When you do want a second agent in the loop, the most defensible role for it is verification, not generation.
The pattern: agent A produces an answer, agent B (the critic or verifier) checks it against a structured rubric, and the orchestrator decides whether to accept, revise, or escalate.
Why this works better than two generators:
Verification is a narrower, easier problem than generation. The critic can be a cheaper model.
The critic's output is structured (pass/fail per criterion), not prose. That makes its decisions auditable.
The orchestrator can short circuit: if the critic passes on the first try, no revision loop runs.
Failure modes to watch:
Critic capture. The actor learns to write in a way that makes the critic happy, not in a way that's correct. Mitigate by rotating critic prompts or models.
Infinite revision. The critic always finds something. Cap the revision loop and require the critic to grade improvements (otherwise revisions can be worse than originals).
Critic as generator drift. A critic that starts suggesting how to fix issues quickly becomes a coauthor. Keep the critic's contract pass/fail plus reason, never rewrite.
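A sketch of the loop with the mitigations wired in. The critic and actor here are deterministic stand-ins for model calls, and the rubric criteria are invented for the example. The critic's contract is pass/fail plus a reason per criterion, revisions are capped, and a revision is kept only if it fails fewer criteria than the current best. Counting failures is a crude way of grading improvement, but it is enough to show the idea.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Verdict:
    criterion: str
    passed: bool
    reason: str  # explains the grade; never contains a rewrite


Critic = Callable[[str], list[Verdict]]
Actor = Callable[[str, list[Verdict]], str]


def review_loop(draft: str, actor: Actor, critic: Critic, max_revisions: int) -> tuple[str, str, int]:
    """Return (decision, best_draft, revisions_used)."""
    best = draft
    best_failures = [v for v in critic(best) if not v.passed]
    revisions = 0
    while best_failures and revisions < max_revisions:
        candidate = actor(best, best_failures)
        revisions += 1
        failures = [v for v in critic(candidate) if not v.passed]
        if len(failures) < len(best_failures):  # keep only graded improvements
            best, best_failures = candidate, failures
    decision = "accept" if not best_failures else "escalate"
    return decision, best, revisions


# Hypothetical rubric with two checkable criteria.
def rubric_critic(draft: str) -> list[Verdict]:
    return [
        Verdict("has_title", draft.startswith("# "), "first line must be a title"),
        Verdict("under_limit", len(draft) <= 40, "draft must be at most 40 characters"),
    ]


# Stand-in actor that addresses whichever criterion failed.
def simple_actor(draft: str, failures: list[Verdict]) -> str:
    if any(v.criterion == "has_title" for v in failures):
        return "# " + draft
    return draft[:40]


accepted = review_loop("agents are distributed systems", simple_actor, rubric_critic, max_revisions=3)
stuck = review_loop("agents are distributed systems", lambda d, f: d, rubric_critic, max_revisions=3)
```

If the first critique passes, the loop body never runs. If the actor never improves, the loop stops at the cap and the orchestrator escalates with the best draft it has.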
Shared State and the Blackboard Antipattern
The temptation: give every agent read and write access to a shared scratchpad. They can see each other's work, build on it, coordinate organically. The pattern has a respectable history: it goes back to the Hearsay-II speech understanding system in the 1970s. It also ages poorly under modern token economics.
In practice this is the multiagent equivalent of a global variable, and we all remember how beloved those are. Every agent's behaviour depends on every other agent's writes. Debugging becomes archaeology: who set this field, when, and why? You also pay an enormous token tax: every agent reads the entire blackboard on every turn, so context grows roughly with conversation length times agent count.
When shared state is genuinely needed (and it sometimes is), enforce three constraints:
Schema typed. No free text blackboard. Every entry is a typed record with provenance (which agent, when, why).
Append only. Agents add records; they don't overwrite. This preserves history and makes the system debuggable.
Scoped reads. Agents query the state for what they need; they don't dump the whole board into context.
Even with these guardrails, prefer message passing through the orchestrator. The blackboard pattern is rarely the right default; the cases where it earns its keep are narrow.
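For the narrow cases where a shared board is justified, the three constraints can be sketched as below. This is an in-memory illustration with hypothetical names (Record, Blackboard, and the record kinds); a production version would need persistence and concurrency control. Records are typed and carry provenance, the only write operation is append, and reads are scoped by kind and capped in size.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Record:
    kind: str     # the record type, e.g. "finding" or "decision"
    agent: str    # provenance: which agent wrote it
    reason: str   # provenance: why it was written
    value: str
    at: datetime  # provenance: when it was written


class Blackboard:
    def __init__(self) -> None:
        self._records: list[Record] = []

    def append(self, kind: str, agent: str, reason: str, value: str) -> Record:
        """The only write path: add a record, never overwrite one."""
        record = Record(kind, agent, reason, value, datetime.now(timezone.utc))
        self._records.append(record)
        return record

    def query(self, kind: str, limit: int = 5) -> tuple[Record, ...]:
        """Scoped read: the most recent `limit` records of one kind."""
        if limit < 1:
            raise ValueError("limit must be at least 1")
        matches = [r for r in self._records if r.kind == kind]
        return tuple(matches[-limit:])

    def history(self) -> tuple[Record, ...]:
        """Full audit trail for debugging, not for stuffing into agent context."""
        return tuple(self._records)


board = Blackboard()
board.append("finding", "researcher", "initial search", "sagas handle partial failure")
board.append("decision", "orchestrator", "enough evidence", "proceed to draft")
board.append("finding", "researcher", "follow-up search", "turn caps prevent loops")

latest_finding = board.query("finding", limit=1)
```

The "who set this field, when, and why" question now has an answer on every record, and an agent asking for one finding pays for one finding rather than the whole board.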
Where Each Pattern Earns Its Keep
A short decision guide, since theory only gets you so far:
One model, several tools, bounded task: single agent plus tools (don't go multiagent).
Long task, decomposable into independent subtasks: orchestrator worker.
Long task, plan then execute structure: planner executor.
Quality critical generation where verifiability matters: generator plus critic.
Many specialists, loosely coupled tasks, async is acceptable: choreography (event driven).
Real time coordination, shared world model: blackboard, sparingly.
The most common mistake is jumping to choreography or blackboard because they sound more sophisticated. They aren't. They're just harder to bound. Start with the simplest pattern that fits, and escalate only when you have evidence the simpler pattern doesn't.
The Architect's Mental Model
Six articles in, the pattern is consistent. Agents are probabilistic distributed components. The discipline that makes them production grade (memory, idempotency, observability, cost bounds, coordination) is the same discipline that has made distributed systems production grade for decades.
Multiagent is where that lineage becomes obvious. The minute you have more than one agent, you have a distributed system. Either you design it like one, with explicit contracts, bounded loops, observable flows, and single points of authority, or you ship a system that works in demos and falls over the first time the message bus carries something the agents didn't expect.
The series has built a stack:
Memory gives an agent continuity across turns.
Idempotency and sagas make its tool calls safe under retry.
Observability makes its behaviour legible.
Cost bounds keep its consumption predictable.
Coordination patterns scale it across multiple agents without losing any of the above.
The teams that internalize this get past the demo and into production. The teams that don't keep shipping fragile cathedrals on top of probabilistic foundations.
Closing the Series
If you've read all six, the through line is hopefully clear: agentic AI is a software architecture problem dressed up in new vocabulary. Treat it that way and the rest follows.
The toolkit isn't novel. It's borrowed, deliberately:
The transactional discipline of databases
The retry semantics of message queues
The observability practices of microservices
The cost discipline of cloud native systems
The coordination patterns of distributed computing
Applied to a new substrate, language models, by engineers who took the substrate seriously enough to build it properly.
Architecting an agent is architecting a system. Adding more agents doesn't change the rules. It just makes them stricter.
What's Next
This is the final article in Architecting Agents, the series on bringing distributed systems discipline to agentic AI design.
The series ends here, but the discipline doesn't. If a future installment makes sense, it will be on the engineering practices around the toolkit (eval pipelines, prompt versioning, model migration) rather than the toolkit itself.