Tutorial · April 7, 2026 · 9 min read

How to Build a Multi-Agent AI System: A Step-by-Step Guide

A practical planner, executor, reviewer architecture in Python, with enough structure to survive contact with production.

multi-agent ai system tutorial · build ai agents · ai agent architecture · planner executor reviewer

A useful multi-agent system is not just three prompts talking to each other. It is a controlled loop with roles, typed state, stop conditions, and at least one review stage that can reject weak intermediate output. If you skip those pieces, the result usually looks impressive in a demo and collapses as soon as the inputs become noisy.

This tutorial walks through the smallest architecture I recommend for real work: a planner agent that decomposes the request, an executor agent that performs the work, and a reviewer agent that checks quality before anything leaves the system. You can later map the same design onto LangChain, CrewAI, or AutoGen, but the cleanest way to learn it is first in plain Python.

01/Start with the contract, not the prompts

Before you write an agent class, define what moves through the system. A planner should not return a poetic essay when the executor expects a numbered list. A reviewer should not be allowed to invent a new task while pretending to approve the old one. Most multi-agent failures are really contract failures.

  • Planner responsibility: convert a goal into a small ordered plan.
  • Executor responsibility: produce the work artifact from that plan.
  • Reviewer responsibility: score the result, suggest fixes, and either approve or send the work back for another pass.
  • Controller responsibility: keep the shared state, enforce iteration limits, and decide whether the loop continues.

Treat the controller as the source of truth. Agents generate proposals. The controller mutates state. That distinction prevents a common anti-pattern where every agent silently edits the shared context in incompatible ways.

02/Step 1: define the shared state

A lightweight dataclass is enough for a first version. It forces you to decide what the system remembers across steps and what counts as completion.

Shared state for a 3-agent loop
from dataclasses import dataclass, field


@dataclass
class RunState:
    goal: str
    plan: list[str] = field(default_factory=list)
    draft: str = ""
    review_notes: list[str] = field(default_factory=list)
    approved: bool = False
    iteration: int = 0
    max_iterations: int = 2


def require_non_empty(value: str, field_name: str) -> str:
    value = value.strip()
    if not value:
        raise ValueError(f"{field_name} cannot be empty")
    return value

Keep the state brutally small at the beginning. Add fields only when you can explain why the controller needs them. Large shared state turns into prompt soup quickly, and prompt soup is how agent systems become expensive and unpredictable.

03/Step 2: implement planner, executor, and reviewer

Each agent should do one thing well and return structured output. In the example below, the `call_model` function is a placeholder for your preferred model client. The important part is the role boundary and the parsing discipline.

Three single-purpose agents
import json


def call_model(system_prompt: str, user_prompt: str) -> str:
    # Replace with your LLM client of choice.
    raise NotImplementedError


class PlannerAgent:
    system_prompt = (
        "You are a planner. Return valid JSON with a 'steps' array of 3 concise steps."
    )

    def run(self, goal: str) -> list[str]:
        raw = call_model(self.system_prompt, goal)
        parsed = json.loads(raw)
        steps = [require_non_empty(step, "plan step") for step in parsed["steps"]]
        return steps[:3]


class ExecutorAgent:
    system_prompt = (
        "You are an executor. Use the plan to produce the requested deliverable."
    )

    def run(self, goal: str, plan: list[str], review_notes: list[str]) -> str:
        prompt = f"Goal: {goal}\nPlan: {plan}\nReview notes: {review_notes}"
        return require_non_empty(call_model(self.system_prompt, prompt), "draft")


class ReviewerAgent:
    system_prompt = (
        "You are a reviewer. Return valid JSON with keys: approved (bool) and notes (array)."
    )

    def run(self, goal: str, draft: str) -> tuple[bool, list[str]]:
        prompt = f"Goal: {goal}\nDraft: {draft}"
        raw = call_model(self.system_prompt, prompt)
        parsed = json.loads(raw)
        approved = bool(parsed["approved"])
        notes = [note.strip() for note in parsed["notes"] if note.strip()]
        return approved, notes

Notice what is missing: the planner is not allowed to write the final answer, the executor is not allowed to approve itself, and the reviewer is not allowed to redefine the goal. When each role stays narrow, your traces become understandable and your failures become debuggable.

Want framework-specific implementations too?

The OrchestreIA bundle expands this pattern into reusable blueprints for LangChain, CrewAI, AutoGen, evaluation, and production deployment.

Get the AI orchestration bundle - $29

04/Step 3: add the controller loop

Now the orchestration layer becomes simple. The controller initializes state, runs planning once, then iterates between execution and review until the reviewer approves or the iteration limit is reached.

Controller loop with bounded retries
def run_multi_agent_system(goal: str) -> RunState:
    state = RunState(goal=require_non_empty(goal, "goal"))
    planner = PlannerAgent()
    executor = ExecutorAgent()
    reviewer = ReviewerAgent()

    state.plan = planner.run(state.goal)

    while state.iteration < state.max_iterations and not state.approved:
        state.iteration += 1
        state.draft = executor.run(state.goal, state.plan, state.review_notes)
        state.approved, state.review_notes = reviewer.run(state.goal, state.draft)

    return state


result = run_multi_agent_system(
    "Create a migration checklist for moving a support workflow to AI assistance."
)

if not result.approved:
    raise RuntimeError(
        f"System stopped after {result.iteration} rounds without approval: {result.review_notes}"
    )

print(result.draft)

This is the moment where many teams overcomplicate things. You do not need a swarm, a memory subsystem, and dynamic delegation on day one. You need a loop that is observable and bounded. Get that right first. Advanced orchestration becomes much easier once the basic contract is sound.

Good default

One planning pass, one execution pass, one review pass, and at most one rework pass. That is enough to catch many quality issues without letting latency grow unbounded.

05/Step 4: make it production worthy

The jump from tutorial to production is mostly about control, not intelligence. A planner-executor-reviewer loop becomes reliable when you add logging, validation, and cost discipline around it.

  • Log every agent input and output with a run id. Without traces, you cannot debug bad runs.
  • Validate structured outputs before they touch shared state. Invalid JSON should trigger a retry, not a silent failure.
  • Cap token budgets per stage. Planning should be cheap; reviewing should be cheaper than executing.
  • Persist intermediate state if the workflow spans queues, cron jobs, or human approval steps.
  • Write eval cases for the reviewer. If the reviewer approves obvious garbage, your loop is decorative, not protective.
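The validation-and-retry point above can be sketched as a thin wrapper around the model call. Everything here is an assumption about your setup: `call_model_once` stands in for whatever function actually hits your API, and the logging comment marks where your run id would go.

```python
import json
from typing import Any, Callable


def call_with_json_retry(
    call_model_once: Callable[[str, str], str],
    system_prompt: str,
    user_prompt: str,
    max_attempts: int = 3,
) -> dict[str, Any]:
    """Retry the model call until it returns a JSON object, then fail loudly."""
    last_error: Exception | None = None
    for attempt in range(1, max_attempts + 1):
        raw = call_model_once(system_prompt, user_prompt)
        try:
            parsed = json.loads(raw)
            if not isinstance(parsed, dict):
                raise ValueError("expected a JSON object")
            return parsed
        except (json.JSONDecodeError, ValueError) as exc:
            # Log the attempt number, raw output, and run id here.
            last_error = exc
    raise RuntimeError(f"invalid JSON after {max_attempts} attempts") from last_error
```

The key property is that a malformed response never touches shared state: either the wrapper returns a parsed object or the run stops with an error you can trace.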

You should also decide what happens after repeated failure. In many real systems, the right answer is not another loop. It is escalation to a human, a fallback template, or a narrower task. Multi-agent systems are not valuable because they keep trying forever. They are valuable because they fail in a controlled way.
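One way to make that failure path explicit is a small dispatcher that decides what leaves the system when the loop exits without approval. The `escalate_to_human` and `fallback_template` functions below are placeholders for whatever your system actually does at those points.

```python
def escalate_to_human(draft: str, notes: list[str]) -> str:
    # Placeholder: enqueue the draft and reviewer notes for human review.
    return f"NEEDS REVIEW ({len(notes)} notes): {draft}"


def fallback_template() -> str:
    # Placeholder: a pre-approved safe response.
    return "We could not generate a reliable answer for this request."


def resolve_outcome(approved: bool, draft: str, review_notes: list[str]) -> str:
    """Decide what leaves the system when the loop stops, instead of retrying forever."""
    if approved:
        return draft
    if review_notes:
        # The reviewer gave actionable feedback but the budget ran out:
        # hand the draft plus notes to a human rather than looping again.
        return escalate_to_human(draft, review_notes)
    # No usable feedback at all: fall back to a safe template.
    return fallback_template()
```

The design choice worth copying is that every exit path returns something deliberate; the system never just gives up silently or keeps burning tokens.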

06/Where this architecture goes next

Once this baseline works, the next upgrades are obvious. You can parallelize the executor across plan steps, add a retrieval stage before execution, or replace the reviewer with a pair of reviewers for stricter approval. But all of those are extensions of the same core idea: specialized agents plus a controller that owns state and stop conditions.
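Parallelizing the executor, for instance, is mostly a matter of fanning out over independent plan steps and reassembling results in order. A sketch using only the standard library, where `execute_step` stands in for a per-step model call:

```python
from concurrent.futures import ThreadPoolExecutor


def execute_step(goal: str, step: str) -> str:
    # Placeholder: one model call per plan step in a real system.
    return f"[{step}] done for: {goal}"


def execute_plan_parallel(goal: str, plan: list[str], max_workers: int = 4) -> str:
    """Run independent plan steps concurrently, preserving the plan's order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order even if steps finish out of order.
        parts = list(pool.map(lambda step: execute_step(goal, step), plan))
    return "\n\n".join(parts)
```

This only works when plan steps are genuinely independent; if step two depends on step one's output, keep the sequential loop for those steps.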

That is why this tutorial matters even if you later move into LangChain, CrewAI, or AutoGen. Framework APIs change. The planner-executor-reviewer pattern does not. Learn the shape of the system first, then pick the framework that best matches your deployment constraints.

OrchestreIA Bundle

Ready to move from tutorial code to reusable production patterns?

Get the bundle for orchestration templates, guardrail checklists, and architecture patterns you can apply across projects.

Free Resource

Get the Free Cheat Sheet

Download the bilingual AI orchestration cheat sheet with core patterns, framework comparisons, and the production checklist from this article.