
How to Design a Production-Grade CAMEL Multi-Agent System with Planning, Tooling, Self-Consistency, and Critique-Driven Refinement

In this tutorial, we build an advanced AI system with the CAMEL framework, orchestrating several specialized agents to jointly solve a complex task. We design a structured multi-agent pipeline that includes a planner, researcher, writer, critic, and rewriter, each with clearly defined responsibilities and schema-constrained outputs. We combine tool use, adaptive sampling, systematic validation with Pydantic, and iterative critique-driven refinement to create a robust, research-backed technical brief generator. Along the way, we show how modern agent architectures integrate planning, reasoning, external tool calls, and independent quality control within one coherent workflow.

import os, sys, re, json, subprocess
from typing import List, Dict, Any, Optional, Tuple


def _pip_install(pkgs: List[str]):
   subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "-U"] + pkgs)


_pip_install(["camel-ai[web_tools]~=0.2", "pydantic>=2.7", "rich>=13.7"])


from pydantic import BaseModel, Field
from rich.console import Console
from rich.panel import Panel
from rich.table import Table


console = Console()


def _get_colab_secret(name: str) -> Optional[str]:
   try:
       from google.colab import userdata
       v = userdata.get(name)
       return v if v else None
   except Exception:
       return None


def ensure_openai_key():
   if os.getenv("OPENAI_API_KEY"):
       return
   v = _get_colab_secret("OPENAI_API_KEY")
   if v:
       os.environ["OPENAI_API_KEY"] = v
       return
   try:
       from getpass import getpass
       k = getpass("Enter OPENAI_API_KEY (input hidden): ").strip()
       if k:
           os.environ["OPENAI_API_KEY"] = k
   except Exception:
       pass


ensure_openai_key()
if not os.getenv("OPENAI_API_KEY"):
   raise RuntimeError("OPENAI_API_KEY is not set. Add it via Colab Secrets (OPENAI_API_KEY) or paste it when prompted.")

We set up the workspace and install all necessary dependencies directly within Colab. We securely configure the OpenAI API key via Colab secrets or manual input. We also set up Rich console utilities so runtime output is cleanly formatted.

from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType
from camel.agents import ChatAgent
from camel.toolkits import SearchToolkit


def make_model(temperature: float = 0.2):
   return ModelFactory.create(
       model_platform=ModelPlatformType.OPENAI,
       model_type=ModelType.GPT_4O,
       model_config_dict={"temperature": float(temperature)},
   )


def strip_code_fences(s: str) -> str:
   s = s.strip()
   s = re.sub(r"^```(?:json)?\s*", "", s, flags=re.IGNORECASE)
   s = re.sub(r"\s*```$", "", s)
   return s.strip()


def extract_first_json_object(s: str) -> str:
   s2 = strip_code_fences(s)
   start = None
   stack = []
   for i, ch in enumerate(s2):
       if ch == "{":
           if start is None:
               start = i
           stack.append("{")
       elif ch == "}":
           if stack:
               stack.pop()
               if not stack and start is not None:
                   return s2[start:i+1]
   m = re.search(r"\{[\s\S]*\}", s2)
   if m:
       return m.group(0)
   return s2

We introduce the main components of CAMEL and define the model factory used by all agents. We add utilities to clean LLM responses and reliably extract JSON from them. This keeps our multi-agent pipeline structurally robust even when models wrap their output in formatted text.
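To see why brace matching beats naive string slicing, here is a standalone sketch of the same extraction idea (simplified from the helpers above; `extract_json` and the sample reply are illustrative, not part of the pipeline):

```python
import json

def extract_json(s: str) -> str:
    """Return the first balanced {...} object found in s via brace matching."""
    depth, start = 0, None
    for i, ch in enumerate(s):
        if ch == "{":
            if start is None:
                start = i
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0 and start is not None:
                return s[start:i + 1]
    return s  # fall back to the raw string if no balanced object is found

# A typical LLM reply: fenced, with chatter around the JSON object.
reply = 'Sure! Here is the plan:\n```json\n{"goal": "demo", "tasks": []}\n```'
obj = json.loads(extract_json(reply))
print(obj["goal"])  # demo
```

Because the extractor tracks brace depth, nested objects inside `"tasks"` would not truncate the match the way a first-`{`-to-first-`}` slice would.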

class PlanTask(BaseModel):
   id: str = Field(..., min_length=1)
   title: str = Field(..., min_length=1)
   objective: str = Field(..., min_length=1)
   deliverable: str = Field(..., min_length=1)
   tool_hints: List[str] = Field(default_factory=list)
   risks: List[str] = Field(default_factory=list)


class Plan(BaseModel):
   goal: str
   assumptions: List[str] = Field(default_factory=list)
   tasks: List[PlanTask]
   success_criteria: List[str] = Field(default_factory=list)


class EvidenceItem(BaseModel):
   query: str
   notes: str
   key_points: List[str] = Field(default_factory=list)


class Critique(BaseModel):
   score_0_to_10: float = Field(..., ge=0, le=10)
   strengths: List[str] = Field(default_factory=list)
   issues: List[str] = Field(default_factory=list)
   fix_plan: List[str] = Field(default_factory=list)


class RunConfig(BaseModel):
   goal: str
   max_tasks: int = 5
   max_searches_per_task: int = 2
   max_revision_rounds: int = 1
   self_consistency_samples: int = 2


DEFAULT_GOAL = "Create a concise, evidence-backed technical brief explaining CAMEL (the multi-agent framework), its core abstractions, and a practical recipe to build a tool-using multi-agent pipeline (planner/researcher/writer/critic) with safeguards."


cfg = RunConfig(goal=DEFAULT_GOAL)


search_tool = SearchToolkit().search_duckduckgo

We define all structured schemas with Pydantic for planning, evidence, critique, and runtime configuration. By formalizing the agents' communication contract, every step is validated and documented. This lets us turn free-form LLM output into predictable, production-ready data structures.
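As a quick check of the contract, here is a minimal sketch of validating (and rejecting) plan JSON with the schemas above; the schema classes are restated so the snippet runs standalone, and the sample payloads are invented for illustration:

```python
import json
from typing import List
from pydantic import BaseModel, Field, ValidationError

class PlanTask(BaseModel):
    id: str = Field(..., min_length=1)
    title: str = Field(..., min_length=1)
    objective: str = Field(..., min_length=1)
    deliverable: str = Field(..., min_length=1)
    tool_hints: List[str] = Field(default_factory=list)
    risks: List[str] = Field(default_factory=list)

class Plan(BaseModel):
    goal: str
    assumptions: List[str] = Field(default_factory=list)
    tasks: List[PlanTask]
    success_criteria: List[str] = Field(default_factory=list)

# Well-formed plan JSON: validates and fills in list defaults.
good = '{"goal": "demo", "tasks": [{"id": "T1", "title": "t", "objective": "o", "deliverable": "d"}]}'
plan = Plan.model_validate_json(good)

# Malformed plan: empty id and missing required fields are rejected.
bad = {"goal": "demo", "tasks": [{"id": "", "title": "t"}]}
try:
    Plan.model_validate(bad)
    ok = True
except ValidationError:
    ok = False
```

The `min_length=1` constraints are what turn a silently empty field into a hard validation error, which is exactly the behavior the pipeline relies on.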

planner_system = (
   "You are a senior agent architect. Produce a compact, high-leverage plan for achieving the goal.\n"
   "Return ONLY valid JSON that matches this schema:\n"
   '{{"goal": "...", "assumptions": ["..."], "tasks": '
   '[{{"id": "T1", "title": "...", "objective": "...", "deliverable": "...", '
   '"tool_hints": ["..."], "risks": ["..."]}}], '
   '"success_criteria": ["..."]}}\n'
   "Constraints: tasks length <= {max_tasks}. Each task should be executable with web search + reasoning."
).format(max_tasks=cfg.max_tasks)


planner = ChatAgent(system_message=planner_system, model=make_model(0.1))


researcher = ChatAgent(
   system_message=(
       "You are a meticulous research agent. Use the web search tool when useful.\n"
       "You must:\n"
       "- Search for authoritative sources (docs, official repos) first.\n"
       "- Write notes that are directly relevant to the task objective.\n"
       "- Return ONLY valid JSON:\n"
       '{"query": "...", "notes": "...", "key_points": ["..."]}\n'
       "Do not include markdown code fences."
   ),
   model=make_model(0.2),
   tools=[search_tool],
)


writer = ChatAgent(
   system_message=(
       "You are a technical writer agent. You will be given a goal, a plan, and evidence notes.\n"
       "Write a deliverable that is clear, actionable, and concise.\n"
       "Include:\n"
       "- A crisp overview\n"
       "- Key abstractions and how they connect\n"
       "- A practical implementation recipe\n"
       "- Minimal caveats/limitations\n"
       "Do NOT fabricate citations. If evidence is thin, state uncertainty.\n"
       "Return plain text only."
   ),
   model=make_model(0.3),
)


critic = ChatAgent(
   system_message=(
       "You are a strict reviewer. Evaluate the draft against the goal, correctness, and completeness.\n"
       "Return ONLY valid JSON:\n"
       '{"score_0_to_10": 0.0, "strengths": ["..."], "issues": ["..."], "fix_plan": ["..."]}\n'
       "Do not include markdown code fences."
   ),
   model=make_model(0.0),
)


rewriter = ChatAgent(
   system_message=(
       "You are a revising editor. Improve the draft based on critique. Preserve factual accuracy.\n"
       "Return the improved draft as plain text only."
   ),
   model=make_model(0.25),
)

We create the specialized agents: planner, researcher, writer, critic, and rewriter. We define their system roles carefully to enforce role boundaries and orderly behavior. This yields a modular multi-agent design built for collaboration and iterative refinement.

def plan_goal(goal: str) -> Plan:
   resp = planner.step("GOAL:\n" + goal + "\n\nReturn JSON plan now.")
   raw = resp.msgs[0].content if hasattr(resp, "msgs") else resp.msg.content
   js = extract_first_json_object(raw)
   try:
       return Plan.model_validate_json(js)
   except Exception:
       return Plan.model_validate(json.loads(js))


def research_task(task: PlanTask, goal: str, k: int) -> EvidenceItem:
   prompt = (
       "GOAL:\n" + goal + "\n\nTASK:\n" + task.model_dump_json(indent=2) + "\n\n"
       f"Perform research. Use at most {k} web searches. First search official documentation or GitHub if relevant."
   )
   resp = researcher.step(prompt)
   raw = resp.msgs[0].content if hasattr(resp, "msgs") else resp.msg.content
   js = extract_first_json_object(raw)
   try:
       return EvidenceItem.model_validate_json(js)
   except Exception:
       return EvidenceItem.model_validate(json.loads(js))


def draft_with_self_consistency(goal: str, plan: Plan, evidence: List[Tuple[PlanTask, EvidenceItem]], n: int) -> str:
   packed_evidence = []
   for t, ev in evidence:
       packed_evidence.append({
           "task_id": t.id,
           "task_title": t.title,
           "objective": t.objective,
           "notes": ev.notes,
           "key_points": ev.key_points
       })
   payload = {
       "goal": goal,
       "assumptions": plan.assumptions,
       "tasks": [t.model_dump() for t in plan.tasks],
       "evidence": packed_evidence,
       "success_criteria": plan.success_criteria,
   }
   drafts = []
   for _ in range(max(1, n)):
       resp = writer.step("INPUT:\n" + json.dumps(payload, ensure_ascii=False, indent=2))
       txt = resp.msgs[0].content if hasattr(resp, "msgs") else resp.msg.content
       drafts.append(txt.strip())
   if len(drafts) == 1:
       return drafts[0]
   chooser = ChatAgent(
       system_message=(
           "You are a selector agent. Choose the best draft among candidates for correctness, clarity, and actionability.\n"
           "Return ONLY the winning draft text, unchanged."
       ),
       model=make_model(0.0),
   )
   resp = chooser.step("GOAL:\n" + goal + "\n\nCANDIDATES:\n" + "\n\n---\n\n".join([f"[DRAFT {i+1}]\n{d}" for i, d in enumerate(drafts)]))
   return (resp.msgs[0].content if hasattr(resp, "msgs") else resp.msg.content).strip()

We implement the orchestration logic for planning, research, and self-consistent drafting. We compile structured evidence and generate multiple candidate drafts to improve robustness. We then pick the best draft with a dedicated selector agent, mimicking ensemble-style decision making.
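The selector step can also be approximated deterministically for testing. Here is a hedged sketch (not part of the tutorial's pipeline; `select_best_draft` and the sample drafts are invented) that scores candidates by how many evidence key points they cover:

```python
from typing import List

def select_best_draft(drafts: List[str], key_points: List[str]) -> str:
    """Pick the draft covering the most key points; break ties toward brevity."""
    def score(d: str):
        covered = sum(1 for kp in key_points if kp.lower() in d.lower())
        return (covered, -len(d))  # prefer coverage first, then shorter text
    return max(drafts, key=score)

drafts = [
    "CAMEL uses role-playing agents.",
    "CAMEL uses role-playing agents and tool-using pipelines with schema validation.",
]
best = select_best_draft(drafts, ["role-playing", "schema validation"])
```

A heuristic like this is useful as a cheap fallback or a unit-test oracle; the LLM chooser in the pipeline handles the cases where quality is not reducible to keyword coverage.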

def critique_text(goal: str, draft: str) -> Critique:
   resp = critic.step("GOAL:\n" + goal + "\n\nDRAFT:\n" + draft + "\n\nReturn critique JSON now.")
   raw = resp.msgs[0].content if hasattr(resp, "msgs") else resp.msg.content
   js = extract_first_json_object(raw)
   try:
       return Critique.model_validate_json(js)
   except Exception:
       return Critique.model_validate(json.loads(js))


def revise(goal: str, draft: str, critique: Critique) -> str:
   resp = rewriter.step(
       "GOAL:\n" + goal +
       "\n\nCRITIQUE:\n" + critique.model_dump_json(indent=2) +
       "\n\nDRAFT:\n" + draft +
       "\n\nRewrite now."
   )
   return (resp.msgs[0].content if hasattr(resp, "msgs") else resp.msg.content).strip()


def pretty_plan(plan: Plan):
   tab = Table(title="Agent Plan", show_lines=True)
   tab.add_column("ID", style="bold")
   tab.add_column("Title")
   tab.add_column("Objective")
   tab.add_column("Deliverable")
   for t in plan.tasks:
       tab.add_row(t.id, t.title, t.objective, t.deliverable)
   console.print(tab)


def run(cfg: RunConfig):
   console.print(Panel.fit("CAMEL Advanced Agentic Tutorial Runner", style="bold"))
   plan = plan_goal(cfg.goal)
   pretty_plan(plan)


   evidence = []
   for task in plan.tasks[: cfg.max_tasks]:
       ev = research_task(task, cfg.goal, cfg.max_searches_per_task)
       evidence.append((task, ev))


   console.print(Panel.fit("Drafting (self-consistency)", style="bold"))
   draft = draft_with_self_consistency(cfg.goal, plan, evidence, cfg.self_consistency_samples)


   for r in range(cfg.max_revision_rounds + 1):
       crit = critique_text(cfg.goal, draft)
       console.print(Panel.fit(f"Critique round {r+1} — score {crit.score_0_to_10:.1f}/10", style="bold"))
       if crit.strengths:
           console.print(Panel("Strengths:\n- " + "\n- ".join(crit.strengths), title="Strengths"))
       if crit.issues:
           console.print(Panel("Issues:\n- " + "\n- ".join(crit.issues), title="Issues"))
       if crit.fix_plan:
           console.print(Panel("Fix plan:\n- " + "\n- ".join(crit.fix_plan), title="Fix plan"))
       if crit.score_0_to_10 >= 8.5 or r >= cfg.max_revision_rounds:
           break
       draft = revise(cfg.goal, draft, crit)


   console.print(Panel.fit("FINAL DELIVERABLE", style="bold green"))
   console.print(draft)


run(cfg)

We run a critique-and-revision loop to enforce quality control. We score drafts, surface weaknesses, and revise as needed. Finally, we execute the full pipeline, producing a systematic, research-backed deliverable through collaboration between agents.
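The loop's stop condition can be isolated as a pure function, which makes the revision budget easy to test in isolation. A minimal sketch mirroring the thresholds above (`should_stop` is an illustrative name, not part of the pipeline):

```python
def should_stop(score: float, round_idx: int, max_revision_rounds: int,
                threshold: float = 8.5) -> bool:
    """Stop revising once the critic's score clears the bar or the budget is spent."""
    return score >= threshold or round_idx >= max_revision_rounds

# With max_revision_rounds=1: a weak round-0 draft gets one revision, then we stop.
print(should_stop(7.0, 0, 1))  # False -> revise once more
print(should_stop(9.0, 0, 1))  # True  -> already good enough
print(should_stop(7.0, 1, 1))  # True  -> revision budget exhausted
```

Keeping this decision pure means the expensive critic/rewriter calls can be mocked out while the control flow itself stays fully unit-testable.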

In conclusion, we have developed a CAMEL-based multi-agent system that goes far beyond simple prompt chaining. We built agent communication on validated schemas, integrated web search tools for grounded reasoning, used self-consistency sampling to improve output reliability, and enforced quality with an internal critique loop. By combining these concepts, we have shown how to build scalable, general, and reliable agent pipelines suitable for real-world AI applications.

