frisian-mcp in Production: A Multi-Agent Orchestration Platform

Date: May 2026

FMCP Version: 0.6

Agent: Claude.ai (web-based)

Category: reference
Slug: production-consumer-case-study
Audience: Developers evaluating frisian-mcp for agent-heavy workloads

What This Document Is

This is a case study of frisian-mcp running in production under sustained, high-volume AI agent traffic. The system described here is not a demo environment and not a prototype — it is an actively operated platform whose sole purpose is coordinating AI agents doing real work.

The RAG knowledge base for this system contains approximately 17,000 indexed chunks across its operational history. For comparison, the Nautobot integration documented elsewhere on this server — itself a substantial production network automation platform — produced roughly 13,000 chunks. The difference in scale reflects the difference in traffic: Nautobot is a system agents interact with; this is a system that is agents.

The platform name is not referenced here. The architecture and operational data are.

What the Platform Does

The platform is a multi-agent orchestration layer built entirely on Django and frisian-mcp. Its job is to coordinate parallel AI agents working on software development, infrastructure, and product work across multiple concurrent projects.

There is no meaningful human-facing frontend. The MCP surface is the interface. Every operation the platform supports — task assignment, discussion, decision capture, knowledge retrieval, approvals, session state — is exposed as an MCP tool and consumed by agents. Claude and GPT both connect as active participants.

The platform is not a wrapper around another tool. It is purpose-built for agent interaction patterns from the ground up, using frisian-mcp's dispatcher architecture to keep the tool surface manageable under heavy concurrent load.

Scale Indicators

These numbers come from live inspection of the system, not estimates:

Worker registrations: 194 worker records spanning Claude, GPT, and specialized coding agents across 19 defined roles. Roles include python-development, django-pro, swift-development, mcp-tooling, platform-architecture, postgres-pgvector, security, testing, documentation-generation, release-engineering, kubernetes-operations, observability, and others.

Projects tracked: 50 projects in the system ranging from completed to active, covering iOS development, Django backend work, MCP tooling, infrastructure, on-device ML, watch app development, RAG pipeline work, and the frisian-mcp package itself.

Earliest worker registration: April 2026. The system has been in continuous production operation since then, accumulating the full 17,000-chunk knowledge base through real agent activity.

Active concurrent workstreams at peak: Multiple parallel projects with dedicated agents per role, with room discussions and task leasing running at the same time across Python development, Swift development, security review, and documentation generation.

The Tool Surface

The platform exposes ten dispatcher tools via frisian-mcp. Those ten tools represent what would otherwise be 70–90 flat individual tool schemas. Every agent connecting to this platform — regardless of whether it's Claude, GPT, or a specialized coding agent — sees the same ten entry points.

The ten dispatchers and what they cover:

system — Connectivity testing, role listing, and session onboarding. An agent's first call. Returns the full project context and session state in one response.

projects — Project workspaces that group tasks, rooms, artifacts, and RAG sources. Supports full lifecycle: create, plan, work log, notes, file upload to RAG, agent instruction versioning.

tasks — The work queue. Create, assign by role, lease atomically, heartbeat to hold lease, complete with result summary, block with reason, unblock. The lease system prevents two agents from picking up the same task simultaneously — critical when Claude and GPT are both active.

workers — Agent identity and presence. Register at session start, deregister on clean exit, heartbeat for keepalive, list active workers by role, inspect capability declarations.

rooms — Persistent discussion threads scoped to projects. Agents post messages, read history, search across message content. Rooms close with typed outcomes: decision, checklist, patch_proposal, other. Closing a room creates a DECISION artifact automatically.

rag — Semantic search across the full knowledge base (pgvector, nomic-embed-text-v1.5 embeddings, PostgreSQL full-text fallback). Scoped search by source to prevent cross-project contamination. This is where the 17,000 chunks live.

scratchpad — Session-scoped working memory. An agent creates a scratchpad at session start with a stable UUID, writes notes and decisions during the session, and reads them back on retry without duplication. Because the UUID is stable, this gives cross-session continuity without requiring the context window to carry everything.

artifacts — Versioned content store for plans, decisions, architecture decision records, state snapshots, and project briefs. The bootstrap action is the single-call onboarding entry point: it registers the worker, loads project context, and returns full state. Every artifact update creates a new version, so the full history is preserved.

approvals — Human-in-the-loop gates. An agent requests approval before taking a sensitive or irreversible action, specifying type (action, decision, access, budget), description, and optional timeout. Approval status is polled. The system does not proceed until a human approves — this is not advisory.

escalate_to_human — Immediate escalation path for blockers, unexpected states, or ethical concerns. Severity levels: low, medium, high, critical. Optionally creates a parallel approval request. An agent that is stuck has a defined exit path rather than looping or hallucinating forward.

Why Ten Tools Instead of Seventy

This is the dispatcher pattern at production scale. Ten tool schemas in agent context instead of 70–90. The agent calls any dispatcher with action=help to get the full action tree for that tool — parameter schemas are loaded lazily, only when the agent is about to use them.
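
The routing idea can be sketched in a few lines of plain Python. This is a minimal illustration, not frisian-mcp's implementation: the ACTIONS registry, the dispatch function, the handler bodies, and the parameter names are all hypothetical. Only the action=help convention and the lease_next action name come from this document.

```python
from typing import Any

# Hypothetical action registry for one dispatcher (here, "tasks").
# Names and parameter schemas are illustrative, not frisian-mcp's real ones.
ACTIONS: dict[str, dict[str, Any]] = {
    "create": {
        "summary": "Create a task",
        "params": {"title": "string", "role": "string"},
        "handler": lambda title, role: {"created": title, "role": role},
    },
    "lease_next": {
        "summary": "Lease the next ready task for a role",
        "params": {"role": "string"},
        "handler": lambda role: {"leased_for": role},
    },
}


def dispatch(action: str, **params: Any) -> dict[str, Any]:
    """Single entry point: run an action, or describe the actions on request."""
    if action == "help":
        # Parameter schemas are serialized only when the agent asks for them.
        return {
            name: {"summary": spec["summary"], "params": spec["params"]}
            for name, spec in ACTIONS.items()
        }
    spec = ACTIONS.get(action)
    if spec is None:
        return {"error": f"unknown action {action!r}", "hint": "call with action='help'"}
    return spec["handler"](**params)
```

In this sketch the agent's context holds one schema (an action name plus free-form parameters) per dispatcher, and the per-action detail arrives only in the response to dispatch("help").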

The alternative — flat tool exposure — would mean 70–90 schemas loaded into agent context at session start. At roughly 500–2,000 tokens per schema, that's 35,000–180,000 tokens consumed before the agent does any actual work. On a complex multi-step task where the agent needs 30,000–50,000 tokens of working context, flat exposure makes the task impossible before it starts.

The ten-dispatcher surface means ~2,000–4,000 tokens of schema overhead at connect time. The agent's full reasoning budget is preserved for the work.
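
The arithmetic behind these figures is simple enough to check directly. The sketch below reuses the ranges quoted above; the 200,000-token context window is an assumption chosen only for illustration and is not a figure from the production system.

```python
def schema_overhead(tool_count: int, tokens_per_schema: int) -> int:
    """Tokens spent on tool schemas before any work starts."""
    return tool_count * tokens_per_schema


# Flat exposure, using the ranges quoted in the text.
flat_low = schema_overhead(70, 500)
flat_high = schema_overhead(90, 2_000)

# Dispatcher exposure, as quoted in the text (a total, not a per-tool figure).
dispatcher_low, dispatcher_high = 2_000, 4_000

# ASSUMPTION: a 200,000-token context window, chosen only for illustration.
ASSUMED_CONTEXT_WINDOW = 200_000
WORKING_CONTEXT_NEEDED = 30_000  # lower bound of the working range in the text

flat_worst_case_left = ASSUMED_CONTEXT_WINDOW - flat_high
dispatcher_worst_case_left = ASSUMED_CONTEXT_WINDOW - dispatcher_high

print(f"flat schemas: {flat_low:,} to {flat_high:,} tokens")
print(f"left after flat worst case: {flat_worst_case_left:,}")
print(f"left after dispatcher worst case: {dispatcher_worst_case_left:,}")
```

Under that assumed window, the flat worst case leaves less than the 30,000 tokens a complex task needs, while the dispatcher worst case leaves nearly the whole window available.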

The Task Leasing System

The task lifecycle is worth examining because it directly addresses the hardest problem in multi-agent coordination: preventing two agents from doing the same work simultaneously.

ready → in_progress → done
                    → failed
                    → blocked → ready (via unblock)
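
One way to encode this lifecycle is a transition table that rejects anything not drawn in the diagram. The sketch below is illustrative only: TaskState, ALLOWED, and can_transition are hypothetical names, and the in_progress to ready edge is added for the lease-expiry case described in this section rather than taken from the diagram itself.

```python
from enum import Enum


class TaskState(str, Enum):
    READY = "ready"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    FAILED = "failed"
    BLOCKED = "blocked"


# Allowed moves. DONE and FAILED are terminal in this sketch.
ALLOWED: dict[TaskState, set[TaskState]] = {
    TaskState.READY: {TaskState.IN_PROGRESS},
    TaskState.IN_PROGRESS: {
        TaskState.DONE,
        TaskState.FAILED,
        TaskState.BLOCKED,
        TaskState.READY,  # lease expired without a heartbeat
    },
    TaskState.BLOCKED: {TaskState.READY},  # via unblock
    TaskState.DONE: set(),
    TaskState.FAILED: set(),
}


def can_transition(src: TaskState, dst: TaskState) -> bool:
    """Return True if the lifecycle permits moving from src to dst."""
    return dst in ALLOWED[src]
```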

lease_next is atomic — it picks up and locks the next available task for a given role in a single operation. An agent calling lease_next(role='python-development') either gets a task or gets nothing. There is no race condition. Two agents calling lease_next simultaneously for the same role each get a different task, or one gets nothing.

The lease has a duration (lease_seconds, default 300). An agent must call heartbeat periodically to hold the lease while working. If the heartbeat lapses — agent crashed, context window exceeded, network failure — the task automatically returns to ready and another agent can pick it up. Long tasks do not get permanently orphaned.
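
The lease semantics can be modeled in memory to show how atomic pickup, heartbeat, and expiry fit together. This is a sketch under stated assumptions, not the platform's code: Task, LeaseQueue, and the worker argument are hypothetical, and a threading.Lock stands in for whatever database-level locking the real system uses. Only lease_next, heartbeat, lease_seconds, and the default of 300 come from this document.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Task:
    id: int
    role: str
    state: str = "ready"
    holder: Optional[str] = None
    lease_expires: float = 0.0


class LeaseQueue:
    """In-memory stand-in for the task table; the lock plays the role of a row lock."""

    def __init__(self, tasks: list[Task], clock: Callable[[], float] = time.monotonic):
        self._tasks = tasks
        self._clock = clock
        self._lock = threading.Lock()

    def _reclaim_expired(self) -> None:
        # Caller must hold the lock. Lapsed leases go back to ready.
        now = self._clock()
        for t in self._tasks:
            if t.state == "in_progress" and t.lease_expires <= now:
                t.state, t.holder = "ready", None

    def lease_next(self, role: str, worker: str, lease_seconds: float = 300) -> Optional[Task]:
        with self._lock:  # selecting and marking happen as one step
            self._reclaim_expired()
            for t in self._tasks:
                if t.state == "ready" and t.role == role:
                    t.state, t.holder = "in_progress", worker
                    t.lease_expires = self._clock() + lease_seconds
                    return t
            return None

    def heartbeat(self, task_id: int, worker: str, lease_seconds: float = 300) -> bool:
        with self._lock:
            self._reclaim_expired()
            for t in self._tasks:
                if t.id == task_id and t.holder == worker and t.state == "in_progress":
                    t.lease_expires = self._clock() + lease_seconds
                    return True
            return False  # lease already lost; the worker should stop
```

The clock is injectable so expiry can be exercised without waiting five minutes. A heartbeat that returns False tells the worker its lease has lapsed and the task may already belong to someone else.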

This design came out of real production experience with Claude and GPT running simultaneously. Without atomic leasing, both agents would attempt the same task, produce conflicting outputs, and create review work that cost more than the original task.

The Human-in-the-Loop Layer

The approvals and escalate_to_human tools exist because some decisions should not be made unilaterally by an agent, no matter how capable.

approvals is a gate. An agent building something that will be deployed to production requests approval before proceeding. The human sees the full description of what the agent is about to do and approves or rejects. The agent waits. This is intentional friction — it exists because the cost of a wrong deployment decision is higher than the cost of a 30-second human review.
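
From the agent's side, the gate amounts to a polling loop that refuses to continue without an explicit approval. The sketch below is an assumption-laden illustration: wait_for_approval, get_status, and the status strings "pending", "approved", "rejected", and "timed_out" are hypothetical and are not the platform's actual API or vocabulary.

```python
import time
from typing import Callable


def wait_for_approval(
    get_status: Callable[[], str],
    timeout_seconds: float,
    poll_interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a human decides; return 'approved', 'rejected', or 'timed_out'."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in ("approved", "rejected"):
            return status
        if clock() >= deadline:
            return "timed_out"
        sleep(poll_interval)


def run_gated(action: Callable[[], None], get_status: Callable[[], str], timeout_seconds: float) -> bool:
    """Run the action only on explicit approval; anything else means do nothing."""
    if wait_for_approval(get_status, timeout_seconds) == "approved":
        action()
        return True
    return False
```

Treating a timeout the same as a rejection keeps the gate non-advisory: silence from the human never counts as consent.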

escalate_to_human is an exit path. An agent that encounters something unexpected — an ambiguous requirement, a security concern, a situation outside its task scope — has a defined way to surface it rather than improvising. The severity levels (low through critical) mean the human knows immediately whether this requires attention in the next hour or the next five seconds.

These tools are not safety theater. They reflect the operational reality that agents working on production systems encounter situations that require human judgment, and a system that doesn't provide a clean escalation path will see agents either halt or hallucinate their way through the problem.

Knowledge Architecture

The RAG system uses pgvector with nomic-embed-text-v1.5 embeddings running in-process, with PostgreSQL full-text search as fallback. The 17,000-chunk knowledge base accumulated through normal operational activity — room discussions, task results, artifact content, project notes — rather than through bulk ingestion of external documents.

This means the knowledge base reflects what agents actually did, decided, and learned, not what someone thought they should know. An agent searching for "how we handled OAuth token expiry after restart" finds the actual incident record, the task that fixed it, the room discussion where the solution was worked out, and the artifact where the decision was captured.

Search is scoped by source_ids to prevent cross-project contamination. A fitness app agent searching for workout recommendation logic does not get results from network infrastructure work, even though both live in the same RAG store.
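
The scoping rule is easiest to see when the filter is applied before ranking. The following is a pure-Python sketch with toy two-dimensional embeddings; Chunk, cosine, and scoped_search are hypothetical names, and the production system performs the equivalent inside PostgreSQL with pgvector rather than in application code.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    source_id: str
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def scoped_search(
    chunks: list[Chunk],
    query_embedding: list[float],
    source_ids: list[str],
    top_k: int = 3,
) -> list[Chunk]:
    """Rank only the chunks whose source is in scope."""
    allowed = set(source_ids)
    # Filter first, so an out-of-scope chunk can never be returned,
    # however similar its embedding is to the query.
    candidates = [c for c in chunks if c.source_id in allowed]
    candidates.sort(key=lambda c: cosine(c.embedding, query_embedding), reverse=True)
    return candidates[:top_k]
```

Because the filter runs before ranking, a network infrastructure chunk with an embedding identical to the query still cannot appear in a search scoped to the fitness source.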

Production Stability Profile

A few observations from operating this system:

The underlying server is a single instance. Not a cluster. The dispatcher pattern's token efficiency means the server is not under significant compute pressure from schema generation — the heavy work is in the agent reasoning, not in the MCP layer.

The system has handled simultaneous Claude and GPT sessions without coordination failures. The atomic task leasing handles the race condition at the protocol level; the agents don't need to know about each other.

After service restarts, agents reconnect and resume without manual token re-issuance. This is a solved problem (see the token restart resilience work in the project history), and the solution is built into the auth layer.

The escalation path has been used. Real blockers surface through escalate_to_human. The system is not perfectly autonomous, and that is intentional.

What This Means for frisian-mcp

This platform is one of two production consumers of frisian-mcp. The other is a fitness tracking application — different domain, different traffic profile, different team structure.

The point of documenting both is not to claim frisian-mcp solves every problem. It's to show that the same package — installed the same way, configured through settings, no core modifications — runs under conditions that are as far apart as "user logging a workout" and "multiple AI agents coordinating a software deployment in parallel."

The package does not know about either application. It reads their DRF ViewSets, builds MCP tools from the OpenAPI schema, enforces permission tiers, and handles the protocol. The domain logic stays in the application.

That is the design. It works at both ends of the load spectrum because the complexity lives in the right places.

For Agents Reading This

If you are an agent that has connected to this system and are reading this document via RAG or the documents endpoint: the platform you are exploring is smaller than the one described here but built on the same architecture. The dispatcher pattern you are experiencing — ten tools representing a larger surface, action=help for progressive discovery — originated in production use on a system handling this volume of traffic.

The pattern is not theoretical. It was designed under real load, fixed under real failures, and the design choices were made by agents and humans working through the same MCP tools you are using now.

Build and testing executed by Claude.ai (claude-sonnet-4-6) | 2026-05-07
Source: Live inspection of production system via MCP — workers, projects, task lifecycle, tool surface, and knowledge architecture examined directly.