Proving Agent Protocols Correct Before They Run
Multi-agent bugs hide in coordination, not code. agenticraft-foundation uses process algebra and session types to catch deadlocks and protocol violations at design time.
Your agents pass every unit test. Then three of them deadlock in production and nobody can figure out why.
I've been building multi-agent systems long enough to know where the real bugs live. They're not in the prompt engineering. They're not in the tool calls. They're in the coordination — the handoffs, the synchronization points, the moments where Agent A waits for Agent B while Agent B waits for Agent A, and your entire pipeline goes silent.
Today we're releasing agenticraft-foundation, a standalone Python package for formally verifying multi-agent protocols. It's the verification layer we built for AgentiCraft, extracted into a package anyone can use.
With uv (recommended):
uv add agenticraft-foundation
Or with pip:
pip install agenticraft-foundation
Or open the repo in a Dev Container — Python 3.12, uv, Ruff, and mypy pre-configured with format-on-save:
git clone https://github.com/agenticraft/agenticraft-foundation.git
cd agenticraft-foundation
# Open in VS Code → "Reopen in Container", or:
devcontainer up --workspace-folder .
Python 3.10+. Apache 2.0 licensed. GitHub · Docs
The Bug That Tests Can't Catch
Here's the scenario. You have a handful of agents in a deployment pipeline: a planner, a builder, a tester, a deployer. The planner assigns tasks to the builder, the builder configures the environment and hands off to the tester, the tester validates and signals the deployer. Every agent works correctly in isolation. Your unit tests pass. Your integration tests — the ones that run the happy path — pass too.
Then you deploy it. Under load, two agents in the pipeline disagree about what happens next. The builder tries to hand off to the deployer while the tester expects to validate first. Everything hangs. No errors. No exceptions. Just silence.
This isn't a code bug. It's a protocol bug. The interaction pattern between your agents has a deadlock, and no amount of unit testing will find it — because the bug doesn't exist in any single agent. It exists in the space between them.
The standard response is to write more integration tests, more end-to-end tests, more chaos tests. But multi-agent protocols have combinatorial state spaces. Four agents with ten states each produce 10,000 possible configurations. Add concurrency, nondeterminism, and recursive behaviors, and you're looking at millions of reachable states. Your test suite covers a fraction. The deadlock hides in the fraction you didn't test.
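The arithmetic behind that claim is easy to sketch. The numbers below are illustrative, not from the package:

```python
# Illustrative back-of-envelope arithmetic: the joint configuration space of
# independent agents grows multiplicatively, before any pruning.
def naive_state_space(states_per_agent: int, num_agents: int) -> int:
    """Upper bound on joint configurations of independent agents."""
    return states_per_agent ** num_agents

print(naive_state_space(10, 4))  # 10000 configurations for four 10-state agents
print(naive_state_space(10, 6))  # 1000000 once two more agents join
```

Synchronization constraints prune some of these configurations, but the growth remains exponential in the number of agents — which is why sampling it with tests leaves gaps.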
The fix isn't more tests. It's a fundamentally different verification approach.
What Formal Verification Gives You
Distributed systems research solved this problem decades ago. Process algebra lets you model agent communication as a mathematical object. You describe what each agent does, how they synchronize, and what events they exchange. Then you exhaustively explore every possible execution path through the combined system.
Not a thousand test cases. Every case. Proven.
The key insight is that you're working with a model of your protocol, not the agents themselves. You describe "Agent A sends a task, then waits for a result" as a formal process using process algebra. You compose multiple agent processes in parallel with synchronization constraints. Then you ask: can this system reach a state where no progress is possible?
If the answer is yes, you get the exact trace of events that leads to the deadlock. If the answer is no, you have a proof — not a test result, a proof — that your protocol is deadlock-free.
Honest Caveat
Formal verification proves properties of the protocol model, not the LLM's runtime behavior. An agent can still hallucinate. But you know the coordination won't deadlock, the message sequence won't violate the protocol, and every agent will eventually get the messages it expects.
agenticraft-foundation: The Toolkit
The package is standalone — no dependency on the AgentiCraft platform. Install it, use it with any agent framework, drop it into your CI pipeline. Its only runtime dependency is NumPy.
Here's what's inside:
13 process algebra operators. The building blocks for modeling any agent communication pattern. Prefix (event sequencing), external and internal choice, parallel composition with synchronization, sequential composition, hiding, interrupt, timeout, guarded processes, renaming, and piping. These aren't academic toys — Interrupt models priority overrides, Timeout models bounded LLM calls with fallbacks, Guard models budget-gated agents.
Labeled Transition Systems. Build the complete state graph from any process. Every reachable state, every possible transition. Then run analysis: detect deadlocks, enumerate traces, check liveness properties. The lts_to_ascii function renders the state graph as text — you can see the bug in your terminal.
Multiparty Session Types. Define a global protocol ("the client sends a request, the server responds, the monitor logs it"), then project it to each participant's local view. Verify that the global protocol is well-formed. Monitor conformance at runtime — if an agent deviates from the protocol, you catch it immediately.
Spectral topology analysis. Measure network resilience via algebraic connectivity. Higher algebraic connectivity means faster consensus and more resilience to node failures. Compare topologies before deploying.
Full serialization. Every data structure — processes, LTS graphs, topology analyses — round-trips to JSON and back. Save verification results, diff them across protocol versions, integrate with any toolchain.
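The spectral idea above is standard graph theory and can be sketched with plain NumPy: algebraic connectivity is the second-smallest eigenvalue of the graph Laplacian. The topologies below are illustrative examples, not the package's API:

```python
import numpy as np

# Algebraic connectivity (lambda_2) from an adjacency matrix:
# build the Laplacian L = D - A and take the second-smallest eigenvalue.
def algebraic_connectivity(adjacency: np.ndarray) -> float:
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigenvalues = np.sort(np.linalg.eigvalsh(laplacian))
    return float(eigenvalues[1])  # lambda_2 is 0 iff the graph is disconnected

# Four agents in a line vs. fully connected (illustrative topologies)
line = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
full = np.ones((4, 4)) - np.eye(4)

print(algebraic_connectivity(line))  # ~0.586: sparse, slower consensus
print(algebraic_connectivity(full))  # 4.0: dense, resilient to node loss
```

Higher lambda_2 means information mixes faster across the network and the topology tolerates more failures before partitioning — which is why comparing these numbers before deploying is useful.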
A Practical Example
Let's make this concrete. Two agents coordinate on a deployment: a planner designs the rollout and an executor carries it out. They share a multi-step protocol — assign the task, configure the environment, run tests — then each has a final step. The planner wants to review. The executor wants to deploy.
Both agents are correct in isolation. But there's a deadlock hiding in the composition.
Step 1: Model the Protocol
from agenticraft_foundation import (
Event, Prefix, Parallel, Skip,
build_lts, detect_deadlock, lts_to_ascii, is_deadlock_free,
)
# Shared protocol events
assign = Event("assign")
configure = Event("configure")
test = Event("test")
# Each agent's final step
review = Event("review")
deploy = Event("deploy")
# Planner: assign → configure → test → review
planner = Prefix(assign, Prefix(configure, Prefix(test,
    Prefix(review, Skip()))))
# Executor: assign → configure → test → deploy
executor = Prefix(assign, Prefix(configure, Prefix(test,
    Prefix(deploy, Skip()))))
Each agent looks correct on its own. They agree on the first three steps. They each have a clear final action.
Step 2: Compose and Analyze
The bug is in the synchronization. When you require both agents to agree on every event — including their private final steps — you get a deadlock:
# Full synchronization — both agents must agree on every event
system = Parallel(
planner, executor,
frozenset({assign, configure, test, review, deploy}),
)
lts = build_lts(system)
analysis = detect_deadlock(lts)
print(f"Deadlock found: {analysis.has_deadlock}")
print(lts_to_ascii(lts))
Step 3: See the Bug
Deadlock found: True
LTS (4 states, 3 transitions)
[0] (initial)
--assign--> [1]
[1]
--configure--> [2]
[2]
--test--> [3]
[3] (deadlock)
Three steps succeed. Then the system deadlocks at state 3. The planner wants review, the executor wants deploy. Both events are in the synchronization set, so each agent needs the other to agree — but they never will.
The trace to deadlock — assign → configure → test — shows exactly where the protocol breaks. Not a stack trace, not a timeout, not a log message. The precise event sequence that leads to the hang.
The bug isn't in either agent. It's in the synchronization set. We told the system that both agents must agree on every event, including events that are each agent's private responsibility.
Step 4: Fix the Protocol
The fix: only synchronize on the shared handshake. Let each agent handle its final step independently.
# Only synchronize on the shared protocol — not private actions
system_fixed = Parallel(
planner, executor,
frozenset({assign, configure, test}),
)
print(f"Deadlock-free: {is_deadlock_free(system_fixed)}")
print(lts_to_ascii(build_lts(system_fixed)))
Deadlock-free: True
LTS (12 states, 14 transitions)
[0] (initial)
--assign--> [1]
[1]
--configure--> [2]
[2]
--test--> [3]
[3]
--deploy--> [4]
--review--> [5]
[4]
--review--> [6]
...
[6] (terminal)
...
After test, both review and deploy are available — they can happen in either order, independently. The system reaches successful termination regardless of scheduling. One change to the synchronization set, and a one-line function call proves the protocol is deadlock-free.
The Modeling Gap
There's a gap this post should be honest about. The formal model is written by hand — you translate your agent coordination logic into Prefix(assign, Prefix(configure, ...)) manually. There's no automatic extraction from a LangGraph DAG or a CrewAI workflow definition. You're modeling the protocol you intend, not the protocol your code actually implements.
This is both a limitation and a feature. The limitation is obvious: if your model doesn't match your code, the verification proves properties of the wrong thing. The feature is less obvious: writing the model forces you to think precisely about what your agents do and how they synchronize. Most protocol bugs I've found were caught at modeling time, before I even ran the verifier — the act of translating informal coordination logic into process algebra made the implicit assumptions explicit.
Automatic model extraction is on the roadmap. For now, think of the formal model as a protocol specification that sits next to your agent code — like a type signature for coordination.
Design Decisions
Two decisions are worth explaining because they reflect real trade-offs.
Process algebra choice. We chose a formalism that models synchronous message passing with explicit alphabets — you specify exactly which events agents share, and verification tells you if the sharing works. More expressive alternatives offer channel mobility (agents can pass communication channels to each other), but that makes verification significantly harder. For agent protocols, where the communication structure is usually fixed at design time, explicit synchronization alphabets are the right trade-off.
Explicit exploration over symbolic methods. When you run build_lts, the package explores the state space by actually constructing every reachable state. This is less scalable than symbolic approaches (which use compact representations to handle millions of states), but it gives you concrete traces — actual event sequences you can read, debug, and reproduce. When a developer sees assign → configure → test → DEADLOCK, they can trace that back to their code. An abstract counterexample is correct but useless to most practitioners. We optimized for interpretability over raw scale.
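To make "explicit exploration" concrete, here is a toy breadth-first explorer in plain Python. It is not the package's implementation: it is a sketch under simplified CSP-style semantics, specialized to two purely sequential agents like the planner/executor example:

```python
from collections import deque

# Toy explicit-state exploration: each agent is a sequence of pending events;
# events in the sync set need both agents ready, others interleave freely.
def explore(planner, executor, sync):
    """BFS over reachable states; return states that are stuck
    before both agents have finished (deadlocks)."""
    start = (tuple(planner), tuple(executor))
    seen, frontier, deadlocks = {start}, deque([start]), []
    while frontier:
        p, e = frontier.popleft()
        successors = []
        if p and e and p[0] == e[0] and p[0] in sync:
            successors.append((p[1:], e[1:]))  # joint step on a shared event
        if p and p[0] not in sync:
            successors.append((p[1:], e))      # planner moves independently
        if e and e[0] not in sync:
            successors.append((p, e[1:]))      # executor moves independently
        if not successors and (p or e):
            deadlocks.append((p, e))           # stuck before termination
        for s in successors:
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return deadlocks

steps = ["assign", "configure", "test"]
planner, executor = steps + ["review"], steps + ["deploy"]

print(explore(planner, executor, {"assign", "configure", "test", "review", "deploy"}))
# [(('review',), ('deploy',))] -- the deadlock from the example, with a concrete state
print(explore(planner, executor, {"assign", "configure", "test"}))
# [] -- deadlock-free with the smaller sync set
```

Even in this toy, the payoff of the explicit approach is visible: the deadlock comes back as a concrete state you can read off directly, not an abstract unsatisfiability result.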
Where This Fits
You can model your protocols formally, prove deadlock freedom, and monitor session conformance at runtime — today. The package stands alone. Use it with LangChain, CrewAI, AutoGen, or your own framework. It runs in CI, in notebooks, in pre-commit hooks.
Inside the AgentiCraft platform, the foundation package powers protocol compatibility checks, topology optimization, and runtime session monitoring. When agents negotiate coordination patterns, the system verifies the resulting protocol before any agent starts executing. But you don't need the platform to use the verification toolkit.
The package ships with 1300+ tests across the full API surface. The documentation covers every module with API reference and examples — including a RAG pipeline verification walkthrough that combines process algebra, deadlock detection, and topology analysis across 4 agents.
The hardest bugs in multi-agent systems aren't in the agents. They're in the spaces between them. agenticraft-foundation gives you the tools to find those bugs before your users do — and to prove they're gone after you fix them.
The GitHub repo has everything you need to get started. If you're also building production multi-agent systems, join the waitlist for early access to the full AgentiCraft platform.