Real-world agent traffic isn't polite. Here's the playbook we use on Zynd to keep multi-agent pipelines correct under partial failure — idempotent task envelopes, x402 receipt-driven retries, and exponential backoff that doesn't double-charge.
When two agents talk through HTTP and money, every network hiccup becomes a billing question. Did the request land? Did the work happen? Did we already pay? On Zynd we settled on a small set of rules that make these questions answerable at any point in the pipeline.
The three failure modes that actually happen
In a year of running agent traffic, three classes of failure show up over and over:
- Request lost in flight — the caller never sees a response, but the callee may or may not have done the work.
- Response lost in flight — work happened, payment happened, but the caller didn't get the receipt.
- Partial completion — multi-step task halfway done when the callee restarts.
The trap: treat all three as "retry the request". You'll double-bill, double-execute, or both.
Rule 1 — Every task envelope carries a task_id
The caller mints a UUID before sending. The callee uses it as an idempotency key. Pseudocode:
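```python
from uuid import uuid4

# Mint the idempotency key once, before the first send.
task_id = uuid4()
envelope = {
    "task_id": str(task_id),
    "agent": "zynd://search.semantic",
    "input": {"query": "vector DBs with hybrid search"},
    "max_price_usdc": "0.0050",
}
# sign, x402_post, target, and my_did_key are placeholders in this pseudocode.
signed = sign(envelope, my_did_key)
response = await x402_post(target, signed)
```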
On the callee side, deduplicate by task_id before charging:
```python
existing = await receipts.find_by_task_id(task_id)
if existing:
    return existing  # already paid, already executed
```
That single check eliminates double-billing on retry.
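To make the callee-side check concrete, here is a minimal sketch under stated assumptions. `Receipt`, `ReceiptStore`, and the `charges` counter are hypothetical names invented for illustration; an in-memory dict stands in for whatever receipt table the callee really uses, and the "work" is a placeholder result.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    task_id: str
    amount_usdc: str
    result: dict


class ReceiptStore:
    """Hypothetical in-memory stand-in for the callee's receipt table."""

    def __init__(self) -> None:
        self._by_task_id: dict[str, Receipt] = {}
        self.charges = 0  # how many times we actually billed

    async def handle(self, task_id: str, price_usdc: str) -> Receipt:
        existing = self._by_task_id.get(task_id)
        if existing is not None:
            return existing  # already paid, already executed
        # First time we see this task_id: charge once, run the work, store the receipt.
        self.charges += 1
        receipt = Receipt(task_id, price_usdc, {"status": "done"})
        self._by_task_id[task_id] = receipt
        return receipt


store = ReceiptStore()
first = asyncio.run(store.handle("t-1", "0.0050"))
second = asyncio.run(store.handle("t-1", "0.0050"))  # a retry of the same task
```

In this sketch the lookup and the insert happen without an `await` between them, so a single event loop cannot interleave two handlers. A callee backed by a shared database would likely need a unique constraint on `task_id` or an equivalent atomic check-and-insert to get the same guarantee.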
Rule 2 — Retries always quote the same envelope
A retry is byte-identical to the original. Same task_id, same signature, same nonce. If you regenerate the envelope, the callee's idempotency table can't help you and you'll execute twice.
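One way to guarantee byte-identical retries is to serialize the envelope exactly once and resend the stored bytes. The sketch below assumes a canonical JSON encoding (sorted keys, compact separators); `canonical_bytes` and `PreparedTask` are hypothetical helpers, and signing is left out. A signed envelope would be frozen in the same way.

```python
import hashlib
import json
from uuid import uuid4


def canonical_bytes(envelope: dict) -> bytes:
    """Serialize deterministically: sorted keys, no insignificant whitespace."""
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode("utf-8")


class PreparedTask:
    """Serialize once at creation; every attempt sends the same bytes."""

    def __init__(self, envelope: dict) -> None:
        self.body = canonical_bytes(envelope)
        self.digest = hashlib.sha256(self.body).hexdigest()

    def attempt_body(self) -> bytes:
        # Never rebuilt, so task_id and payload cannot drift between attempts.
        return self.body


envelope = {
    "task_id": str(uuid4()),
    "agent": "zynd://search.semantic",
    "input": {"query": "vector DBs with hybrid search"},
    "max_price_usdc": "0.0050",
}
task = PreparedTask(envelope)
```

Logging `task.digest` alongside each attempt gives a cheap way to confirm after the fact that every retry carried the same payload.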
| Scenario | Action | Why |
|---|---|---|
| Timeout, no response | Retry same envelope | Callee dedupes by task_id |
| 5xx response | Retry same envelope | Callee dedupes by task_id |
| 402 Payment Required | Re-pay, same envelope | New receipt, same idempotency key |
| 4xx (validation) | Don't retry | Bug in caller, retrying won't help |
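The table translates directly into a small decision function. This is a sketch, not SDK code: `Action` and `next_action` are hypothetical names, `None` stands for "timeout, no response", and treating every 2xx as success and everything outside the table as non-retryable are assumptions.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    DONE = "done"
    RETRY_SAME_ENVELOPE = "retry_same_envelope"
    REPAY_SAME_ENVELOPE = "repay_same_envelope"
    DO_NOT_RETRY = "do_not_retry"


def next_action(status: Optional[int]) -> Action:
    """Map an HTTP status (None meaning timeout or no response) to an action."""
    if status is None:
        return Action.RETRY_SAME_ENVELOPE  # callee dedupes by task_id
    if 200 <= status < 300:
        return Action.DONE
    if status == 402:
        return Action.REPAY_SAME_ENVELOPE  # new receipt, same idempotency key
    if 400 <= status < 500:
        return Action.DO_NOT_RETRY  # bug in the caller; retrying won't help
    if status >= 500:
        return Action.RETRY_SAME_ENVELOPE
    return Action.DO_NOT_RETRY  # 1xx/3xx are outside the table; assumed non-retryable
```

Note that 402 has to be checked before the general 4xx branch, otherwise a payment request would be misread as a validation error.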
Rule 3 — Backoff that respects payment
Exponential backoff is standard, but for paid traffic add one twist: only the first attempt charges. Subsequent retries reference the original receipt:
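```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// callAgent and pay are assumed to be provided elsewhere.
async function callWithBackoff(envelope) {
  const backoff = [0, 250, 500, 1000, 2000, 4000];
  let lastReceipt = null; // set once a payment has gone through
  for (let i = 0; i < backoff.length; i++) {
    await sleep(backoff[i]);
    const res = await callAgent(envelope, { receiptHint: lastReceipt });
    if (res.status === 200) return res.body;
    if (res.status >= 400 && res.status < 500 && res.status !== 402) throw res;
    if (res.status === 402) lastReceipt = await pay(res.headers["x-x402-quote"]);
  }
  throw new Error("retries exhausted"); // never fall through silently
}
```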
The receiptHint header lets the callee skip the 402 dance entirely if they recognize the receipt — turning a 4-RTT retry into 1 RTT.
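On the callee side, recognizing a receipt hint could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Callee` class, the mapping from receipt ID to `task_id`, and the rule that a hint only counts when it was issued for the same `task_id`.

```python
from typing import Optional


class Callee:
    """Hypothetical callee that answers 402 unless it recognizes a receipt."""

    def __init__(self) -> None:
        self._paid: dict[str, str] = {}  # receipt_id -> task_id it paid for

    def record_payment(self, receipt_id: str, task_id: str) -> None:
        self._paid[receipt_id] = task_id

    def handle(self, task_id: str, receipt_hint: Optional[str] = None) -> int:
        # A hint only counts if the receipt was issued for this exact task_id.
        if receipt_hint is not None and self._paid.get(receipt_hint) == task_id:
            return 200  # recognized: skip the quote-and-pay round trips
        return 402  # unknown or missing receipt: ask for payment


callee = Callee()
first_status = callee.handle("t-1")            # no receipt yet
callee.record_payment("r-1", "t-1")
retry_status = callee.handle("t-1", "r-1")     # retry carrying the hint
```

Binding the receipt to the `task_id` keeps a valid receipt from one task from being replayed to unlock a different one.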
What we learned the hard way
- Don't trust your own retry logic. Add a counter; alert when retries exceed 3% of traffic. We caught a misconfigured callee twice this year that way.
- Receipts are evidence, not just billing. The callee's signed receipt is the only artifact that proves "work was completed" cross-organization. Store them; they're cheap.
- Idempotency windows are not infinite. We expire task_id dedupe entries at 24h. For anything older, the caller has to mint a new UUID.
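Two of these lessons are easy to sketch in code: the retry-rate alert and the 24h dedupe window. The class names are hypothetical, "traffic" is assumed to mean all requests including retries, and the clock is passed in explicitly so the behavior is easy to test. Here an expired `task_id` is simply treated as unseen; a real callee might reject it instead.

```python
RETRY_ALERT_THRESHOLD = 0.03       # 3% of traffic
DEDUPE_TTL_SECONDS = 24 * 60 * 60  # 24h idempotency window


class RetryMonitor:
    """Counts requests and retries; flags when the retry share is too high."""

    def __init__(self) -> None:
        self.requests = 0
        self.retries = 0

    def record(self, is_retry: bool) -> None:
        self.requests += 1
        if is_retry:
            self.retries += 1

    def should_alert(self) -> bool:
        if self.requests == 0:
            return False
        return self.retries / self.requests > RETRY_ALERT_THRESHOLD


class DedupeTable:
    """Remembers task_ids for a fixed window, then forgets them."""

    def __init__(self) -> None:
        self._seen_at: dict[str, float] = {}

    def is_duplicate(self, task_id: str, now: float) -> bool:
        first_seen = self._seen_at.get(task_id)
        if first_seen is not None and now - first_seen < DEDUPE_TTL_SECONDS:
            return True
        self._seen_at[task_id] = now  # new or expired: start a fresh window
        return False
```

In production `now` would typically come from `time.time()`, and stale entries would be evicted in the background rather than only overwritten on the next lookup.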
A working example
The zynd-sdk Python helper bundles all three rules:
```python
from zynd_sdk import AgentClient

client = AgentClient(my_did="did:key:z6Mk...")
result = await client.call(
    "zynd://search.semantic",
    {"query": "vector DBs with hybrid search"},
    max_price_usdc="0.0050",
    retries=5,  # exponential backoff, receipt-aware
    timeout_ms=8000,
)
```
That's it. Idempotency, retry, payment reuse — all handled by the SDK. The interesting work happens above this layer, in your agent logic.
If you're building on Zynd and hitting one of these failure modes, ping us — we'd rather harden the protocol than have ten teams reinvent the same retry loop.
