Wafer documentation
Wafer is an AI security gateway. It sits between your app and the model and applies guardrails to every request and response — by changing one line of code.
Introduction
Point your existing OpenAI / Anthropic / Gemini / Mistral SDK at a project's gateway URL, keep your own provider key (BYOK — Wafer never stores it), and Wafer enforces your policy at the edge. For Cloudflare Workers AI env.AI bindings, a small wrapper instruments calls in-process (see Workers AI).
Quickstart
- Sign in to the console and create a project — you get a gateway URL and a default policy.
- Point your SDK's base_url at it (keep your provider key):
from openai import OpenAI
client = OpenAI(
api_key="YOUR_OPENAI_KEY",
base_url="https://wafersecurity.ai/p/<project>/openai",
)
resp = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello from Wafer"}],
)
The console shows live logs immediately. A fresh project stays in a focused setup view until its first request arrives, then unlocks tuning.
Providers & gateway URLs
Every project URL is uniform: https://wafersecurity.ai/p/<project>/<provider>. OpenAI, Anthropic, Mistral and Gemini use their native endpoints/SDKs; the rest are OpenAI-compatible (use the OpenAI SDK pointed at the URL).
| Provider | Base URL | Wire format |
|---|---|---|
| OpenAI | /p/<project>/openai | OpenAI-compatible |
| Anthropic | /p/<project>/anthropic | anthropic |
| Google Gemini | /p/<project>/gemini | gemini |
| Mistral | /p/<project>/mistral | OpenAI-compatible |
| Groq | /p/<project>/groq | OpenAI-compatible |
| DeepSeek | /p/<project>/deepseek | OpenAI-compatible |
| xAI (Grok) | /p/<project>/xai | OpenAI-compatible |
| Together AI | /p/<project>/together | OpenAI-compatible |
| OpenRouter | /p/<project>/openrouter | OpenAI-compatible |
| Perplexity | /p/<project>/perplexity | OpenAI-compatible |
| Fireworks | /p/<project>/fireworks | OpenAI-compatible |
Gemini uses the native generateContent API via the google-genai SDK (http_options.base_url); Anthropic uses its messages API. Your provider key passes through unchanged.
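As a small illustration of the uniform URL scheme, the sketch below builds a base URL for any provider slug in the table. The helper name gateway_base_url and the PROVIDERS set are illustrative only (they are not part of any Wafer SDK), and my-app is a placeholder project.

```python
# Provider slugs copied from the table above.
PROVIDERS = {
    "openai", "anthropic", "gemini", "mistral", "groq", "deepseek",
    "xai", "together", "openrouter", "perplexity", "fireworks",
}

GATEWAY_ORIGIN = "https://wafersecurity.ai"


def gateway_base_url(project: str, provider: str) -> str:
    """Return <origin>/p/<project>/<provider> (hypothetical helper)."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider slug: {provider!r}")
    return f"{GATEWAY_ORIGIN}/p/{project}/{provider}"


print(gateway_base_url("my-app", "anthropic"))
# prints https://wafersecurity.ai/p/my-app/anthropic
```

The returned string is what you would pass as base_url to the OpenAI or Anthropic SDK, or as http_options.base_url to the google-genai SDK.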
Guardrails
Each guardrail has an action: block, redact, flag, or off, set per project in the console (or via the CLI / admin API).
| Guardrail | Catches | Default |
|---|---|---|
| Secrets | API keys, private keys (input + output) | block |
| PII | email, card, SSN, phone | redact |
| Blocklist | your own terms | block |
| Prompt injection | jailbreak / injection attempts | flag |
| LLM judge | nuanced policy (see below) | off |
Tiering keeps latency low: Tier-0 regex (secrets/PII/blocklist) runs inline and applies redactions; Tier-1 (injection) and Tier-2 (judge) run concurrently. If Tier-0 already blocks, the model checks are skipped.
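The tiering order can be pictured with a toy pipeline. This is a sketch of the control flow only, not Wafer's implementation: the regexes are illustrative, the two model checks are stand-in functions, and running the model checks on the already-redacted text is an assumption.

```python
import asyncio
import re

# Illustrative patterns only; not Wafer's actual detectors.
SECRET_RE = re.compile(r"sk-[A-Za-z0-9]{20,}")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def tier0(text: str) -> tuple[str, str]:
    """Inline regex pass: block on secrets, redact emails."""
    if SECRET_RE.search(text):
        return "block", text
    redacted = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return ("redact" if redacted != text else "allow"), redacted


async def tier1_injection(text: str) -> str:
    """Stand-in for a model-based injection check."""
    await asyncio.sleep(0)
    return "flag" if "ignore previous instructions" in text.lower() else "allow"


async def tier2_judge(text: str) -> str:
    """Stand-in for the LLM judge; here it allows everything."""
    await asyncio.sleep(0)
    return "allow"


async def evaluate(text: str) -> dict:
    decision, text = tier0(text)
    if decision == "block":
        # Tier-0 already blocked, so the model checks are skipped.
        return {"tier0": "block", "model_checks": None, "text": text}
    # Tier-1 and Tier-2 run concurrently (assumption: on the redacted text).
    injection, judge = await asyncio.gather(tier1_injection(text), tier2_judge(text))
    return {
        "tier0": decision,
        "model_checks": {"injection": injection, "judge": judge},
        "text": text,
    }


print(asyncio.run(evaluate("email me at jane@acme.com"))["text"])
# prints email me at [REDACTED_EMAIL]
```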
Streaming & latency. Streamed (SSE) responses pass through token-by-token — Wafer never buffers a completion to inspect it, and output guardrails apply incrementally as tokens flow. On the streaming path only guardrails that can block gate the first token; flag checks run alongside the stream and never delay it. The inline Tier-0 checks add under a millisecond, and Wafer runs at the edge — same network as your Workers.
Block signal. A blocked request returns HTTP 403 with headers x-wafer-blocked: 1 and x-wafer-categories (e.g. pii,injection); the JSON body matches your provider's native error shape, so existing SDK error handling catches it. On a stream — whose headers are already sent — the block arrives in-band as a final data: event carrying the same blocked flag and categories. With the env.AI wrapper a block throws WaferBlockedError exposing the same categories. One handler, keyed on the categories, covers every path.
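A handler for the non-streaming block signal might look like the sketch below. It relies only on the status code and the two headers named above; the function name blocked_categories is illustrative.

```python
from typing import Mapping, Optional


def blocked_categories(status: int, headers: Mapping[str, str]) -> Optional[list[str]]:
    """Return the guardrail categories if the response is a Wafer block, else None."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status != 403 or lowered.get("x-wafer-blocked") != "1":
        return None
    raw = lowered.get("x-wafer-categories", "")
    return [c.strip() for c in raw.split(",") if c.strip()]


print(blocked_categories(403, {"x-wafer-blocked": "1", "x-wafer-categories": "pii,injection"}))
# prints ['pii', 'injection']
print(blocked_categories(403, {"content-type": "application/json"}))
# prints None
```

With the OpenAI Python SDK, for example, a 403 surfaces as an openai.APIStatusError; its status_code and response.headers can be passed straight into this function. A plain provider 403 without the x-wafer-blocked header returns None, so it is not mistaken for a guardrail block.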
Profiles. Define named posture overrides under one project and pick one per request with the x-wafer-profile header (or the Workers AI wrapper's profile option) — for example a strict, latency-tight interrupt profile and a looser enrich profile, sharing one project, one set of keys, and one log stream. A profile overrides only the guardrails (and cache) you name; everything else inherits the project default, and an unknown or absent profile falls back to it. Manage them with wafer profile <project> set <name> '<json>'.
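Selecting a profile per request comes down to sending one extra header. A minimal sketch, assuming the OpenAI Python SDK (whose create call accepts extra_headers); chat_kwargs is a hypothetical helper and interrupt is the example profile name from the paragraph above.

```python
from typing import Optional


def chat_kwargs(model: str, messages: list, profile: Optional[str] = None) -> dict:
    """Build keyword arguments for client.chat.completions.create (hypothetical helper)."""
    kwargs = {"model": model, "messages": messages}
    if profile:
        # No header means the project default policy applies.
        kwargs["extra_headers"] = {"x-wafer-profile": profile}
    return kwargs


messages = [{"role": "user", "content": "Hello from Wafer"}]
print(chat_kwargs("gpt-4o-mini", messages, profile="interrupt")["extra_headers"])
# prints {'x-wafer-profile': 'interrupt'}
```

You would then call client.chat.completions.create(**chat_kwargs(...)) on a client whose base_url points at the project's gateway URL.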
Batch. For Mistral and Gemini batch jobs, Wafer inspects the uploaded JSONL file — applying secrets, PII and blocklist guardrails to every request line before the file reaches the provider. A blocked line rejects the upload (with the block signal above); redactions are applied in place. Model-based checks don't apply to async batch.
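The reject-or-redact behaviour for batch files can be sketched as a line-by-line pass over the JSONL. This is an illustration under assumptions, not Wafer's code: the regexes are toy patterns, BatchRejected is a hypothetical exception, and the pass scrubs every string value rather than assuming a particular provider's request schema.

```python
import json
import re

# Illustrative patterns only; not Wafer's actual detectors.
SECRET_RE = re.compile(r"sk-[A-Za-z0-9]{20,}")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class BatchRejected(Exception):
    """Raised when any line trips a block-action guardrail (hypothetical)."""


def scrub(value):
    """Recursively redact emails in every string; raise on a secret."""
    if isinstance(value, str):
        if SECRET_RE.search(value):
            raise BatchRejected("secrets")
        return EMAIL_RE.sub("[REDACTED_EMAIL]", value)
    if isinstance(value, list):
        return [scrub(v) for v in value]
    if isinstance(value, dict):
        return {k: scrub(v) for k, v in value.items()}
    return value


def inspect_jsonl(payload: str) -> str:
    """Return the payload with redactions applied, or raise BatchRejected."""
    out = []
    for line in payload.splitlines():
        if line.strip():
            out.append(json.dumps(scrub(json.loads(line))))
    return "\n".join(out)


line = json.dumps({"body": {"messages": [{"role": "user", "content": "mail jane@acme.com"}]}})
print(inspect_jsonl(line))
```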
LLM judge
Use a model to classify prompts/responses against a plain-language policy — for rules regex can't express ("no medical or legal advice", topic adherence). Configure it under Guardrails → LLM judge: enable, flag or block, input/output/both, and the policy text. It runs at the edge via Workers AI. For env.AI binding traffic, the judge runs through your own binding (no extra round-trip).
Semantic cache
Return a stored response for near-identical prompts and skip the model call entirely. Enable it per project with a similarity threshold and TTL. Cache is isolated per project. Backed by Vectorize; responses are keyed by an embedding of the (post-guardrail) prompt.
Scoping. Isolate cached answers further with the x-wafer-cache-scope header — e.g. an episode or document id — so a cached answer is never served across scopes or users within the project. For per-item Q&A, set the scope to that item's id, set the TTL to your content-freshness window, and keep the threshold high (the 0.95 default) for answer correctness. The cache serves non-streaming responses, so it complements a streamed path rather than replacing it.
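To make the interplay of scope, threshold and TTL concrete, here is a toy in-memory lookup. It is a sketch only: the real cache is backed by Vectorize, the embeddings here are hand-written vectors, the ScopedCache class is hypothetical, and the one-hour TTL is an arbitrary placeholder (only the 0.95 threshold comes from the text above).

```python
import math
import time


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ScopedCache:
    def __init__(self, threshold=0.95, ttl_seconds=3600.0):
        self.threshold = threshold
        self.ttl = ttl_seconds
        self.entries = []  # (scope, embedding, response, stored_at)

    def put(self, scope, embedding, response, now=None):
        stored_at = time.time() if now is None else now
        self.entries.append((scope, embedding, response, stored_at))

    def get(self, scope, embedding, now=None):
        """Return the best in-scope, unexpired match at or above the threshold."""
        now = time.time() if now is None else now
        best, best_score = None, self.threshold
        for entry_scope, vec, response, stored_at in self.entries:
            if entry_scope != scope or now - stored_at > self.ttl:
                continue  # never serve across scopes or past the TTL
            score = cosine(embedding, vec)
            if score >= best_score:
                best, best_score = response, score
        return best


cache = ScopedCache()
cache.put("episode-42", [1.0, 0.0], "cached answer", now=0)
print(cache.get("episode-42", [0.99, 0.05], now=10))  # near-identical prompt, same scope
print(cache.get("episode-43", [1.0, 0.0], now=10))    # identical prompt, different scope
```

The first lookup returns the cached answer and the second returns None, which is the isolation the scope header is meant to give you.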
Rate limits & budgets
Cap requests per minute and a daily token budget per project. Both are enforced by a per-project Durable Object (strongly consistent) and reject excess traffic with 429 + Retry-After. Token spend is read from each response's usage.
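On the client side, a 429 is best handled by waiting for the interval the Retry-After header names. The source does not say which form Wafer sends, so this sketch accepts both forms HTTP allows (delta-seconds and an HTTP-date); the function name and the one-second default are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(
    value: Optional[str],
    now: Optional[datetime] = None,
    default: float = 1.0,
) -> float:
    """Parse a Retry-After header (delta-seconds or HTTP-date) into a wait in seconds."""
    if not value:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable header: fall back to the default wait.
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


print(retry_after_seconds("30"))
# prints 30.0
```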
Set a spend limit in USD per day and per month. Wafer estimates each request's cost from the model's list price and rejects traffic once a cap is reached — a hard ceiling so a project can't run away with your bill. Like every limit, it fails open.
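The cost estimate is simple arithmetic over token counts and a per-token price. The sketch below shows the shape of that calculation; the model name and prices are made up for illustration, and Wafer's actual price table and rounding are not documented here.

```python
# Hypothetical list prices in USD per 1M tokens; substitute your provider's published rates.
PRICES = {"example-model": {"input": 0.50, "output": 1.50}}


def estimate_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate one request's cost from token usage and list price."""
    price = PRICES[model]
    return (prompt_tokens * price["input"] + completion_tokens * price["output"]) / 1_000_000


def cap_reached(spent_usd: float, cap_usd: float) -> bool:
    """Traffic is rejected once accumulated spend has reached the cap."""
    return spent_usd >= cap_usd


cost = estimate_cost_usd("example-model", prompt_tokens=1200, completion_tokens=300)
print(f"{cost:.5f}")
# prints 0.00105
```

At these illustrative prices, a 5 USD daily cap would admit roughly 4,700 requests of this size before traffic is rejected.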
Turn on retries to ride out transient provider failures (429 and 5xx) with automatic backoff, and set an optional fallback model on the same provider for Wafer to try when retries are still failing — so a provider hiccup doesn't take your app down.
Analytics & telemetry
The console shows decisions, cache hit rate, latency (p50/p95), 24h traffic, top guardrails and live logs — for both proxied and binding traffic. By default Wafer captures full request/response telemetry per log: model, tokens, request & response content, decision, findings, latency, status, IP and country. Click any log row for the full detail. Content is stored post-guardrail — secrets and PII are redacted before they're written.
Set log: "content" to include content, or log: "off" to disable.
Decision webhook. Stream every guardrail decision to an external sink, including the native heystack.dev integration for observability and root-cause analysis. Events carry decision metadata and guardrail categories only, never request or response content, and are sent best-effort off the request path. Configure it in Settings → Limits, or wafer webhook <project> <url>.
Audit export. Export a project's request logs as CSV or JSON for audit and reporting: GET /admin/projects/<id>/logs/export?format=csv, the Export CSV button in the Logs tab, or wafer export <project> csv.
API keys
Programmatic access (CLI, agents, the Workers AI wrapper) uses Wafer API keys. Create them from API keys in the console header (also in a project's Agents tab); they're shown once, scoped to your account, and revocable. Use with wafer login or as WAFER_API_KEY (Bearer).
Console
At console.wafersecurity.ai: projects, guardrail config, limits, analytics, logs, a guardrail playground (test policy with no model call), an Agents tab (prompts, skills, CLI, API keys), and settings. Deep-linkable — URLs reflect the project and tab.
CLI
Agent-native CLI over the admin API. Every command supports --json.
npm i -g @wafersecurity/cli
wafer login # paste a key (or: export WAFER_API_KEY=...)
wafer projects # list
wafer guardrails set my-app pii redact
wafer cache my-app on
wafer ratelimit my-app 60
wafer test my-app "email me at jane@acme.com" # run guardrails, no model call
wafer logs my-app --limit 20
wafer init my-app # print gateway base URLs
Agent skills
Install Wafer skills into your coding agent (Claude Code, Cursor, …) with the skills CLI:
npx skills add wafersecurity/wafer-skills --all
Includes wafer, wafer-integrate, wafer-guardrails, wafer-cli and wafer-workers-ai.
Cloudflare Workers AI (env.AI)
env.AI.run(...) is an in-process binding — the gateway can't intercept it. Instrument it in your Worker with @wafersecurity/workers-ai. One line, no per-call changes:
import { withWafer } from "@wafersecurity/workers-ai";
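const handler = {
  async fetch(req, env, ctx) {
    // Read the chat messages from the request body (assumed shape: { messages: [...] }).
    const { messages } = await req.json();
    // unchanged: env.AI is auto-wrapped
    return Response.json(await env.AI.run("@cf/meta/llama-3.3-70b-instruct", { messages }));
  },
};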
export default withWafer(handler); // reads WAFER_PROJECT + WAFER_API_KEY
withWafer wraps every handler — fetch, queue, scheduled, tail — so env.AI is guarded in cron jobs and queue consumers too (no handlers dropped). Tier-0 guardrails run locally (zero added latency); the LLM judge runs via your own binding; calls are logged to Wafer (metadata only). Fail-open. There is no truly zero-code option for bindings — this one-line wrapper is the minimum.
For { stream: true } responses, chunks pass through with zero added latency while the full response is captured and logged after the stream ends; both the gateway and the wrapper record complete streamed request/response telemetry. Block-action guardrails still cut the stream on a hit; mid-stream redaction isn't applied (tokens are already sent).
A blocked call throws WaferBlockedError; a redacted one rewrites the returned text.
Fail-open / fail-closed
Default is fail-open: if a guardrail errors or times out, the request proceeds (so Wafer never takes down your app). Switch a project to fail-closed in Settings to block on guardrail failure instead.
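The difference between the two modes shows up only when a check fails or times out. A minimal sketch of that decision, with hypothetical names (guarded, slow_check) and an arbitrary timeout; it illustrates the policy, not Wafer's internals.

```python
import asyncio


async def guarded(check, text, fail_closed=False, timeout=0.05):
    """Run one guardrail check; on error or timeout, follow the project's failure mode."""
    try:
        return await asyncio.wait_for(check(text), timeout)
    except Exception:  # includes asyncio.TimeoutError
        # Fail-open lets the request proceed; fail-closed blocks it.
        return "block" if fail_closed else "allow"


async def slow_check(text):
    """Stand-in for a guardrail that exceeds its time budget."""
    await asyncio.sleep(1)
    return "allow"


print(asyncio.run(guarded(slow_check, "hello")))
# prints allow
print(asyncio.run(guarded(slow_check, "hello", fail_closed=True)))
# prints block
```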
Admin API
Authenticate with Authorization: Bearer <WAFER_API_KEY> (or a Clerk session) against https://wafersecurity.ai/admin.
| Method | Path | Purpose |
|---|---|---|
| GET | /projects | List projects |
| POST | /projects | Create |
| GET/PUT/DELETE | /projects/:id | Read / update policy / delete |
| GET | /projects/:id/logs | Recent logs |
| GET | /projects/:id/analytics | Analytics |
| POST | /projects/:id/test | Run guardrails on text |
| GET/POST/DELETE | /keys | Manage API keys |
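A request against the routes above needs only the base URL and the Bearer header. The sketch below builds such a request with the standard library and does not send it. The admin_request helper, the PROJECT_ID placeholder, and the {"text": ...} body shape for the test route are assumptions; the source does not document request bodies.

```python
import json
import urllib.request

ADMIN_BASE = "https://wafersecurity.ai/admin"


def admin_request(api_key, method, path, body=None):
    """Build (but do not send) an authenticated Admin API request."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(ADMIN_BASE + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Assumed body shape for the "run guardrails on text" route.
req = admin_request(
    "YOUR_WAFER_API_KEY",
    "POST",
    "/projects/PROJECT_ID/test",
    {"text": "email me at jane@acme.com"},
)
print(req.get_method(), req.full_url)
# prints POST https://wafersecurity.ai/admin/projects/PROJECT_ID/test
```

Passing the result to urllib.request.urlopen(req) would send it. The same helper covers the read-only routes, for example admin_request(key, "GET", "/projects").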