Use Playground to validate a draft config before publishing. Use Evals for repeatable regression checks; optionally gate publishing on a workspace pass-rate threshold.
Playground
- Route: /app/agents/{slug}/playground.
- POST /v1/ai-agents/{slug}/playground — JSON response by default.
- POST .../playground?stream=true returns Server-Sent Events with three event types: token, trace, and done.
- Studio UI streams tokens into the bot bubble when streaming is enabled.
- Each turn persists an AiAgentConversation and its trace; the full trace is available under Conversations.
- The playground channel is always allowed, even if Web is the only listed channel.
- The BUDGET guardrail blocks LLM calls when daily_spend_cap_cents is exceeded.
- RPM: the playground uses the same workspace rpm_cap_per_agent as the public embed (0 = unlimited).
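The streaming mode above can be consumed with a small Server-Sent Events parser. The following is a minimal sketch, not the Studio client: it assumes token events carry plain text in their data field and that trace and done events carry JSON strings, none of which is specified here. The helper names parse_sse and collect_reply are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []  # "message" is the SSE default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, only one leading space is stripped.
            data.append(value[1:] if value.startswith(" ") else value)


def collect_reply(lines: Iterable[str]) -> Tuple[str, List[str]]:
    """Join token events into the reply text and collect raw trace payloads."""
    tokens, traces = [], []
    for event, data in parse_sse(lines):
        if event == "token":
            tokens.append(data)  # assumed: token data is plain text
        elif event == "trace":
            traces.append(data)  # assumed: trace data is a JSON string
        elif event == "done":
            break
    return "".join(tokens), traces
```

In practice the lines would come from a streaming HTTP response body; the parser only needs an iterable of decoded text lines, so it can be exercised with a plain list first.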
Trace panel
| Event kind | Meaning |
|---|---|
| user | Inbound message |
| think | Model reasoning step |
| tool | Dispatch result — name, args, output, duration_ms |
| ask | Clarifying question to user |
| guardrail | AUTH / LIMIT / BUDGET / REDACT / ESCALATE hit |
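As an illustration of working with these events, the sketch below sums tool time per tool name from a list of trace events. The field names name, args, output, and duration_ms come from the table above; the "kind" key and the dict-based event shape are assumptions, and tool_time_ms is a hypothetical helper.

```python
from typing import Any, Dict, List

# Event kinds listed in the trace panel table.
KINDS = {"user", "think", "tool", "ask", "guardrail"}


def tool_time_ms(trace: List[Dict[str, Any]]) -> Dict[str, int]:
    """Sum duration_ms per tool name across all tool events in a trace."""
    totals: Dict[str, int] = {}
    for ev in trace:
        kind = ev.get("kind")  # assumed key name for the event kind
        if kind not in KINDS:
            raise ValueError(f"unknown event kind: {kind!r}")
        if kind == "tool":
            name = ev["name"]
            totals[name] = totals.get(name, 0) + int(ev.get("duration_ms", 0))
    return totals
```

A summary like this makes it easy to spot which tool dominates a slow turn before opening the full trace.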
Evals tab
- Create eval — name, user prompt, expected outcome text.
- Run — POST .../evals/{id}/run executes playground + judge_eval (LLM JSON verdict or substring fallback).
- Counters — pass_count, fail_count, last_status on the eval row.
- Publish trend — bar chart from version eval_pass_rate (click a bar for per-version runs).
LLM judge
judge_eval returns { passed, rationale }. It uses the workspace BYOK keys when configured and platform keys otherwise. If no keys are available, it falls back to a substring match on the expected text.
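The verdict-or-fallback behavior can be sketched as follows. This is not the real judge_eval: the function name judge_sketch is hypothetical, the case-insensitive comparison is an assumption, and falling back when the LLM output is malformed JSON is also an assumption (the text above only describes the fallback when no keys are available).

```python
import json
from typing import Optional


def judge_sketch(reply: str, expected: str,
                 llm_verdict_json: Optional[str] = None) -> dict:
    """Return {passed, rationale} from an LLM verdict or a substring match."""
    if llm_verdict_json is not None:
        try:
            verdict = json.loads(llm_verdict_json)
        except json.JSONDecodeError:
            verdict = None  # assumed: malformed output triggers the fallback
        if isinstance(verdict, dict) and isinstance(verdict.get("passed"), bool):
            return {"passed": verdict["passed"],
                    "rationale": str(verdict.get("rationale", ""))}
    # Fallback: substring match (case-insensitive here by assumption).
    passed = expected.strip().lower() in reply.lower()
    rationale = ("expected text found in reply" if passed
                 else "expected text not found in reply")
    return {"passed": passed, "rationale": rationale}
```

Note that a substring fallback is strict about wording, so expected outcome text works best when it is a short, distinctive phrase.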
Conversations
/app/agents/{slug}/convos lists playground and embed traffic. The detail view shows a chronological trace with tool timing, using the same schema as public production conversations.
Once evals pass in Playground, set eval_pass_threshold under Workspace Settings and publish from Deploy. Eval runs are snapshotted on the version for audit.
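The publish gate amounts to comparing a pass rate against the threshold. The sketch below assumes eval_pass_threshold is a fraction between 0 and 1 and that a version with no evals has a pass rate of 0.0; neither detail is stated here, and pass_rate and can_publish are hypothetical helpers.

```python
from typing import List


def pass_rate(results: List[bool]) -> float:
    """Fraction of eval runs that passed; 0.0 when there are no runs (assumed)."""
    return sum(results) / len(results) if results else 0.0


def can_publish(results: List[bool], eval_pass_threshold: float) -> bool:
    """Allow publish when the pass rate meets the threshold (assumed 0..1)."""
    return pass_rate(results) >= eval_pass_threshold
```

For example, three passing evals out of four give a pass rate of 0.75, which clears a threshold of 0.75 but not one of 0.8.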