Question 1

What does Setta actually do?

Accepted Answer

Setta reads your trace data, maps how your product calls models, and replays proposed optimizations against your real traffic to score them before any are applied. The optimizations that survive the replay get applied. The rest are dropped. It runs continuously, so when the frontier shifts, the loop runs again.
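As a rough illustration of the replay-and-score idea, here is a minimal Python sketch. It is not Setta's implementation: the `TraceCall` shape, the per-token prices, and the rule "keep anything with positive savings" are all assumptions made for the example, and it scores cost only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TraceCall:
    """One recorded model call (hypothetical shape, for illustration only)."""
    input_tokens: int
    output_tokens: int


# Hypothetical prices in dollars per token; real prices vary by provider and model.
INPUT_PRICE = 3.00 / 1_000_000
OUTPUT_PRICE = 15.00 / 1_000_000

Optimization = Callable[[TraceCall], TraceCall]


def cost(call: TraceCall) -> float:
    """Total cost of a call: input plus output."""
    return call.input_tokens * INPUT_PRICE + call.output_tokens * OUTPUT_PRICE


def replay_savings(traces: List[TraceCall], optimization: Optimization) -> float:
    """Replay an optimization over recorded calls and return dollars saved."""
    baseline = sum(cost(c) for c in traces)
    replayed = sum(cost(optimization(c)) for c in traces)
    return baseline - replayed


def select_winners(traces: List[TraceCall],
                   candidates: Dict[str, Optimization]) -> List[str]:
    """Keep only the candidates that save money on the replay; drop the rest."""
    return [name for name, opt in candidates.items()
            if replay_savings(traces, opt) > 0]


traces = [TraceCall(1000, 100), TraceCall(2000, 200)]
candidates = {
    "halve_input": lambda c: TraceCall(c.input_tokens // 2, c.output_tokens),
    "no_op": lambda c: c,
}
winners = select_winners(traces, candidates)  # only "halve_input" survives
```

Rerunning `select_winners` whenever new traces or new candidates arrive corresponds to the continuous loop described above.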

Question 2

How does the pricing work?

Accepted Answer

Performance-based. We take a share of verified savings. Until savings are verified, the fee is zero. There is no platform fee, no seat fee, and no annual minimum.

Question 3

How is this different from a model router or LLM gateway?

Accepted Answer

Routers and gateways commit to a single technique (per-request routing, per-request observability) and apply it universally. Setta is workload-aware: it picks the technique that suits the workload, proves the savings on your traces first, and stacks techniques rather than choosing one. It also optimizes total cost (input plus output), not just input tokens.

Question 4

Will Setta hurt my model's quality?

Accepted Answer

No. We don't route to cheaper models and we don't delete context. Prompt-cache optimization captures discounts on the same model and the same prompt. Tool filtering sends only the tools your traces show the model actually reaches for. Sub-agent delegation isolates work so the parent never drags around the sub-agent's scratch. Workflow compilation only skips the model on sequences that have already proven deterministic.
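To make the tool-filtering point concrete, here is a small Python sketch of filtering by observed usage. The `(branch, tool)` trace format, the function names, and the fallback of sending every tool on an unseen branch are assumptions for illustration, not Setta's actual behavior.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def tools_used_by_branch(traces: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Collect, per branch, the tools that traces show the model actually called."""
    used: Dict[str, Set[str]] = defaultdict(set)
    for branch, tool in traces:
        used[branch].add(tool)
    return used


def filter_tools(all_tools: List[str], branch: str,
                 used: Dict[str, Set[str]]) -> List[str]:
    """Return only the tools seen on this branch, preserving the original order."""
    keep = used.get(branch)
    if not keep:
        # No evidence for this branch yet: send everything rather than guess.
        return list(all_tools)
    return [tool for tool in all_tools if tool in keep]


# Made-up trace records: (branch name, tool the model called).
trace_records = [
    ("billing", "get_invoice"),
    ("billing", "refund"),
    ("search", "web_search"),
]
all_tools = ["get_invoice", "refund", "web_search", "send_email"]
used = tools_used_by_branch(trace_records)
billing_tools = filter_tools(all_tools, "billing", used)
unknown_tools = filter_tools(all_tools, "unknown", used)
```

Falling back to the full tool list on unseen branches is the conservative choice: the filter only removes definitions when there is evidence they go unused.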

Question 5

What is prompt-cache optimization?

Accepted Answer

Anthropic, OpenAI, Google, and DeepSeek now offer prompt-cache discounts of up to 90% on cache hits. Capturing them isn't automatic. The prompt has to be shaped right, the tools have to live in the right place, and plenty of workloads have cacheable patterns nobody noticed. Setta audits your traces for the misses and tells you what to fix: broken prefixes, tool definitions inlined where they shouldn't be, calls where caching was never considered in the first place.
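A broken prefix is easy to picture with a short Python sketch. Provider caches match on the leading part of the prompt, so anything that varies per request and sits near the top cuts the cacheable span short. The example below uses characters as a stand-in for tokens, and the prompts and function names are made up for illustration.

```python
import os
from typing import List


def shared_prefix_len(prompts: List[str]) -> int:
    """Length of the longest prefix shared by every prompt."""
    return len(os.path.commonprefix(prompts))


def cacheable_fraction(prompts: List[str]) -> float:
    """Share of all prompt characters covered by the common prefix."""
    total = sum(len(p) for p in prompts)
    if total == 0:
        return 0.0
    return shared_prefix_len(prompts) * len(prompts) / total


SYSTEM = "You are a support agent. Follow the policy below. " * 4

# Broken: a per-request value comes first, so the shared prefix ends almost immediately.
broken = [f"time={i} " + SYSTEM + f"question {i}" for i in range(3)]

# Fixed: the stable instructions lead, and the volatile part moves to the end.
fixed = [SYSTEM + f"time={i} question {i}" for i in range(3)]

broken_fraction = cacheable_fraction(broken)
fixed_fraction = cacheable_fraction(fixed)
```

Both variants send the same information to the model; only the ordering changes, which is why this kind of fix leaves the model and the prompt content untouched.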

Question 6

Do I have to change my code?

Accepted Answer

No. Setta plugs into trace data you already produce (LangSmith, Langfuse, or direct provider logs). Most teams are integrated in under five minutes.

Question 7

Is Setta available now?

Accepted Answer

We're in early access and onboarding design partners. The form below is the easiest way to start.

Technique	Mechanism	Monthly savings
Prompt-cache audit	Finds prefixes that should be cacheable but aren’t, tool defs placed where the cache breaks, and call sites where nobody noticed caching was an option.	$38,400
Semantic tool filtering	Sends only the tools a given request would plausibly reach for. You stop paying for tool definitions the model never uses on that branch.	$21,500
Workflow compilation	Detects tool sequences agents redo run after run and compiles them into deterministic shortcuts. The model is skipped only on steps that have already proven stable.	$15,800
Sub-agent delegation	Isolates work in clean contexts and returns only the final answer. The parent agent never drags around the sub-agent’s scratch.	$7,900
Trace-based meta-tools	When the same primitives recur in a stable pattern, they’re really one higher-level operation. Setta collapses the pattern into a single call.	$4,400
Total saved · 88% of the bill		$88,000
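The detection step behind the workflow compilation row can be sketched in Python as counting tool sequences that recur across runs. This is an illustrative sketch only: the run format, the fixed sequence length, and the `min_runs` threshold are assumptions, and a real system would also need to confirm that a sequence is deterministic before skipping the model.

```python
from collections import Counter
from typing import Dict, List, Tuple


def repeated_sequences(runs: List[List[str]], length: int,
                       min_runs: int) -> Dict[Tuple[str, ...], int]:
    """Find tool-call sequences of a given length that appear in at least min_runs runs."""
    counts: Counter = Counter()
    for run in runs:
        # Use a set so a sequence counts once per run, however often it repeats inside it.
        seen = {tuple(run[i:i + length]) for i in range(len(run) - length + 1)}
        counts.update(seen)
    return {seq: n for seq, n in counts.items() if n >= min_runs}


# Made-up agent runs, each a list of tool names in call order.
runs = [
    ["search", "fetch", "parse", "answer"],
    ["search", "fetch", "parse", "summarize"],
    ["lookup", "answer"],
]
candidates = repeated_sequences(runs, length=3, min_runs=2)
```

Here the only sequence that clears the threshold is `("search", "fetch", "parse")`, which would make it a candidate for a compiled shortcut.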

Optimize AI costs.
Never sacrifice intelligence.

The cost ceiling teams keep hitting.

How Setta thinks about it.

Ingest your traces.

Simulate at frontier scale.

Deploy what wins.

What’s currently on the bench.

What sets Setta apart.

Questions we keep getting.

Cut your bill, starting this quarter.
