Does Watari work on Anthropic SDK code?

Yes. The router indexes TypeScript, Python, and Anthropic SDK call sites the same way it indexes any other code. Tool definitions, message construction, and streaming handlers are all candidate map targets when a ticket points at a Claude-powered feature.

How does Watari handle prompts and system messages as source?

Prompts, system messages, and schemas are indexed alongside your code in the same vector pipeline. A prompt edit that ships a regression maps the same way a code edit does: the draft PR opens against the prompt file, the diff is focused, and CODEOWNERS routes it to the prompt-domain reviewer.

Can Watari debug LLM-output-related bugs?

Yes. Vision-aware extraction reads the LLM output the customer screenshotted, parses structured fields, and surfaces hallucinated tool calls or schema mismatches as concrete bug reports. The map then routes to the prompt, schema, or guardrail that needs the change.

Do you store our prompts?

Prompts are indexed inside your own isolated tenant so the mapper can find them. They are never sent to model providers as training data. Tenant data isolation runs through Postgres row-level security, and the embeddings live in the same region as the rest of your data.

What about non-deterministic test failures from model output?

The verify loop runs against the actual failure log from the ticket, not a synthetic repro. A hash-twice bail catches flakes: when the same failure hash repeats, the loop stops and surfaces it as a flake instead of looping forever. Engineers get a real signal, not burned compute.

Use case

Ship AI fast without letting support bugs churn your customers

Prompt drift, schema mismatches, hallucinated tool calls: your AI fails in fuzzy ways that surface in support tickets first. With no dedicated triage engineer, those bugs sit for days and customers quietly leave. Watari maps each ticket to the prompt, schema, or function that broke, drafts the fix, and closes the loop with the customer.

Start free trial

Slow fixes on AI bugs cost you customers, not just hours

AI products carry non-deterministic failure modes — prompt drift, schema mismatches, hallucinated tool calls, model-version regressions. Every one surfaces in the support inbox before your dashboards catch it. With a small team and no dedicated triage engineer, those tickets pile up, the customer waits, and trust erodes faster than the bug itself. And the file the model named in its output is rarely the file that needs to change — so even the hunt is slow.

Vision

Read the LLM output, not a paraphrase.

Your customers screenshot the broken chat, not the stack trace. Vision-aware extraction reads every attachment in the ticket — chat transcripts, structured-output JSON, broken tool calls — preserves the verbatim text untouched, parses structured fields, and emits a bug grounded in what the customer actually saw.

LLM-output aware
Chat transcripts, JSON blobs, tool-call traces parsed as evidence.
Verbatim preserved
Customer quotes survive extraction; nothing is paraphrased into the bug report.
Structured fields
Severity, repro steps, and expected vs actual pulled into a typed schema.
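
To make "typed schema" concrete, here is a minimal Python sketch of what an extracted bug record could look like. The class name ExtractedBug, the field names, and the severity levels are assumptions made for this example; the actual schema is not described on this page.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    # Hypothetical levels; the real set of severities is not specified.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class ExtractedBug:
    """Hypothetical typed record for one extracted ticket."""
    verbatim: str        # customer text, stored exactly as written
    severity: Severity
    expected: str
    actual: str
    repro_steps: list[str] = field(default_factory=list)


# Example values are illustrative, loosely based on a hallucinated tool call.
bug = ExtractedBug(
    verbatim="The chat agent invented a tool call that does not exist in my org.",
    severity=Severity.HIGH,
    expected="Agent calls read_balance",
    actual="Agent calls transfer_funds, which is not in the schema",
    repro_steps=['Ask the agent "what is my balance"'],
)
```

Freezing the record is one way to keep the verbatim quote from being rewritten after extraction.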

Customer verbatim

LLM output · screenshot attached

The chat agent invented a tool call that does not exist in my org. I asked "what is my balance" and instead of using the read_balance tool it hallucinated a "transfer_funds" tool and returned a confirmation that looked legit. I have a screenshot of the message and the agent transcript attached.

LLM: gpt-4-class
Tool: transfer_funds (hallucinated, not in our schema)

Prompts as code

Prompts mapped like code.

A prompt edit that ships a regression is just as load-bearing as a function edit, and your map needs to know that. Watari indexes prompts, system messages, schemas, and eval fixtures into the same vector pipeline as your TypeScript or Python — so a prompt-driven regression maps to a prompt file, not a guess at the call site.

Prompts indexed
.md, .txt, and prompt template files indexed alongside code.
Schemas indexed
JSON schemas, Zod, and Pydantic models in the same vector space.
Eval fixtures indexed
Fixtures and golden files become candidate map targets, not noise.

Code Locations

#1 · High · 88%
prompts/router.prompt.md · system prompt: tool schema
router prompt fails to constrain tool names to the declared whitelist

#2 · Med · 79%
src/ai/tool-schema.ts · TOOL_DEFINITIONS
transfer_funds is described in docs but never registered in the live schema
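
The kind of check that exposes a mismatch like the one above can be sketched in a few lines of Python. This is an illustration only: the function name undeclared_tool_calls and the input shapes (plain lists of tool names) are assumptions, not Watari's implementation.

```python
def undeclared_tool_calls(declared_tools, tool_calls):
    """Return tool names the model called that are not in the declared schema."""
    declared = set(declared_tools)
    seen = set()
    unknown = []
    # Preserve call order and report each unknown name once.
    for name in tool_calls:
        if name not in declared and name not in seen:
            seen.add(name)
            unknown.append(name)
    return unknown


# Illustrative data taken from the example ticket.
DECLARED = ["read_balance"]
calls_from_transcript = ["transfer_funds"]

print(undeclared_tool_calls(DECLARED, calls_from_transcript))  # ['transfer_funds']
```

A non-empty result points at either the prompt (tool names not constrained) or the schema (tool never registered), which matches the two candidate locations listed.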

Verify loop

Real failure log, hash-twice bail.

Non-deterministic model output makes CI fail in ways a synthetic repro can’t reach. The verify loop iterates the draft PR against the real failure log attached to the ticket. A hash-twice bail catches flakes — when the same failure hash repeats, the loop stops and surfaces the flake to engineers instead of burning compute.

Failure-log driven
The loop runs against the real failure attached to the ticket.
Hash-twice bail
Repeating failure hash stops the loop and flags the flake.
Extended thinking
The verify-iterate step runs with extended thinking; a fast model handles the cheap extraction work.
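
The hash-twice bail can be sketched as a short loop. This is a minimal Python illustration under stated assumptions: the names failure_hash and verify_loop, the run_ci callback, the max_attempts cap, and the choice of SHA-256 are all hypothetical. A real log would likely need timestamps and other run-specific noise stripped before hashing.

```python
import hashlib
from typing import Callable, Optional


def failure_hash(log: str) -> str:
    """Hash a failure log so two runs can be compared cheaply."""
    return hashlib.sha256(log.encode("utf-8")).hexdigest()


def verify_loop(run_ci: Callable[[int], Optional[str]], max_attempts: int = 5) -> str:
    """Run CI until it passes, the same failure repeats, or attempts run out.

    run_ci(attempt) returns None on success, or the failure log on failure.
    """
    previous = None
    for attempt in range(max_attempts):
        log = run_ci(attempt)
        if log is None:
            return "passing"
        current = failure_hash(log)
        if current == previous:
            # Same failure hash twice in a row: stop and flag the flake.
            return "flake surfaced"
        previous = current
    return "attempts exhausted"


# Usage: the second and third runs fail identically, so the loop bails.
logs = ["TypeError in router", "AssertionError: tool not found", "AssertionError: tool not found"]
print(verify_loop(lambda i: logs[i]))  # flake surfaced
```

Comparing only against the immediately preceding hash keeps the loop stateless beyond one value, at the cost of missing failures that alternate between two forms.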

CI verify loop

passing

lint 4.2s
typecheck 18.7s
test 46.1s
build 2m12s

Hash-twice bail · flake surfaced

Same failure hash twice in a row. Stopping the loop and flagging the flake to engineers instead of burning more compute.

We triage tickets, map bugs, and write the RCA with the same family of models our customers ship. Eat your own cooking.

Trial policy · 14d · Free trial. 10 Mapped Bugs. No card.
Billing model · Mismapped credit window
Billing model · 0.7 · Dual confidence gate

Frequently asked questions

Your next support ticket arrives as a draft PR.

Connect Zendesk or Intercom, install the GitHub App. Tickets land mapped to the file, function, and line — ready for your reviewer to take over.

Trial length: 14 days
Bugs included: 10 Mapped
Card required: No
Mismapped credit: 7 days
Cancel: Any time

You only pay when we know what to change.