Use case

Triage your own customers' tickets with the same models you ship

AI products fail in fuzzy ways. Prompt drift, schema mismatches, hallucinated tool calls — these surface in support tickets first. Watari runs Claude on your tickets, your prompts, and your code so the right file lands in the right PR.

The bug that bit you is not the file the model named

AI products carry non-deterministic failure modes. Prompt drift, schema mismatches, hallucinated tool calls, model-version regressions — every one of these surfaces in the support inbox before it surfaces in your dashboards. The file the model named in its output is rarely the file that needs to change. Generic triage tools cannot tell the difference.

Vision

Read the LLM output, not a paraphrase.

Your customers screenshot the broken chat, not the stack trace. Vision-aware extraction reads every attachment in the ticket — chat transcripts, structured-output JSON, broken tool calls — preserves the verbatim text untouched, parses structured fields, and emits a bug grounded in what the customer actually saw.

  • LLM-output aware

    Chat transcripts, JSON blobs, tool-call traces parsed as evidence.

  • Verbatim preserved

    Customer quotes survive extraction; nothing is paraphrased into the bug report.

  • Structured fields

    Severity, repro steps, and expected vs actual pulled into a typed schema.
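As a rough illustration of what a typed bug record could look like, here is a minimal Python sketch. The class name, field names, and severity levels are assumptions made for this example, not Watari's actual schema. The one rule it encodes comes straight from the list above: the customer quote must appear unchanged in the ticket.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Hypothetical severity levels, chosen for illustration.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class ExtractedBug:
    # Hypothetical record shape; field names are assumptions.
    verbatim: str                 # customer text, copied exactly as written
    severity: Severity
    repro_steps: tuple[str, ...]
    expected: str
    actual: str


def is_verbatim_preserved(bug: ExtractedBug, ticket_text: str) -> bool:
    """Return True only if the quoted text appears unchanged in the ticket."""
    return bug.verbatim in ticket_text


ticket = (
    'I asked "what is my balance" and instead of using the read_balance tool '
    'it hallucinated a "transfer_funds" tool and returned a confirmation that looked legit.'
)

bug = ExtractedBug(
    verbatim='it hallucinated a "transfer_funds" tool',
    severity=Severity.HIGH,  # assumed for the example
    repro_steps=('Ask the chat agent "what is my balance".',),
    expected="Agent calls read_balance.",
    actual="Agent reports a transfer_funds call that is not in the schema.",
)

print(is_verbatim_preserved(bug, ticket))
```

A paraphrased quote such as "the agent made up a tool" would fail the same check, which is the point of keeping the verbatim field separate from the structured ones.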

Customer verbatim
LLM output · screenshot attached

The chat agent invented a tool call that does not exist in my org. I asked "what is my balance" and instead of using the read_balance tool it hallucinated a "transfer_funds" tool and returned a confirmation that looked legit. I have a screenshot of the message and the agent transcript attached.

LLM: gpt-4-class
Tool: transfer_funds (hallucinated, not in our schema)

Prompts as code

Prompts mapped like code.

A prompt edit that ships a regression is just as load-bearing as a function edit, and your map needs to know that. Watari indexes prompts, system messages, schemas, and eval fixtures into the same pgvector pipeline as TypeScript or Python — so a prompt-driven regression maps to a prompt file, not a guess at the call site.

  • Prompts indexed

    .md, .txt, and prompt template files indexed alongside code.

  • Schemas indexed

    JSON schemas, Zod, and Pydantic models in the same vector space.

  • Eval fixtures indexed

    Fixtures and golden files become candidate map targets, not noise.
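One way to picture prompts, schemas, and fixtures sharing an index with code is to tag each file with a kind before it is embedded. The sketch below is a crude path-based heuristic written for illustration; the function name, the kind labels, and the matching rules are all assumptions, and a real indexer would need finer rules (for example, a README.md would be tagged as a prompt here).

```python
from pathlib import PurePosixPath


def classify_artifact(path: str) -> str:
    """Tag a repository path with a hypothetical index kind."""
    p = PurePosixPath(path)
    name = p.name.lower()
    parts = [part.lower() for part in p.parts]

    # Eval fixtures and golden files are map targets, not noise.
    if "fixtures" in parts or "golden" in name:
        return "eval_fixture"
    # .md and .txt files are treated as prompts or prompt templates.
    if name.endswith((".md", ".txt")):
        return "prompt"
    # Anything with "schema" in its file name is treated as a schema.
    if "schema" in name:
        return "schema"
    return "code"


for path in (
    "prompts/router.prompt.md",
    "src/ai/tool-schema.ts",
    "evals/fixtures/balance.json",  # hypothetical path
    "src/app.py",                   # hypothetical path
):
    print(path, "->", classify_artifact(path))
```

With a kind attached to every record, a prompt-driven regression can rank a prompt file above the call site that merely invokes it.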

Code Locations
#1 · High · 88%
prompts/router.prompt.md · system prompt, tool schema

router prompt fails to constrain tool names to the declared whitelist

#2 · Med · 79%
src/ai/tool-schema.ts · TOOL_DEFINITIONS

transfer_funds is described in docs but never registered in the live schema

Verify loop

Real failure log, hash-twice bail.

Non-deterministic model output makes CI fail in ways a synthetic repro can’t reach. The verify loop iterates the draft PR against the real failure log attached to the ticket. A hash-twice bail catches flakes — when the same failure hash repeats, the loop stops and surfaces the flake to engineers instead of burning compute.

  • Failure-log driven

    The loop runs against the real failure attached to the ticket.

  • Hash-twice bail

    Repeating failure hash stops the loop and flags the flake.

  • Extended thinking

    The verify-iterate step uses extended thinking; a fast model handles low-cost extraction.
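The hash-twice bail can be sketched in a few lines of Python. This is a minimal sketch, not Watari's implementation: the function names, the SHA-256 choice, the iteration cap, and the return labels are assumptions. Here `run_ci` is a hypothetical callback that returns the failure log for an attempt, or None when CI passes.

```python
import hashlib
from typing import Callable, Optional


def failure_hash(log: str) -> str:
    """Hash a failure log; real logs would need timestamps stripped first."""
    return hashlib.sha256(log.strip().encode("utf-8")).hexdigest()


def verify_loop(
    run_ci: Callable[[int], Optional[str]],
    max_iterations: int = 5,  # assumed cap, for illustration
) -> str:
    previous: Optional[str] = None
    for attempt in range(max_iterations):
        log = run_ci(attempt)
        if log is None:
            return "passing"
        current = failure_hash(log)
        # Same failure hash twice in a row: stop and surface the flake.
        if current == previous:
            return "flake_surfaced"
        previous = current
    return "gave_up"


# Simulated CI runs: the third attempt repeats the second failure.
logs = [
    "TypeError: x is undefined",
    "AssertionError: unknown tool name",
    "AssertionError: unknown tool name",
    None,
]
result = verify_loop(lambda attempt: logs[attempt])
print(result)
```

In this simulated run the loop stops on the third attempt, before the fourth run is ever requested, so no further compute is spent on a failure the loop is not changing.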

CI verify loop · passing
  • lint: 4.2s
  • typecheck: 18.7s
  • test: 46.1s
  • build: 2m12s

Hash-twice bail · flake surfaced

Same failure hash twice in a row. Stopping the loop and flagging the flake to engineers instead of burning more compute.

We use Claude to triage tickets, map bugs, and write the RCA — the same family of models our customers ship. Eat your own cooking.

Vision-aware · Bug extraction model · Watari model spec

Extended thinking · Mapping + verify loop · Watari model spec

0.7 · Dual confidence gate · Watari billing model

Your next support ticket arrives as a draft PR.

Connect Zendesk or Intercom and install the GitHub App. Tickets land mapped to the file, function, and line, ready for your reviewer to take over.

Trial length: 14 days
Bugs included: 10 mapped
Card required: No
Mismapped credit: 7 days
Cancel: Any time

You only pay when we know what to change.