Message Streaming

The Event Normalization page covered which events flow. This page covers how a single message streams from LLM tokens to rendered chat text — including the mixed-stream parser that separates inline widget patches from prose.

Per-turn lifecycle

A single LLM turn produces a sequence of events that the sidecar streams to the frontend in order. Per-API-turn span construction lives at message-processor.mjs:733. Each turn is tracked separately for token accounting and Langfuse observability.

The mixed-stream parser

When an agent emits a widget spec, the spec doesn’t arrive as a separate event — it arrives interleaved with the assistant text, as JSONL patch lines. The mixed-stream parser separates them. Example raw text from an agent that’s building a widget:

Let me build a chart for you.
{"op":"add","path":"/elements/chart-1","value":{"type":"Chart","props":{...}}}
Here's the trend:
{"op":"add","path":"/elements/title-1","value":{"type":"Text","props":{"text":"Q4 Revenue"}}}
The data shows...

What the user actually sees in the chat:

Let me build a chart for you.
Here's the trend:
The data shows...

The patches go into the widget store. The text-only portion goes into the chat activity stream. This logic lives in createMixedStreamParser(), initialized at machine start by initSpecStreamParser. On every CHUNK, processSpecStream(text) pushes the chunk through the parser:

Buffer until a complete line (newline-terminated) is available.
Try JSON.parse(line) — if it succeeds AND has op + path, it’s a patch.
Patch lines → applied to widget spec via applySpecPatch, then pushSpec to both:
- canvasStore.updateStreamingSpec(spec) — live canvas update
- inlineWidgetStreamStore.pushSpec(spec) — inline widget in chat
Non-patch lines → pushed to the text activity stream.

✓ VERIFIED at spec-stream.ts:88-130.
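
The steps above can be sketched as a small line-buffered parser. This is a minimal illustration, not the code in spec-stream.ts: the name createLineParser, the SpecPatch shape, and the onPatch / onText callbacks are assumptions made for the example, and the real parser may handle blank lines or trailing text differently.

```typescript
// Hypothetical shapes; the real types in spec-stream.ts may differ.
type SpecPatch = { op: string; path: string; value?: unknown };

interface MixedStreamHandlers {
  onPatch: (patch: SpecPatch) => void; // e.g. apply the patch to the widget spec
  onText: (line: string) => void;      // e.g. append the line to the chat text
}

function createLineParser(handlers: MixedStreamHandlers) {
  let buffer = "";

  const handleLine = (line: string): void => {
    try {
      const parsed: unknown = JSON.parse(line);
      if (
        typeof parsed === "object" && parsed !== null &&
        typeof (parsed as SpecPatch).op === "string" &&
        typeof (parsed as SpecPatch).path === "string"
      ) {
        handlers.onPatch(parsed as SpecPatch);
        return;
      }
    } catch {
      // Not JSON: fall through and treat the line as prose.
    }
    handlers.onText(line);
  };

  return {
    // Feed one streamed chunk; only newline-terminated lines are emitted.
    push(chunk: string): void {
      buffer += chunk;
      let newline = buffer.indexOf("\n");
      while (newline !== -1) {
        handleLine(buffer.slice(0, newline));
        buffer = buffer.slice(newline + 1);
        newline = buffer.indexOf("\n");
      }
    },
    // Emit whatever is left when the stream ends without a trailing newline.
    flush(): void {
      if (buffer.length > 0) handleLine(buffer);
      buffer = "";
    },
  };
}

// Usage: a patch line split across two chunks is reassembled before parsing.
const patches: SpecPatch[] = [];
const textLines: string[] = [];
const parser = createLineParser({
  onPatch: (p) => patches.push(p),
  onText: (t) => textLines.push(t),
});
parser.push('Let me build a chart for you.\n{"op":"add","path":"/elem');
parser.push('ents/title-1","value":{"type":"Text"}}\nHere\'s the trend:');
parser.flush();
```

After the run, patches holds the single reassembled patch and textLines holds the two prose lines in their original order.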

Why interleaving instead of separate events?

LLMs emit a single stream of text tokens. Forcing them to emit “now I’m in widget mode” / “now I’m in text mode” structured events would require a tool-call-only widget API. The current design lets the LLM compose widgets inline with its explanation; the agent decides when to interleave. The trade-off is that the parser must be robust to partial lines and malformed JSON. If a chunk arrives mid-line, the parser buffers until a later chunk completes the line. If JSON.parse throws, the line is treated as text. ✓ VERIFIED at spec-stream.ts.
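
One subtlety behind the "treated as text" fallback is that JSON.parse also succeeds on bare scalars such as 42 or null, so a successful parse alone is not enough to call a line a patch. The guard below is a sketch of the op + path check under that reading; parsePatchLine and the SpecPatch shape are hypothetical names, not the ones used in spec-stream.ts.

```typescript
type SpecPatch = { op: string; path: string; value?: unknown };

// Returns the patch when the line is one, otherwise null (the caller treats it as text).
function parsePatchLine(line: string): SpecPatch | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return null; // malformed or truncated JSON, or ordinary prose
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return null; // valid JSON, but a scalar, null, or array
  }
  const { op, path, value } = parsed as Record<string, unknown>;
  if (typeof op !== "string" || typeof path !== "string") {
    return null; // a JSON object without the op + path markers
  }
  return { op, path, value };
}

const samples = [
  '{"op":"add","path":"/elements/title-1","value":{"type":"Text"}}', // patch
  '{"op":"add","path":"/elements/ti',                                 // truncated line
  "42",                                                               // valid JSON scalar
  '{"note":"JSON the agent quoted in prose"}',                        // object, no op/path
  "The data shows...",                                                // prose
];
const isPatch = samples.map((s) => parsePatchLine(s) !== null);
```

Only the first sample is classified as a patch; the other four fall through to the text path.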

Text vs thinking vs tool

Three distinct event types feed three distinct render paths:

text

SidecarEvent: text (camelCase)
XState event: CHUNK
Renderer: chat bubble streaming text. Markdown rendered as it arrives (via react-markdown).
Goes through processSpecStream to strip widget patch lines first.

thinking

SidecarEvent: thinking
XState event: THINKING
Renderer: collapsible “Thinking…” pane above the text bubble. User can expand to see reasoning.
Truncated to 10K chars in the Langfuse trace (per langfuse-tracing-architecture.md rule).

tool

SidecarEvent pair: toolStart (start) → toolComplete (end)
XState event pair: TOOL_START → TOOL_COMPLETE
Renderer: ActivityStreamV3 tool tile with status badge (running → complete/error)
Tool calls can produce side effects in the canvas via useCanvasToolInterceptor (separate from the chat tile).
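
The mapping from SidecarEvent types to XState events can be pictured as a discriminated union with an exhaustive switch. Only the event type names come from this page; the payload fields (text, toolId, name, isError) and the function name toMachineEvent are assumptions for illustration.

```typescript
// Payload fields are hypothetical; only the `type` values are taken from the page.
type SidecarEvent =
  | { type: "text"; text: string }
  | { type: "thinking"; text: string }
  | { type: "toolStart"; toolId: string; name: string }
  | { type: "toolComplete"; toolId: string; isError: boolean };

type MachineEvent =
  | { type: "CHUNK"; text: string }
  | { type: "THINKING"; text: string }
  | { type: "TOOL_START"; toolId: string; name: string }
  | { type: "TOOL_COMPLETE"; toolId: string; isError: boolean };

// Each sidecar event maps to exactly one machine event; the compiler
// flags a missing case if a new SidecarEvent variant is added.
function toMachineEvent(event: SidecarEvent): MachineEvent {
  switch (event.type) {
    case "text":
      return { type: "CHUNK", text: event.text };
    case "thinking":
      return { type: "THINKING", text: event.text };
    case "toolStart":
      return { type: "TOOL_START", toolId: event.toolId, name: event.name };
    case "toolComplete":
      return { type: "TOOL_COMPLETE", toolId: event.toolId, isError: event.isError };
  }
}

const mapped = [
  toMachineEvent({ type: "text", text: "Here's the trend:" }),
  toMachineEvent({ type: "toolStart", toolId: "t1", name: "search" }),
  toMachineEvent({ type: "toolComplete", toolId: "t1", isError: false }),
].map((e) => e.type);
```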

Token streaming throughput

The streaming bandwidth is dominated by text events. A single 1k-token response can emit hundreds of text events as the model streams tokens. Two protections against overload:

rAF throttling in useStreamingSession (L303-321) — at most one React render per animation frame.
String concatenation in XState reducer — text accumulates in machine context, not stored per-event. The activity stream contains one text entry per “text run” (run breaks when a non-text event interrupts).

Together, these keep render work to at most one update per animation frame, no matter how quickly tokens arrive.
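
Both protections can be sketched in a few lines. The code below is an illustration under assumptions, not the code in useStreamingSession or the machine: appendText, createFrameThrottle, and the Activity shape are hypothetical, and the frame scheduler is injected so the sketch runs outside a browser (in the UI it would be requestAnimationFrame).

```typescript
type Activity =
  | { kind: "text"; text: string }
  | { kind: "tool"; name: string };

// Append streamed text to the current run, or start a new run
// when the last entry is not text (or the stream is empty).
function appendText(stream: readonly Activity[], chunk: string): Activity[] {
  const last = stream[stream.length - 1];
  if (last !== undefined && last.kind === "text") {
    return [...stream.slice(0, -1), { kind: "text", text: last.text + chunk }];
  }
  return [...stream, { kind: "text", text: chunk }];
}

// Coalesce many updates into one render per scheduled frame,
// always rendering the most recent value.
function createFrameThrottle<T>(
  render: (latest: T) => void,
  schedule: (callback: () => void) => void,
) {
  let pending = false;
  let latest: T | undefined;
  return (value: T): void => {
    latest = value;
    if (pending) return; // a frame is already scheduled
    pending = true;
    schedule(() => {
      pending = false;
      render(latest as T);
    });
  };
}

// Usage with a manual frame queue standing in for requestAnimationFrame.
const frames: Array<() => void> = [];
const rendered: string[] = [];
const update = createFrameThrottle<string>(
  (text) => rendered.push(text),
  (callback) => frames.push(callback),
);

let stream: Activity[] = [];
for (const token of ["Q4 ", "revenue ", "rose"]) {
  stream = appendText(stream, token);
  const first = stream[0];
  if (first?.kind === "text") update(first.text);
}
frames.splice(0).forEach((run) => run()); // one "frame" elapses
```

Three tokens arrive before the frame fires, yet the activity stream holds one text entry and the renderer is called once with the full string.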

Per-API-turn token accounting

usage events carry token counts: input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens. The sidecar emits a usageUpdate SidecarEvent on:

Session init — initial context tokens (system + tools + history).
Per-turn end — output tokens from the just-completed turn.
Final result — total session token spend.
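
One way to turn these fields into a context-usage figure is sketched below. The four field names come from the usage events above; the formula (input plus both cache categories), the function names, and the 200,000-token window are assumptions for illustration and may not match how the status bar computes its value.

```typescript
// Field names as listed for usage events.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

// Assumption: prompt-side tokens are the uncached input plus both cache categories.
function contextTokens(usage: Usage): number {
  return (
    usage.input_tokens +
    usage.cache_creation_input_tokens +
    usage.cache_read_input_tokens
  );
}

// Rounded percentage of a (hypothetical) context window that the prompt occupies.
function contextPercent(usage: Usage, windowSize: number): number {
  return Math.round((contextTokens(usage) / windowSize) * 100);
}

const turnUsage: Usage = {
  input_tokens: 1_200,
  output_tokens: 350,
  cache_creation_input_tokens: 800,
  cache_read_input_tokens: 18_000,
};
const HYPOTHETICAL_WINDOW = 200_000;
const percent = contextPercent(turnUsage, HYPOTHETICAL_WINDOW); // 20,000 of 200,000 tokens
```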

Worker-side billing (Anthropic) and Pi billing both consume these for cost calculation. See Background → Cron Engine Routing for billing prefix conventions (proxy-* for Claude, cron-pi-* for Pi). On the frontend, usageUpdate updates a tokens field in XState context. The UI shows current context % in the status bar, which is useful before triggering compaction.

Inline widget streaming → chat persistence

Two stores cooperate during streaming:

canvasStore.streamingSpec — used by WidgetCanvas to render the live-updating widget on the canvas.
inlineWidgetStreamStore.streamingSpec — used by InlineWidget (in the chat bubble) to render the live-updating inline preview.

When the streaming state exits (any path — complete, cancel, error):

flushSpecStream finalizes the widget spec.
finalizeInlineWidgetStream commits the spec to the inline widget for that message.
On COMPLETE specifically, persistInlineWidget writes to backend storage so the widget survives session reload.

If elementCount > 10, the widget also auto-opens the canvas. Widgets with 10 or fewer elements stay inline only. (See Frontend → Canvas for the threshold and its silent-failure mode.)
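
The exit sequence and the canvas threshold can be sketched as a single handler. The function names flushSpecStream, finalizeInlineWidgetStream, and persistInlineWidget come from this page, but their signatures, the ExitDeps bundle, openCanvas, and the WidgetSpec shape are hypothetical. The sketch also assumes the auto-open check runs on every exit path, which the page does not state.

```typescript
type ExitReason = "complete" | "cancel" | "error";

interface WidgetSpec {
  elements: Record<string, unknown>;
}

// Hypothetical dependency bundle; signatures are assumed for the example.
interface ExitDeps {
  flushSpecStream: () => WidgetSpec | null;
  finalizeInlineWidgetStream: (spec: WidgetSpec) => void;
  persistInlineWidget: (spec: WidgetSpec) => void;
  openCanvas: (spec: WidgetSpec) => void;
}

const CANVAS_AUTO_OPEN_THRESHOLD = 10;

function onStreamingExit(reason: ExitReason, deps: ExitDeps): void {
  const spec = deps.flushSpecStream();
  if (spec === null) return; // no widget was streamed this turn

  deps.finalizeInlineWidgetStream(spec);

  // Only a completed turn is written to backend storage.
  if (reason === "complete") deps.persistInlineWidget(spec);

  // Assumption: the threshold check applies regardless of exit reason.
  const elementCount = Object.keys(spec.elements).length;
  if (elementCount > CANVAS_AUTO_OPEN_THRESHOLD) deps.openCanvas(spec);
}

// Usage: an 11-element widget whose stream was cancelled.
const calls: string[] = [];
const elements: Record<string, unknown> = {};
for (let i = 1; i <= 11; i++) elements[`el-${i}`] = { type: "Text" };

onStreamingExit("cancel", {
  flushSpecStream: () => { calls.push("flush"); return { elements }; },
  finalizeInlineWidgetStream: () => { calls.push("finalize"); },
  persistInlineWidget: () => { calls.push("persist"); },
  openCanvas: () => { calls.push("openCanvas"); },
});
```

In this run the widget is finalized and the canvas opens, but nothing is persisted because the exit reason was cancel rather than complete.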

Cancellation mid-stream

When the user clicks the cancel button:

Frontend dispatches CANCEL to the XState machine.
The machine transitions to the cancelled state.
The streaming state’s exit hook runs flushSpecStream + finalizeInlineWidgetStream (any in-flight widget gets finalized at its current state).
Tauri IPC sends cancel_query(requestId) to the sidecar.
Sidecar calls gracefulKill() (Claude path) or abortController.abort() (Pi path) — see Background → Cancellation contract.
Sidecar emits a cancelled SidecarEvent confirming the abort.
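
The ordering of these steps can be sketched with both halves reduced to plain functions. Everything here is illustrative: requestCancel, handleCancelQuery, and the dependency interfaces are hypothetical, the calls are modeled as synchronous for brevity, and only the names CANCEL, cancel_query, gracefulKill, abort, and cancelled come from the steps above.

```typescript
type Engine = "claude" | "pi";

// Frontend half: dispatch CANCEL, then notify the sidecar over IPC.
interface FrontendDeps {
  sendToMachine: (event: { type: "CANCEL" }) => void;
  invoke: (command: string, args: { requestId: string }) => unknown;
}

function requestCancel(requestId: string, deps: FrontendDeps): void {
  deps.sendToMachine({ type: "CANCEL" }); // machine leaves streaming; exit hooks run
  deps.invoke("cancel_query", { requestId });
}

// Sidecar half: stop the engine, then confirm with a cancelled event.
interface SidecarDeps {
  gracefulKill: () => void; // Claude path
  abort: () => void;        // Pi path (abortController.abort())
  emit: (event: { type: "cancelled"; requestId: string }) => void;
}

function handleCancelQuery(requestId: string, engine: Engine, deps: SidecarDeps): void {
  if (engine === "claude") deps.gracefulKill();
  else deps.abort();
  deps.emit({ type: "cancelled", requestId });
}

// Usage: wire the two halves together and record the order of effects.
const log: string[] = [];
const sidecar: SidecarDeps = {
  gracefulKill: () => { log.push("gracefulKill"); },
  abort: () => { log.push("abort"); },
  emit: (event) => { log.push(`${event.type}:${event.requestId}`); },
};
requestCancel("req-1", {
  sendToMachine: (event) => { log.push(event.type); },
  invoke: (command, args) => {
    log.push(`ipc:${command}`);
    handleCancelQuery(args.requestId, "pi", sidecar);
    return undefined;
  },
});
```

For a Pi session the recorded order is CANCEL, the cancel_query IPC call, abort, and finally the cancelled confirmation.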

Open questions

Concurrent widget streams — if two tool calls each emit spec patches in the same turn (interleaved across text events), can the parser route patches to the right widget? Or do patches assume one widget per turn?
Patch ordering across chunks — if a patch line is split across two chunks, the buffer ensures it reassembles. But if patches arrive faster than applySpecPatch runs (synchronous), can they get applied out of order? Verified to work but worth checking under load.

Key files

src-tauri/sidecar/query/message-processor.mjs

Per-API-turn span construction (L733). Builds text, thinking, toolStart, toolComplete, usageUpdate SidecarEvents from Claude SDK messages or Pi events.

src/machines/streaming/actions/spec-stream.ts

createMixedStreamParser, initSpecStreamParser (L31-50), processSpecStream (L88-130), flushSpecStream, applySpecPatch.

src/machines/streaming/machine.ts

The XState v5 streamingMachine. Spec-stream init at L379. CHUNK handler at L465. Exit hooks at L455. Complete handler at L498-509.

src/hooks/useStreamingSession.ts

rAF throttling at L303-321. SessionManager + actor subscription.

src/stores/canvasStore.ts and src/stores/inlineWidgetStreamStore.ts

The two destination stores for spec patches during streaming.

src-tauri/src/sidecar/events.rs

The Rust SidecarEvent enum — IPC contract.

Next

Event Layer

For the full SidecarEvent variant list and the Rust serde contract.

Streaming Machine

XState v5 state graph, transitions, activity-stream construction.
