May 25, 2026 · 4 min read · by Claudium team

Watching an AI think: what the brain actually shows you

Tool calls aren't logs. They're a sequence of decisions — and once you can see them lighting up in real time, you stop tolerating the black box.

product · observability

For most of the last two years, working with Claude Code has felt like sitting next to a colleague who has their headphones in. You hand off a task. Things happen. Files get touched. Code shows up. Sometimes you ask "what are you working on right now?" and sometimes you don't, because the answer always lands a beat too late to matter.

This is the problem Claudium is trying to solve, and the brain is how we solve it.

The metaphor isn't decoration

When we first prototyped the visualization, we used a familiar shape: a flat dashboard of counters. Tool calls per minute. Token budget. Files touched. The numbers were accurate and the page was useless. Nobody looked at it.

The brain looks the way it does because of one observation: an AI session is a sequence of cognitive moves, and the moves are not undifferentiated. Reading a file is different from editing one. Searching the web is different from running a shell command. Reasoning about a plan is different from executing it. If you treat all tool calls as the same kind of event, you've already thrown away the part that's interesting.

So we mapped the tool surface onto regions:

Visual cortex lights up when Claude reads — Read, Glob, Grep, LS.
Motor cortex fires when Claude acts — Bash, Write, SlashCommand.
Cerebellum handles precision work — Edit, MultiEdit. Small, deliberate movements.
Prefrontal cortex glows during planning and reasoning — Task, TodoWrite, long-form thinking.
Broca's area generates code. Wernicke's area generates prose.
Temporal lobe is where retrieval happens — WebSearch, WebFetch.
Parietal lobe is the architecture region — MCP integrations, unfamiliar tools.
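As a rough illustration, the tool-to-region mapping above can be written as a lookup table. This is a minimal sketch, not Claudium's actual schema: the region keys and the `region_for` helper are hypothetical names, and the fallback to the parietal lobe for unrecognised tools is an assumption drawn from the "MCP integrations, unfamiliar tools" line. Broca's and Wernicke's areas are left out because they describe generated output (code versus prose) rather than named tools.

```python
# Hypothetical lookup table built from the region list above.
# Region keys and region_for() are illustrative, not Claudium's schema.
TOOL_REGIONS = {
    # Visual cortex: reading
    "Read": "visual_cortex",
    "Glob": "visual_cortex",
    "Grep": "visual_cortex",
    "LS": "visual_cortex",
    # Motor cortex: acting
    "Bash": "motor_cortex",
    "Write": "motor_cortex",
    "SlashCommand": "motor_cortex",
    # Cerebellum: precision edits
    "Edit": "cerebellum",
    "MultiEdit": "cerebellum",
    # Prefrontal cortex: planning
    "Task": "prefrontal_cortex",
    "TodoWrite": "prefrontal_cortex",
    # Temporal lobe: retrieval
    "WebSearch": "temporal_lobe",
    "WebFetch": "temporal_lobe",
}


def region_for(tool_name: str) -> str:
    """Return the region for a tool call.

    Assumption: anything not in the table (MCP tools, unfamiliar tools)
    is routed to the parietal lobe.
    """
    return TOOL_REGIONS.get(tool_name, "parietal_lobe")
```

The point of the table is the one the list makes: the event type is kept, so a Read and an Edit never collapse into the same undifferentiated "tool call".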

The result is that you can glance at a session and recognize its shape before reading a single character. A debugging session is mostly visual cortex and cerebellum, with occasional bursts of prefrontal when Claude reconsiders. A refactor is heavy cerebellum, light motor. A new-feature build is the whole brain firing in waves.
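One way to make "the shape of a session" concrete is to look at how its events are distributed across regions. The sketch below is an illustration under assumptions, not how the brain is actually rendered: it takes a plain list of region names (one per event) and reports each region's share and the most frequent regions. The function names and the sample session are hypothetical.

```python
from collections import Counter


def region_shares(regions):
    """Fraction of events that landed in each region."""
    counts = Counter(regions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {region: n / total for region, n in counts.items()}


def dominant_regions(regions, top=2):
    """The `top` most frequent regions, most frequent first."""
    return [region for region, _ in Counter(regions).most_common(top)]


# A made-up debugging session: lots of reading, some precise edits,
# one burst of planning.
debug_session = (
    ["visual_cortex"] * 6 + ["cerebellum"] * 3 + ["prefrontal_cortex"]
)
shape = dominant_regions(debug_session)
```

For the sample session, `shape` comes out as visual cortex followed by cerebellum, which matches the debugging profile described above.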

You can't get that from a dashboard.

Why real-time matters more than we expected

Our internal hypothesis was that the visualization would mostly be used after the fact, as something you scrub through to understand what happened. We were wrong about that.

What people actually do with the brain, once it's running on a wall monitor or a second screen, is glance. They glance to see if the model is still working. They glance to catch the moment a long-running task switches from planning to execution. They glance to feel the rhythm of a teammate's session before pinging them.

Glances are a different interaction from queries. A query is "tell me what happened." A glance is "show me where you are." A dashboard answers queries. A brain answers glances. And glances are how humans actually monitor work when they trust the thing doing it.

The thing we didn't expect: pattern recognition

Within a week of using the brain on our own team, something happened that we hadn't planned for. People started noticing their own patterns.

"I always do the same thing when I'm about to break something: I read way more files than I edit. The brain lights up visual, visual, visual, visual, and then I stop and ask Claude to plan it instead."

"I can tell when I'm being lazy because the prefrontal stops firing. I'm just chaining cerebellum hits without thinking about whether they're the right edits."

This wasn't observability of the AI. It was observability of the developer working with the AI. The brain is a mirror, and the mirror turned out to be the actual product.

This is the thread we're now pulling on with the per-org learning loop — capturing those patterns automatically and surfacing them back: "your team frequently does X before Y." But that's a future post.

What we got wrong

Two things, for the record.

One: We initially thought token count should drive the firing intensity. It doesn't. Time spent in a region matters far more than tokens emitted. A long Bash call that took 30 seconds and emitted six tokens of stdout is a bigger event than a Read that consumed 800 tokens of file contents. We re-keyed intensity to duration plus call frequency within the same time window. The visualization immediately got more honest.
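To show the idea of keying intensity off duration and frequency rather than tokens, here is a minimal sketch. Everything specific in it is an assumption: the `ToolEvent` fields, the 60-second window, and the unweighted sum of seconds and call count are illustrative choices, not the formula Claudium uses.

```python
from dataclasses import dataclass


@dataclass
class ToolEvent:
    region: str
    start: float     # seconds since session start
    duration: float  # seconds the call took
    tokens: int      # recorded, but deliberately ignored for intensity


def region_intensity(events, region, now, window=60.0):
    """Score a region by time spent plus number of calls in the trailing window."""
    window_start = now - window
    seconds_in_region = 0.0
    calls = 0
    for event in events:
        end = event.start + event.duration
        # Skip other regions and calls that fall entirely outside the window.
        if event.region != region or end < window_start or event.start > now:
            continue
        # Count only the part of the call that overlaps the window.
        seconds_in_region += max(0.0, min(end, now) - max(event.start, window_start))
        calls += 1
    return seconds_in_region + calls


# The example from the text: a 30-second Bash call with six tokens of stdout
# versus a quick Read that consumed 800 tokens.
events = [
    ToolEvent(region="motor_cortex", start=10.0, duration=30.0, tokens=6),
    ToolEvent(region="visual_cortex", start=45.0, duration=0.2, tokens=800),
]
bash_score = region_intensity(events, "motor_cortex", now=60.0)
read_score = region_intensity(events, "visual_cortex", now=60.0)
```

With these inputs the Bash call scores well above the Read even though it emitted far fewer tokens, which is the behaviour the re-keying was after. A real implementation would likely weight or normalise the two terms; the plain sum here just keeps the sketch short.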

Two: We thought we'd want to filter the firing to just one agent at a time. Nobody does this. People want to see the whole team's brain at once, with each contributor's events tinted in their color. The collision of multiple people's working rhythms in the same brain turns out to be the most compelling state of the visualization, not a problem to solve.

Open the brain

If your team uses Claude Code and you've ever wanted to know what your AI is doing without having to ask, open the brain. It's free to start; the data stays in your hub.

We're shipping new modes — Brain Scan, Ping Pong, Head Phones — at the same pace we're shipping intelligence on top. The mirror keeps getting clearer.