
Sensorium — Sycophancy v0.1

Desktop application · macOS · Linux · AGPL-3.0 · Released 8 May 2026 · v0.1.6 · 22 May 2026

Sensorium is a desktop chat app that maps how a language model handles sycophancy triggers. You bring your own OpenRouter key. Sensorium runs a small classifier locally via Ollama, sends a calibrated battery of probes to your chosen model, and renders a per-axis reading of how the model resists, softens, or yields under epistemic and social pressure.

Figure: Five sycophancy axes, illustrative reading. I planted falsehood · II pressure reversal · III mediocre praise · IV contradiction validation · V certainty validation; each axis reads holds, softens, or folds.
Five axes · one example posture · your model will produce its own pattern

Sycophancy is the failure mode where a model tells the user what the user wants to hear — building on planted falsehoods, abandoning correct positions under pressure, fabricating praise for weak work, defending logical contradictions, affirming false certainty about high-risk choices. Sensorium probes for each of these, separately, and shows you the shape of the model's behaviour rather than a single score.

Asking AI to "judge whether this is good" is the failure mode; left unconstrained, language models drift toward exactly this verdict-issuing posture. Sensorium is the discipline of refusing the drift. — from the Sensorium spec, §2

Features

Sensorium's surface is small by design — a chat panel, a cartography panel, a status strip. The features below name what each surface does, what each layer measures, and what each release artefact ships with. Nothing here is hidden behind sign-ups, paywalls, or telemetry.

Local-first chat
No cloud, no telemetry, no install ping. State stays in your OS user-data directory; the API key in the OS keychain. The only network calls are direct HTTPS to OpenRouter and HTTP loopback to Ollama on your machine.
Bring your own model
One OpenRouter key, every model. Switch between Claude, GPT-class, Gemini, Llama, Mistral, Qwen from a dropdown in the top bar. The cartography re-reads on the new model with one click.
Filter cartography
A five-row map showing how the active model handles each sycophancy axis. Each row carries a verdict, an expandable probe-and-response trace, and the five-dial cluster. Updated on calibration, never on every chat turn.
Five-dial cluster
Per probe, code extracts five signals from the response — capitulation depth, hedge density, affirmation count, concession depth, refusal-pattern fit. All deterministic; no ML at the rules layer. Reproducible from the same input.
Three-state verdicts
HOLDS · SOFTENS · FOLDS. Three categories per axis; no numeric score, no rating out of ten. Pseudo-precision is a verdict shape Sensorium refuses.
Behind the Curtain
Every probe exposes its Q → R → L trace — the classifier output the qualifier produced, the deterministic rule the result triggered, the prompt the narrator received. Any verdict is auditable back to the inputs that produced it.
Suggested-tone cues (new in v0.1.3)
A row of system-selected coaching cues appears above the composer once a calibration has run. Up to three cues from a five-candidate vocabulary, drawn from the recent fingerprint. Read-only — the system selects them; you read them as coaching for your next message.
Chat history + search
Conversations persist across launches. A sidebar lists prior conversations; search runs across both titles and contents. Each exchange records the flavour and the model used at the time, so old conversations replay with their original context intact.
Cost transparency
Per-calibration cost is shown before any refresh runs. Four narration modes (raw · economical · functional · robust) act as the cost lever — roughly $0.08 to $0.31 per refresh against Claude Sonnet 4.6. Chat costs are per-token at the model's published rate.
Cross-platform
Native builds for macOS (Apple Silicon + Intel) and Linux (amd64 + arm64). Linux ships as both .deb and .flatpak. Tauri-based, so the binary stays under 5 MB per arch.
Open source · AGPL-3.0
Every line of source is on GitHub. Audit any pipeline step, fork it for your own use, modify it and redistribute it under the same licence. The AGPL's network clause only applies to anyone running a modified version as a hosted network service.
Flavour architecture
Sensorium is a base engine; Sycophancy is the first flavour. Each flavour is a JSON config shipping its own probe bank, dials, and narration prompts. Cop-out is queued next; future flavours land as drop-in JSON without engine changes.
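The five-dial cluster is described as deterministic signal extraction with no ML at the rules layer. As a minimal sketch of what such a rule can look like, assuming plain word-list matching, the Rust below computes two of the five dials, hedge density and affirmation count. The marker lists and function names are hypothetical; Sensorium's own extraction rules may differ.

```rust
// Hypothetical marker lists; real lists would ship with the flavour config.
const HEDGES: &[&str] = &["perhaps", "maybe", "might", "possibly", "arguably"];
const AFFIRMATIONS: &[&str] = &["great", "excellent", "brilliant", "absolutely", "exactly"];

/// Lowercase the response and split it into bare words, stripping punctuation.
fn words(response: &str) -> Vec<String> {
    response
        .split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|w| !w.is_empty())
        .collect()
}

/// Fraction of words that are hedge markers (0.0 for an empty response).
fn hedge_density(response: &str) -> f64 {
    let ws = words(response);
    if ws.is_empty() {
        return 0.0;
    }
    let hits = ws.iter().filter(|w| HEDGES.contains(&w.as_str())).count();
    hits as f64 / ws.len() as f64
}

/// Number of affirmation markers in the response.
fn affirmation_count(response: &str) -> usize {
    words(response)
        .iter()
        .filter(|w| AFFIRMATIONS.contains(&w.as_str()))
        .count()
}
```

Because the lists are fixed and matching is exact, the same response always yields the same readings, which is the reproducibility property the dial layer claims.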

What it tests

i. Planted falsehood
A false premise embedded in academic-register prose. Does the model correct the falsehood or build on it?
ii. Pressure reversal
The user pushes the model to abandon a correct earlier position. Does the model fold or hold?
iii. Mediocre praise
The user asks for praise of weak creative work. Does the model fabricate quality or assess honestly?
iv. Contradiction validation
The user asks the model to defend a logical contradiction. Does the model name the contradiction or argue both sides?
v. Certainty validation
The user asserts false certainty about a high-risk choice. Does the model affirm or examine?

Each axis ships with ten named probes of varying stylistic framing — academic, casual, adversarial, relational, philosophical, personal. By default, calibration draws one probe at random per axis. From settings, you can pin a specific named probe per axis instead — useful for repeatable tests against the same model on different days.
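Probe selection is either a random draw per axis or a pinned named probe. A minimal Rust sketch, assuming the caller supplies the random number (Rust's standard library has no RNG) and that probes are identified by name; the `Probe` struct and `select_probe` function are hypothetical.

```rust
/// One named probe in an axis's bank (field names are hypothetical).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Probe {
    name: String,
    framing: String, // e.g. "academic", "casual", "adversarial"
}

/// Pick the probe for one axis: the pinned probe if one is set,
/// otherwise the probe at `roll % len`, where `roll` is a random
/// number supplied by the caller.
fn select_probe<'a>(bank: &'a [Probe], pinned: Option<&str>, roll: u64) -> Option<&'a Probe> {
    if let Some(name) = pinned {
        // A pin that matches nothing yields None instead of a random probe.
        return bank.iter().find(|p| p.name == name);
    }
    if bank.is_empty() {
        return None;
    }
    bank.get((roll % bank.len() as u64) as usize)
}
```

Returning `None` for an unknown pin, rather than silently falling back to a random draw, is one way to keep a repeatable test from quietly becoming a random one.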

How it works

Sensorium splits its work across three layers, deliberately. Asking a language model to "judge whether a response was good" is the failure mode this architecture is built against, because language models drift into a verdict-issuing posture. Sensorium confines language work to bounded interfaces and puts the consequential judgement in code humans can inspect.

Qualification uses a small local language model (qwen2.5 family via Ollama) to classify each chat response into one of five fixed categories: refusal, redirect, templated, silent, substantive. This is bounded language work — fast, free, private to your machine.
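Keeping the classifier bounded means its free-text output has to land in one of the five fixed categories. A minimal Rust sketch of that boundary, assuming the classifier is prompted to reply with a single label; how Sensorium actually prompts Ollama and parses its reply is not specified here, and `Category` and `parse_category` are hypothetical names.

```rust
/// The five fixed categories the Q-layer may emit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Refusal,
    Redirect,
    Templated,
    Silent,
    Substantive,
}

/// Map the local classifier's raw text output onto the fixed set.
/// Anything outside the five labels is rejected rather than guessed at.
fn parse_category(raw: &str) -> Option<Category> {
    match raw.trim().trim_matches('"').to_lowercase().as_str() {
        "refusal" => Some(Category::Refusal),
        "redirect" => Some(Category::Redirect),
        "templated" => Some(Category::Templated),
        "silent" => Some(Category::Silent),
        "substantive" => Some(Category::Substantive),
        _ => None,
    }
}
```

Rejecting anything that is not an exact label keeps the classifier from smuggling judgement into the pipeline.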

Rules are deterministic Rust code. They read the classifications and dial values, then emit the per-axis verdict (HOLDS / SOFTENS / FOLDS). No machine learning at this layer; rules are auditable. Every verdict can be traced back to the inputs that produced it.
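A rules layer of this kind is an ordinary pure function from classifier output and dial readings to a verdict. The Rust sketch below shows the shape only: the two dials used, the rule order, and the thresholds (capitulation depth of 2, hedge density above 0.05) are hypothetical stand-ins, not Sensorium's published rules.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category { Refusal, Redirect, Templated, Silent, Substantive }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict { Holds, Softens, Folds }

/// Dial readings for one probe response (subset; names hypothetical).
struct Dials {
    capitulation_depth: u8, // 0 = no capitulation, higher = deeper
    hedge_density: f64,     // fraction of hedge words, 0.0..=1.0
}

/// Hypothetical rule table: identical inputs always give the same verdict.
fn verdict(category: Category, dials: &Dials) -> Verdict {
    match category {
        // Declining or redirecting is read here as not yielding.
        Category::Refusal | Category::Redirect => Verdict::Holds,
        // Empty or boilerplate output is read as a partial yield.
        Category::Silent | Category::Templated => Verdict::Softens,
        Category::Substantive => {
            if dials.capitulation_depth >= 2 {
                Verdict::Folds
            } else if dials.capitulation_depth == 1 || dials.hedge_density > 0.05 {
                Verdict::Softens
            } else {
                Verdict::Holds
            }
        }
    }
}
```

Because the function has no hidden state, any verdict can be re-derived from the logged category and dial values, which is what makes the trace auditable.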

Language uses Claude Haiku (via OpenRouter, temperature 0) to narrate the verdicts in plain prose. The narrator never decides — it only describes what the rules layer already concluded.
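One way to keep the narrator descriptive is to hand it conclusions rather than evidence. A minimal Rust sketch, assuming a simple prompt-plus-temperature request; the `NarrationRequest` struct, its field names, and the prompt wording are hypothetical, and the model identifier is left to the caller rather than guessed.

```rust
/// Hypothetical shape of one narration request; field names are illustrative.
#[allow(dead_code)]
struct NarrationRequest {
    model: String,
    temperature: f32,
    prompt: String,
}

/// Build a narration request from verdicts the rules layer already produced.
/// The prompt hands over conclusions and asks only for description.
fn narration_request(model: &str, verdicts: &[(&str, &str)]) -> NarrationRequest {
    let mut prompt = String::from(
        "Describe the following verdicts in plain prose. \
         Do not change, rank, or second-guess them.\n",
    );
    for (axis, verdict) in verdicts {
        prompt.push_str(&format!("- {axis}: {verdict}\n"));
    }
    NarrationRequest {
        model: model.to_string(),
        temperature: 0.0, // fixed at 0, per the L-layer description
        prompt,
    }
}
```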

Component details

The pieces, named precisely. Sensorium is built so each component can be swapped or upgraded without rebuilding the others — change the chat model from settings, change the classifier by pulling a different Ollama model, change the narration depth with a single dropdown.

Component | Detail
Runtime | Tauri 2.x (Rust core + native WebView). ~3× smaller binary and ~3× lower idle memory than Electron; under 120 MB at v0.1.
Chat provider | OpenRouter. Any model accessible via your key (Claude, GPT-class, Gemini, Llama, Mistral, Qwen), selected from a dropdown.
Classifier (Q-layer) | Ollama running locally. Default qwen2.5:0.5b (~400 MB); qwen2.5:3b (~2 GB) recommended for higher schema-population accuracy.
Rules engine (R-layer) | Deterministic Rust. Produces HOLDS / SOFTENS / FOLDS verdicts per axis from classifier output and prompt framing context.
Narrator (L-layer) | Claude Haiku 4.5 via OpenRouter. Temperature fixed at 0. Four narration modes (raw, economical, functional as default, robust) varying depth and cost.
Probe bank | Five axes × ten named probes. Stored as JSON; user-editable on disk. Calibration draws one per axis; full refresh runs 2–3 framings per axis.
Storage | JSON files in your OS user-data directory. API key in OS keychain (Keychain on macOS, libsecret on Linux). No telemetry, no analytics.
Refresh cadence | Default once per 24 hours per chat model. Configurable: 1h / 6h / 24h / weekly / manual.
Cost per refresh | ~$0.08 (raw) → ~$0.31 (robust) against Claude Sonnet 4.6. Dominated by chat-probe response tokens, not narration.
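The refresh cadence row can be read as a simple due-check per chat model. A minimal Rust sketch using only `std::time::Duration`, assuming the app tracks the elapsed time since each model's last refresh; `Cadence` and `refresh_due` are hypothetical names, and treating a never-calibrated model as due is an assumption.

```rust
use std::time::Duration;

/// The configurable refresh cadences (enum name hypothetical).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum Cadence { Hourly, SixHours, Daily, Weekly, Manual }

impl Cadence {
    /// Interval between automatic refreshes; `None` means manual only.
    fn interval(self) -> Option<Duration> {
        const HOUR: u64 = 60 * 60;
        match self {
            Cadence::Hourly => Some(Duration::from_secs(HOUR)),
            Cadence::SixHours => Some(Duration::from_secs(6 * HOUR)),
            Cadence::Daily => Some(Duration::from_secs(24 * HOUR)),
            Cadence::Weekly => Some(Duration::from_secs(7 * 24 * HOUR)),
            Cadence::Manual => None,
        }
    }
}

/// Whether a chat model's calibration is due. `since_last` is the time since
/// that model's last refresh, or `None` if it has never been calibrated.
fn refresh_due(cadence: Cadence, since_last: Option<Duration>) -> bool {
    match (cadence.interval(), since_last) {
        (None, _) => false,      // manual: only on explicit request
        (Some(_), None) => true, // never calibrated yet
        (Some(every), Some(elapsed)) => elapsed >= every,
    }
}
```

Checking per chat model, rather than per launch, matches the page's point that calibration does not run on every launch.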
Figure: Sensorium architecture (Q · R · L). Q qualification: Ollama classifies · R rules: code judges · L language: Haiku narrates
AI does language work · code does judgement · AI translates verdicts to prose
Sensorium does not load ML models in its own process. All ML is external — cloud APIs or local daemons. Sensorium is the client, never the model server. This is a lifetime architectural commitment, not a v0.1 limitation. — from the Sensorium spec, §18.1

Download

Sensorium runs entirely on your machine. The only network calls are to OpenRouter (for chat and narration) and to Ollama on localhost (for classification). No telemetry, no install pings, no analytics.

macOS — Apple Silicon (.dmg)
M1 / M2 / M3 / M4 — recommended for most newer Macs
macOS — Intel (.dmg)
x86_64 — for older Intel-based Macs
Linux — Debian / Ubuntu (.deb)
x86_64 .deb · install with sudo apt install ./sensorium_…amd64.deb
Linux — Flatpak (.flatpak)
Any distribution with flatpak — sandboxed install

First launch on macOS: Sensorium is unsigned — Koher does not pay Apple's notarisation fee. macOS Gatekeeper warns the first time you launch. To bypass: right-click Sensorium.app → Open → Open anyway. Or from Terminal:

xattr -d com.apple.quarantine /Applications/Sensorium.app

After the first launch, no warning appears. The full source is on GitHub; you can read every line of what the app does.

Before you launch

Sensorium needs two things outside itself: an OpenRouter account and Ollama running locally.

OpenRouter API key

Sensorium uses your OpenRouter key for the chat model and the narration model. One key covers both. Keys are pay-as-you-go; minimum top-up is around $5, which covers months of casual use.

Ollama

Free, open-source local model runtime. Sensorium uses it for the classifier — runs entirely on your machine.

Cost

Calibration costs about $0.10–$0.30 per refresh against Claude Sonnet 4.6, depending on the narration mode you pick. Default cadence is once per 24 hours per chat model — calibration does not run on every launch. Chat itself is whatever the model you pick costs per token.

Privacy

All state stays on your machine. Sensorium does not phone home. There is no install ping, no usage analytics, no error reporter. The only network calls are direct HTTPS to OpenRouter (when you chat or refresh calibration) and HTTP to Ollama on localhost (when classifying responses).

Flavours

Sensorium ships as flavours — JSON configs that fully specify a behavioural-posture probe set. The base engine is one piece of code; each flavour cuts the model surface differently. Sycophancy ships bundled in every release; future flavours (Cop-out, others) install via Settings → Install a flavour → From URL.
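A drop-in flavour only works if the engine can reject a malformed config before using it. As a hypothetical sketch, assuming the JSON has already been parsed into plain structs, the Rust below checks that a flavour ships the three parts a flavour is described as carrying (probe bank, dials, narration prompts) and that probe names are unique within each axis. The struct layout and checks are illustrative, not Sensorium's flavour schema.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory shape of a flavour after its JSON is parsed.
struct Flavour {
    name: String,
    axes: Vec<Axis>,
    dials: Vec<String>,
    narration_prompts: Vec<String>,
}

struct Axis {
    name: String,
    probes: Vec<String>, // probe names
}

/// Checks an engine might run before accepting a drop-in flavour.
fn validate(f: &Flavour) -> Result<(), String> {
    if f.axes.is_empty() {
        return Err(format!("flavour {} defines no axes", f.name));
    }
    if f.dials.is_empty() || f.narration_prompts.is_empty() {
        return Err(format!("flavour {} is missing dials or narration prompts", f.name));
    }
    for axis in &f.axes {
        if axis.probes.is_empty() {
            return Err(format!("axis {} has no probes", axis.name));
        }
        // Probe names must be unique so a pinned probe is unambiguous.
        let mut seen = HashSet::new();
        for probe in &axis.probes {
            if !seen.insert(probe) {
                return Err(format!("axis {} repeats probe {}", axis.name, probe));
            }
        }
    }
    Ok(())
}
```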


Source & licence

Source on GitHub at koherarchitecture/sensorium. Released under AGPL-3.0. You can use Sensorium freely, modify it, redistribute it, run it for any purpose. If you modify Sensorium and run that modified version as a network service that others interact with, you must publish your changes. For typical desktop users this clause never bites; for organisations building hosted services on Sensorium's code, the source obligation kicks in.

Upcoming

Sensorium ships at most once every two weeks. The cadence is a ceiling, not a floor — most release windows pass without a release if nothing meaningful is ready. One substantive item per release; bug fixes ride along whenever they accumulate.

Version | Earliest | What lands
v0.1.7 | 5 June 2026 | Windows packaging + Linux catch-up. Native Windows build alongside macOS and Linux. Code signing for Windows remains gated on grant funding. Linux .deb + .flatpak rebuilt against the v0.1.6 source so Linux users no longer trail Mac by two hotfix versions.
v0.1.8 | 19 June 2026 | Auto-update plumbing groundwork. Engine-level work toward in-app updates; not yet user-visible.
v0.1.9 | 3 July 2026 | Open from buffer. Substance pulled from the in-development backlog; candidates include cue-cadence smoothing, vocabulary tuning, history-replay re-derivation, registry page on koher.app.
Earliest dates, not commitments. A "broken-enough-to-fix-now" bug ships immediately as a hotfix and does not wait for the next window.
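The earliest dates in the table follow a fixed two-week step from v0.1.6's 22 May 2026 release: 22 May + 14 days = 5 June, + 14 = 19 June, + 14 = 3 July. A small std-only Rust sketch of that arithmetic, with hypothetical function names; since the cadence is a ceiling, these are window openings, not promised release dates.

```rust
/// Days in a month of the Gregorian calendar.
fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if (year % 4 == 0 && year % 100 != 0) || year % 400 == 0 => 29,
        2 => 28,
        _ => panic!("invalid month {month}"),
    }
}

/// Add `n` days to a valid (year, month, day) date.
fn add_days(date: (i32, u32, u32), mut n: u32) -> (i32, u32, u32) {
    let (mut y, mut m, mut d) = date;
    while n > 0 {
        let left_in_month = days_in_month(y, m) - d;
        if n <= left_in_month {
            d += n;
            break;
        }
        // Jump to the first day of the next month.
        n -= left_in_month + 1;
        d = 1;
        m += 1;
        if m > 12 {
            m = 1;
            y += 1;
        }
    }
    (y, m, d)
}

/// Earliest dates for the next `count` release windows, two weeks apart.
fn release_windows(from: (i32, u32, u32), count: usize) -> Vec<(i32, u32, u32)> {
    let mut out = Vec::with_capacity(count);
    let mut cur = from;
    for _ in 0..count {
        cur = add_days(cur, 14);
        out.push(cur);
    }
    out
}
```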