Koher Architecture Specification

A philosophy for building AI tools that separate language from judgment.

The Broken Relationship

What does it mean to understand something in a domain?

A senior physician recognises a constellation of symptoms and knows — before running tests, before consulting references — that something warrants attention. An experienced contract lawyer reads a clause and feels discomfort, a sense that something is missing or asymmetric, even before they can articulate what. A design educator sees a student's concept statement and immediately grasps whether the student has clarity or fog.

This recognition is the culmination of years of deliberate practice. It operates below the level of conscious articulation. The expert notices patterns that novices cannot see, weighs factors that novices do not know exist, and reaches judgments that feel like intuition but are actually the product of encoded expertise.

The problem is not that this expertise is rare. The problem is that it is unreproducible. It cannot be written down as a checklist. It cannot be taught through documentation. It cannot be transferred through mentoring except slowly and incompletely, because the expert's own awareness of their judgment process is partial and often inarticulate.

And so organisations face a persistent tension: the expertise exists, but it does not scale.

The Common Error

When organisations attempt to scale expert judgment using AI, they typically make the same architectural error: they ask a language model to do everything.

"Is this contract complete?"
"Is this research proposal coherent?"
"Is this student's concept well-articulated?"

The language model generates a plausible answer. It may even be correct. But the answer is unauditable. No one can inspect how the model reached its conclusion. No one can adjust the thresholds by which it judged. No one can verify that the same input will produce the same judgment tomorrow, or after the next model update, or across different instances of the same query.

This is not a defect of particular models. It is an architectural consequence of asking a probabilistic text generator to perform deterministic judgment.

Language models are trained to predict probable next tokens. They have no internal model of truth — only a model of what text tends to follow what other text. When asked to judge whether something is "good" or "complete" or "coherent," they generate text that resembles expert assessment without encoding the structure of expert reasoning.

The result is a tool that works intermittently, fails unpredictably, and degrades trust over time.

The Separation

The Koher architecture addresses this by separating three distinct concerns:

The Three Layers

  • Qualification: transforms unstructured input into structured signals. Humans or AI extract what is present.
  • Rules: applies deterministic logic to structured signals. Code handles judgment.
  • Language: translates rule outputs into readable explanation. AI narrates decisions already made.

This separation is not merely a technical architecture. It is an ontological claim about what AI does well and what it does not.

AI is good at pattern recognition across language. Given enough examples, AI systems can learn to extract signals from unstructured text with high reliability. They can recognise the presence or absence of certain patterns, estimate confidence levels, categorise text into types.

AI is not good at judgment. Judgment requires consistency (the same input produces the same output), auditability (the reasoning can be inspected and verified), and stability (the behaviour does not drift with model updates). These are properties that probabilistic text generation cannot guarantee.

AI is good at language generation. Once a judgment has been made through deterministic means, AI can translate that judgment into natural language explanation. It can narrate, contextualise, and phrase — tasks where its probabilistic nature is an asset rather than a liability.

The architecture separates these concerns so that each layer does what it does best.

Stage 1: Qualification

The purpose of the first stage is to transform unstructured input into structured signals that deterministic code can operate on.

Qualification can be performed by humans, by AI, or by both. The choice depends on the domain:

  • Human qualification works when users have domain knowledge and can identify signals directly (e.g., selecting qualities that describe their design intent)
  • AI qualification works when patterns must be extracted from text that users cannot easily decompose themselves (e.g., detecting coherence gaps in a concept statement)
  • Hybrid qualification combines both — users provide some signals, AI extracts others

This is not merely classification. Depending on the domain and the input type, qualification might involve:

  • Human Selection: when users have domain knowledge to identify signals directly
  • Pattern Matching: when explicit markers exist in domain conventions
  • Structured Extraction: when input follows consistent formatting
  • Named Entity Recognition: when extracting typed entities
  • Semantic Classification: when semantic judgment is needed across many examples
  • Zero-shot or Few-shot: when no training data exists yet
  • Hybrid Pipelines: when combining approaches catches more signal
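As a sketch of a hybrid pipeline, a cheap deterministic pass can run first, with a semantic classifier catching what pattern matching misses. The marker pattern, function names, and the classifier stub below are illustrative assumptions, not part of any Koher tool:

```python
import re

# A minimal hybrid qualification sketch: deterministic pattern matching
# first, then a (stubbed) semantic classifier as a fallback.
EVIDENCE_MARKERS = re.compile(
    r"\b(according to|cited in|measured|survey of)\b", re.IGNORECASE
)

def semantic_evidence_score(text: str) -> float:
    """Stand-in for a learned classifier; returns a confidence in [0, 1]."""
    return 0.0  # a real system would call a trained model here

def qualify_evidence(text: str) -> dict:
    """Emit a typed signal: is evidence present, and with what confidence?"""
    if EVIDENCE_MARKERS.search(text):
        # Explicit marker found: the deterministic pass is sufficient.
        return {"dimension": "evidence", "present": True, "confidence": 1.0}
    score = semantic_evidence_score(text)
    return {"dimension": "evidence", "present": score >= 0.5, "confidence": score}
```

Note that the pipeline only extracts a signal; deciding whether missing evidence matters belongs to Stage 2.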

The unifying principle is not "use a classifier." It is: transform unstructured input into typed, qualified chunks that rules can evaluate.

What emerges from Stage 1

  • Typed signals (entities, presence/absence markers, category labels)
  • Confidence scores where semantic judgment was required (0.0 to 1.0)
  • Structured data that can be passed to deterministic rules
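The outputs above might be represented as a small typed structure. This is a minimal sketch; the field names and example values are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Signal:
    """A typed, qualified chunk produced by Stage 1.

    Carries what was found and how confident the qualifier is,
    but no judgment about whether the finding is good or bad.
    """
    dimension: str                 # e.g. "evidence", "scope"
    present: bool                  # presence/absence marker
    confidence: float              # 0.0 to 1.0 where semantic judgment was used
    excerpt: Optional[str] = None  # the text span that triggered the signal

# Example: what a qualifier might emit for a concept statement.
signals = [
    Signal(dimension="claim", present=True, confidence=0.91,
           excerpt="We believe modular furniture reduces waste"),
    Signal(dimension="evidence", present=False, confidence=0.34),
]
```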

What does not happen in Stage 1

  • No judgment about whether the signals indicate "good" or "bad"
  • No threshold application
  • No generation of natural language explanation

Stage 1 extracts what is there. Stage 2 decides what it means.

Stage 2: Deterministic Rules

The second stage is pure code. No AI. No learned parameters. No probabilistic generation.

The rules layer receives structured signals from Stage 1 and applies explicit logic to produce judgments.

What the rules layer does

  • Applies threshold logic: "If confidence score < 0.6 on this dimension, flag as requiring attention"
  • Evaluates relationships between signals: "If claim is present but evidence is absent, surface the gap"
  • Handles edge cases through explicit conditional branches
  • Produces severity levels or priority categories
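The list above can be sketched as a pure function. The 0.6 threshold and the claim-without-evidence relationship come from the examples in this section; the severity names and signal shape are illustrative assumptions:

```python
# A minimal sketch of a Stage 2 rules layer: pure code, no AI.
ATTENTION_THRESHOLD = 0.6  # adjustable in code, tracked in version control

def evaluate(signals: dict[str, dict]) -> list[dict]:
    """Apply explicit, deterministic logic to structured signals."""
    findings = []
    for dimension, signal in signals.items():
        # Threshold logic: low confidence flags the dimension.
        if signal["confidence"] < ATTENTION_THRESHOLD:
            findings.append({"dimension": dimension, "severity": "attention"})
    # Relationship logic: a claim without evidence surfaces a gap.
    if (signals.get("claim", {}).get("present")
            and not signals.get("evidence", {}).get("present")):
        findings.append({"dimension": "evidence", "severity": "gap"})
    return findings
```

Because the function is pure, the same signals always yield the same findings, and every branch can be read, tested, and diffed in version control.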

Properties of the rules layer

  • Deterministic: The same input always produces the same output
  • Auditable: The logic is explicit and inspectable
  • Adjustable: When domain standards change, thresholds can be updated in code
  • Versioned: Rule changes are tracked in version control

When the rules layer produces a judgment, that judgment is reproducible, explainable, adjustable, and immune to model updates.

The rules layer is where expert judgment is encoded, not learned. The thresholds and relationships represent domain knowledge made explicit.

Stage 3: Language Interface

The third stage translates the rules layer's output into readable explanation.

A language model is invoked — but only after judgment has been made. The model receives structured data describing what was found (dimensional signals) and how it was evaluated (severity levels). It generates natural language that explains these determinations in terms a domain practitioner would recognise.

What the language model sees

  • The original input text (so it can reference specific phrases)
  • The signals extracted by Stage 1
  • The severity levels assigned by Stage 2

What the language model does NOT see

  • Raw confidence scores (it sees severity levels only)
  • The rules themselves (it sees their output, not their logic)
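The boundary between what the model sees and does not see can be enforced at the point where the hand-off payload is built. This is a sketch; the field names are assumptions:

```python
def build_narration_payload(input_text: str, signals: list[dict],
                            findings: list[dict]) -> dict:
    """Assemble exactly what the language model is allowed to see."""
    return {
        "input_text": input_text,  # so it can reference specific phrases
        "signals": [
            # Raw confidence scores are deliberately dropped here.
            {"dimension": s["dimension"], "present": s["present"]}
            for s in signals
        ],
        "findings": findings,      # severity levels assigned by Stage 2
    }
```

Stripping the scores at construction time means the narrator cannot re-litigate the judgment: it has severity levels to explain, not numbers to reinterpret.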

If the language model hallucinates — claims something is problematic when the rules did not flag it, or misrepresents a severity level — the structured output from Stages 1 and 2 provides an immediate check. The user can compare the narrative against the dimensional assessment and catch any discrepancy.

The language layer's variability is architecturally harmless. Even if the model phrases feedback differently on different runs, the underlying judgment remains identical. The narrative varies; the verdict does not.

The constraint: Stage 3 explains what Stages 1 and 2 determined. It does not introduce new judgments. It does not override severity levels. The language model is a narrator, not a judge.

Domain-Meaningful Categories

Consider how most AI tools present their findings.

Binary systems say: "This is present" or "This is absent." The user has no visibility into nuance. A finding that barely crossed the threshold looks identical to one that exceeded it overwhelmingly.

Raw-score systems say: "This scores 0.72." But what does 0.72 mean? Without domain-specific thresholds, the number is noise.

The Koher architecture translates quantitative signals into domain-meaningful categories. The specific categories depend on what the tool measures and what users need to understand.

Example: Confidence-based categories

For tools measuring signal presence

  • > 0.8 (Solid): clearly present. No action needed.
  • 0.5 – 0.8 (Worth Examining): something is there, but too vague to confirm. Consider sharpening.
  • < 0.5 (Attention Needed): absent. This warrants attention.

Example: Relationship-based categories

For tools measuring how signals interact

  • ≥ 0.42 (Synergy): these signals reinforce each other.
  • 0.15 – 0.42 (Neutral): these signals coexist independently.
  • < 0.15 (Tension): these signals pull in opposite directions.

The principle is consistent: translate quantitative precision into qualitative meaning that enables action. The thresholds and category names adapt to each domain.

Where useful, users can see both: the categorical state (for quick interpretation) and the underlying score (for those who want precision). Neither is hidden. Neither is privileged.
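Both example tables reduce to a single mapping step from score to category. The thresholds below are the ones given above; the function names are ours:

```python
def confidence_state(score: float) -> str:
    """Map a raw confidence score to a domain-meaningful category."""
    if score > 0.8:
        return "Solid"
    if score >= 0.5:
        return "Worth Examining"
    return "Attention Needed"

def similarity_state(score: float) -> str:
    """Map a pairwise similarity score to a relationship category."""
    if score >= 0.42:
        return "Synergy"
    if score >= 0.15:
        return "Neutral"
    return "Tension"
```

A tool that shows both views simply reports the score alongside the category, e.g. "Worth Examining (0.72)".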

What the Architecture Does Not Claim

The Koher architecture is not a replacement for domain expertise. It is an encoding of domain expertise — a way to make expert judgment reproducible at scale while keeping the expert in the loop.

What the architecture does not do

  • Replace human review. The output is recommendation, not decision.
  • Guarantee perfect accuracy. The qualification layer learns from examples and has measurable error rates.
  • Work without domain knowledge. The rules layer must be designed by someone who understands the domain's judgment criteria.
  • Eliminate the need for training data. Semantic qualification approaches require labelled examples.

What the architecture does do

  • Separate what AI does well from what requires deterministic guarantees
  • Make uncertainty visible rather than collapsing it into false confidence
  • Produce auditable, reproducible, adjustable judgments
  • Enable experts to focus their attention where expert judgment matters most

Where the Architecture Applies

The three-layer pattern applies wherever:

  • Experts can recognise quality but struggle to articulate why
  • Training data is expensive — you cannot simply scrape the internet
  • Wrong AI output is costly — hallucinations cause real harm
  • Human judgment must remain central — the goal is assistance, not automation
  • Consistency matters — the same input should produce the same evaluation

  • Design Education: qualifies concept clarity, evidence, scope, assumptions, and gaps; rules encode coherence thresholds and dimension relationships
  • Legal: qualifies clause presence, obligation symmetry, and liability exposure; rules encode risk thresholds and completeness standards
  • Academic: qualifies research question clarity and methodology coherence; rules encode institutional standards and rubric criteria
  • Medical: qualifies symptom coverage and differential completeness; rules encode clinical guidelines and diagnostic protocols
  • Financial: qualifies thesis coherence and risk acknowledgment; rules encode regulatory requirements and disclosure standards

The specific qualification approach and rule logic vary by domain. The architectural pattern remains constant.

The Commitment

Koher is a ten-year practice.

The architecture is demonstrated through tools that ship open source under MIT licence. Each tool solves one narrow problem using the three-layer pattern. Some will be useful. Some will not. All add to the body of work.

The tools are free and always will be. The consultancy exists for those who want help applying the architecture in their own domain. But the practice continues whether or not anyone pays.

This is not a startup seeking product-market fit. It is a body of thought that compounds over time.

Summary

The Koher Architecture

  • Qualification: transform unstructured input into structured signals. Humans or AI extract what is present.
  • Rules: apply deterministic logic to produce judgments. Code handles judgment.
  • Language: translate judgments into readable explanation. AI narrates decisions already made.

The separation is the architecture.

AI handles language. Code handles judgment. Humans make decisions.