Koher Architecture Specification

A philosophy for building AI tools that separate language from judgment.

The Broken Relationship

What does it mean to understand something in a domain?

A senior physician recognises a constellation of symptoms and knows — before running tests, before consulting references — that something warrants attention. An experienced contract lawyer reads a clause and feels discomfort, a sense that something is missing or asymmetric, even before they can articulate what. A design educator sees a student's concept statement and immediately grasps whether the student has clarity or fog.

This recognition is the culmination of years of deliberate practice. It operates below the level of conscious articulation. The expert notices patterns that novices cannot see, weighs factors that novices do not know exist, and reaches judgments that feel like intuition but are actually the product of encoded expertise.

The problem is not that this expertise is rare. The problem is that it is unreproducible. It cannot be written down as a checklist. It cannot be taught through documentation. It cannot be transferred through mentoring except slowly and incompletely, because the expert's own awareness of their judgment process is partial and often inarticulate.

And so organisations face a persistent tension: the expertise exists, but it does not scale.

The Common Error

When organisations attempt to scale expert judgment using AI, they typically make the same architectural error: they ask a language model to do everything.

"Is this contract complete?"
"Is this research proposal coherent?"
"Is this student's concept well-articulated?"

The language model generates a plausible answer. It may even be correct. But the answer is unauditable. No one can inspect how the model reached its conclusion. No one can adjust the thresholds by which it judged. No one can verify that the same input will produce the same judgment tomorrow, or after the next model update, or across different instances of the same query.

This is not a defect of particular models. It is an architectural consequence of asking a probabilistic text generator to perform deterministic judgment.

Language models are trained to predict probable next tokens. They have no internal model of truth — only a model of what text tends to follow what other text. When asked to judge whether something is "good" or "complete" or "coherent," they generate text that resembles expert assessment without encoding the structure of expert reasoning.

The result is a tool that works intermittently, fails unpredictably, and degrades trust over time.

The Separation

The Koher architecture addresses this by separating three distinct concerns:

The Three Layers

  • Qualification: transforms unstructured input into structured signals. Humans or AI extract what is present.
  • Rules: applies deterministic logic to structured signals. Code handles judgment.
  • Language: translates rule outputs into readable explanation. AI narrates decisions already made.

This separation is not merely a technical architecture. It is an ontological claim about what AI does well and what it does not.

AI is good at pattern recognition across language. Given enough examples, AI systems can learn to extract signals from unstructured text with high reliability. They can recognise the presence or absence of certain patterns, estimate confidence levels, categorise text into types.

AI is not good at judgment. Judgment requires consistency (the same input produces the same output), auditability (the reasoning can be inspected and verified), and stability (the behaviour does not drift with model updates). These are properties that probabilistic text generation cannot guarantee.

AI is good at language generation. Once a judgment has been made through deterministic means, AI can translate that judgment into natural language explanation. It can narrate, contextualise, and phrase — tasks where its probabilistic nature is an asset rather than a liability.

The architecture separates these concerns so that each layer does what it does best.

Stage 1: Qualification

The purpose of the first stage is to transform unstructured input into structured signals that deterministic code can operate on.

Qualification can be performed by humans, by AI, or by both. The choice depends on the domain:

  • Human qualification works when users have domain knowledge and can identify signals directly (e.g., selecting qualities that describe their design intent)
  • AI qualification works when patterns must be extracted from text that users cannot easily decompose themselves (e.g., detecting coherence gaps in a concept statement)
  • Hybrid qualification combines both — users provide some signals, AI extracts others

This is not merely classification. Depending on the domain and the input type, qualification might involve:

  • Human Selection: when users have domain knowledge to identify signals directly
  • Pattern Matching: when explicit markers exist in domain conventions
  • Structured Extraction: when input follows consistent formatting
  • Named Entity Recognition: when extracting typed entities
  • Semantic Classification: when semantic judgment is needed across many examples
  • Zero-shot or Few-shot: when no training data exists yet
  • Hybrid Pipelines: when combining approaches catches more signal
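As a sketch of a hybrid pipeline, a cheap deterministic pass can run first, with a semantic classifier catching what pattern matching misses. The marker pattern, function names, and the classifier stub below are illustrative assumptions, not part of any Koher tool:

```python
import re

# A minimal hybrid qualification sketch: deterministic pattern matching
# first, then a (stubbed) semantic classifier as a fallback.
EVIDENCE_MARKERS = re.compile(
    r"\b(according to|cited in|measured|survey of)\b", re.IGNORECASE
)

def semantic_evidence_score(text: str) -> float:
    """Stand-in for a learned classifier; returns a confidence in [0, 1]."""
    return 0.0  # a real system would call a trained model here

def qualify_evidence(text: str) -> dict:
    """Emit a typed signal: is evidence present, and with what confidence?"""
    if EVIDENCE_MARKERS.search(text):
        # Explicit marker found: the deterministic pass is sufficient.
        return {"dimension": "evidence", "present": True, "confidence": 1.0}
    score = semantic_evidence_score(text)
    return {"dimension": "evidence", "present": score >= 0.5, "confidence": score}
```

Note that the pipeline only extracts a signal; deciding whether missing evidence matters belongs to Stage 2.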

The unifying principle is not "use a classifier." It is: transform unstructured input into typed, qualified chunks that rules can evaluate.

What emerges from Stage 1

  • Typed signals (entities, presence/absence markers, category labels)
  • Confidence scores where semantic judgment was required (0.0 to 1.0)
  • Structured data that can be passed to deterministic rules
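The outputs above might be represented as a small typed structure. This is a minimal sketch; the field names and example values are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Signal:
    """A typed, qualified chunk produced by Stage 1.

    Carries what was found and how confident the qualifier is,
    but no judgment about whether the finding is good or bad.
    """
    dimension: str                 # e.g. "evidence", "scope"
    present: bool                  # presence/absence marker
    confidence: float              # 0.0 to 1.0 where semantic judgment was used
    excerpt: Optional[str] = None  # the text span that triggered the signal

# Example: what a qualifier might emit for a concept statement.
signals = [
    Signal(dimension="claim", present=True, confidence=0.91,
           excerpt="We believe modular furniture reduces waste"),
    Signal(dimension="evidence", present=False, confidence=0.34),
]
```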

What does not happen in Stage 1

  • No judgment about whether the signals indicate "good" or "bad"
  • No threshold application
  • No generation of natural language explanation

Stage 1 extracts what is there. Stage 2 decides what it means.

Stage 2: Deterministic Rules

The second stage is pure code. No AI. No learned parameters. No probabilistic generation.

The rules layer receives structured signals from Stage 1 and applies explicit logic to produce judgments.

What the rules layer does

  • Applies threshold logic: "If confidence score < 0.6 on this dimension, flag as requiring attention"
  • Evaluates relationships between signals: "If claim is present but evidence is absent, surface the gap"
  • Handles edge cases through explicit conditional branches
  • Produces severity levels or priority categories
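The list above can be sketched as a pure function. The 0.6 threshold and the claim-without-evidence relationship come from the examples in this section; the severity names and signal shape are illustrative assumptions:

```python
# A minimal sketch of a Stage 2 rules layer: pure code, no AI.
ATTENTION_THRESHOLD = 0.6  # adjustable in code, tracked in version control

def evaluate(signals: dict[str, dict]) -> list[dict]:
    """Apply explicit, deterministic logic to structured signals."""
    findings = []
    for dimension, signal in signals.items():
        # Threshold logic: low confidence flags the dimension.
        if signal["confidence"] < ATTENTION_THRESHOLD:
            findings.append({"dimension": dimension, "severity": "attention"})
    # Relationship logic: a claim without evidence surfaces a gap.
    if (signals.get("claim", {}).get("present")
            and not signals.get("evidence", {}).get("present")):
        findings.append({"dimension": "evidence", "severity": "gap"})
    return findings
```

Because the function is pure, the same signals always yield the same findings, and every branch can be read, tested, and diffed in version control.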

Properties of the rules layer

  • Deterministic: The same input always produces the same output
  • Auditable: The logic is explicit and inspectable
  • Adjustable: When domain standards change, thresholds can be updated in code
  • Versioned: Rule changes are tracked in version control

When the rules layer produces a judgment, that judgment is reproducible, explainable, adjustable, and immune to model updates.

The rules layer is where expert judgment is encoded, not learned. The thresholds and relationships represent domain knowledge made explicit.

Stage 3: Language Interface

The third stage translates the rules layer's output into readable explanation.

A language model is invoked — but only after judgment has been made. The model receives structured data describing what was found (dimensional signals) and how it was evaluated (severity levels). It generates natural language that explains these determinations in terms a domain practitioner would recognise.

What the language model sees

  • The original input text (so it can reference specific phrases)
  • The signals extracted by Stage 1
  • The severity levels assigned by Stage 2

What the language model does NOT see

  • Raw confidence scores (it sees severity levels only)
  • The rules themselves (it sees their output, not their logic)
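The boundary between what the model sees and does not see can be enforced at the point where the hand-off payload is built. This is a sketch; the field names are assumptions:

```python
def build_narration_payload(input_text: str, signals: list[dict],
                            findings: list[dict]) -> dict:
    """Assemble exactly what the language model is allowed to see."""
    return {
        "input_text": input_text,  # so it can reference specific phrases
        "signals": [
            # Raw confidence scores are deliberately dropped here.
            {"dimension": s["dimension"], "present": s["present"]}
            for s in signals
        ],
        "findings": findings,      # severity levels assigned by Stage 2
    }
```

Stripping the scores at construction time means the narrator cannot re-litigate the judgment: it has severity levels to explain, not numbers to reinterpret.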

If the language model hallucinates — claims something is problematic when the rules did not flag it, or misrepresents a severity level — the structured output from Stages 1 and 2 provides an immediate check. The user can compare the narrative against the dimensional assessment and catch any discrepancy.

The language layer's variability is architecturally harmless. Even if the model phrases feedback differently on different runs, the underlying judgment remains identical. The narrative varies; the verdict does not.

The constraint: Stage 3 explains what Stages 1 and 2 determined. It does not introduce new judgments. It does not override severity levels. The language model is a narrator, not a judge.

Domain-Meaningful Categories

Consider how most AI tools present their findings.

Binary systems say: "This is present" or "This is absent." The user has no visibility into nuance. A finding that barely crossed the threshold looks identical to one that exceeded it overwhelmingly.

Raw-score systems say: "This scores 0.72." But what does 0.72 mean? Without domain-specific thresholds, the number is noise.

The Koher architecture translates quantitative signals into domain-meaningful categories. The specific categories depend on what the tool measures and what users need to understand.

Example: Confidence-based categories

For tools measuring signal presence

  • > 0.8 (Solid): clearly present. No action needed.
  • 0.5 – 0.8 (Worth Examining): something is there, but too vague to confirm. Consider sharpening.
  • < 0.5 (Attention Needed): absent. This warrants attention.

Example: Relationship-based categories

For tools measuring how signals interact

  • ≥ 0.42 (Synergy): these signals reinforce each other.
  • 0.15 – 0.42 (Neutral): these signals coexist independently.
  • < 0.15 (Tension): these signals pull in opposite directions.

The principle is consistent: translate quantitative precision into qualitative meaning that enables action. The thresholds and category names adapt to each domain.

Where useful, users can see both: the categorical state (for quick interpretation) and the underlying score (for those who want precision). Neither is hidden. Neither is privileged.
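Both example tables reduce to a single mapping step from score to category. The thresholds below are the ones given above; the function names are ours:

```python
def confidence_state(score: float) -> str:
    """Map a raw confidence score to a domain-meaningful category."""
    if score > 0.8:
        return "Solid"
    if score >= 0.5:
        return "Worth Examining"
    return "Attention Needed"

def similarity_state(score: float) -> str:
    """Map a pairwise similarity score to a relationship category."""
    if score >= 0.42:
        return "Synergy"
    if score >= 0.15:
        return "Neutral"
    return "Tension"
```

A tool that shows both views simply reports the score alongside the category, e.g. "Worth Examining (0.72)".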

What the Architecture Does Not Claim

The Koher architecture is not a replacement for domain expertise. It is an encoding of domain expertise — a way to make expert judgment reproducible at scale while keeping the expert in the loop.

What the architecture does not do

  • Replace human review. The output is recommendation, not decision.
  • Guarantee perfect accuracy. The qualification layer learns from examples and has measurable error rates.
  • Work without domain knowledge. The rules layer must be designed by someone who understands the domain's judgment criteria.
  • Eliminate the need for training data. Semantic qualification approaches require labelled examples.

What the architecture does do

  • Separate what AI does well from what requires deterministic guarantees
  • Make uncertainty visible rather than collapsing it into false confidence
  • Produce auditable, reproducible, adjustable judgments
  • Enable experts to focus their attention where expert judgment matters most

Where the Architecture Applies

The three-layer pattern applies wherever:

  • Experts can recognise quality but struggle to articulate why
  • Training data is expensive — you cannot simply scrape the internet
  • Wrong AI output is costly — hallucinations cause real harm
  • Human judgment must remain central — the goal is assistance, not automation
  • Consistency matters — the same input should produce the same evaluation

  • Design Education: qualifies concept clarity, evidence, scope, assumptions, and gaps; rules encode coherence thresholds and dimension relationships
  • Legal: qualifies clause presence, obligation symmetry, and liability exposure; rules encode risk thresholds and completeness standards
  • Academic: qualifies research question clarity and methodology coherence; rules encode institutional standards and rubric criteria
  • Medical: qualifies symptom coverage and differential completeness; rules encode clinical guidelines and diagnostic protocols
  • Financial: qualifies thesis coherence and risk acknowledgment; rules encode regulatory requirements and disclosure standards

The specific qualification approach and rule logic vary by domain. The architectural pattern remains constant.

The Commitment

Koher is a ten-year practice.

The architecture is demonstrated through tools that ship open source under MIT licence. Each tool solves one narrow problem using the three-layer pattern. Some will be useful. Some will not. All add to the body of work.

The tools are free and always will be. The consultancy exists for those who want help applying the architecture in their own domain. But the practice continues whether or not anyone pays.

This is not a startup seeking product-market fit. It is a body of thought that compounds over time.

Summary

The Koher Architecture

  • Qualification: transform unstructured input into structured signals. Humans or AI extract what is present.
  • Rules: apply deterministic logic to produce judgments. Code handles judgment.
  • Language: translate judgments into readable explanation. AI narrates decisions already made.

The separation is the architecture.

AI handles language. Code handles judgment. Humans make decisions.