Visible to Whom?

On the difference between understanding a system and making it understandable.

The Builder's View

You built the system. You understand how attention heads distribute weight across token sequences. You know why your multi-agent pipeline routes a data analysis task to one model for extraction and another for verification. You can trace a decision through your architecture — from input to intermediate representation to output — and explain, at each step, what the system did and approximately why.

This is genuine knowledge. It took years to acquire. It shapes how you evaluate claims about AI, and it should.

Now consider your user. A data analyst receives a flag from your system: "This pattern is anomalous." She asks: "Why?"

What does she see?

The User's Question

In most AI systems — whether a single model or a multi-agent pipeline — the answer to "why?" is generated by a language model. The system produces an explanation: "The values in column F deviate from the historical trend by more than two standard deviations, suggesting..."

This explanation is plausible. It may even be correct. But now the analyst pushes: "By what criteria did you determine this was anomalous rather than simply noisy? What threshold separated 'worth flagging' from 'normal variation'?"

The system generates another explanation. The analyst pushes again. Another explanation. Each answer is itself a probabilistic output — a sequence of tokens selected for plausibility, not a report from a deterministic process. The chain of justification never reaches ground. It produces explanations about explanations, indefinitely.

You, the builder, know this. You understand that the model is generating text that resembles reasoning rather than reporting reasoning. You can look at the model's internals and form your own assessment of what happened. But your user cannot. And your system does not offer her what you have: a view of the actual decision process.

Understanding the machine is not the same as making the machine understandable.

Two Kinds of Visibility

There is a distinction that the AI industry has not yet articulated clearly, because the people building the systems and the people using the systems rarely sit in the same room.

Builder Visibility                       | User Visibility
-----------------------------------------|----------------------------------------
Lives in the engineer's head             | Lives in the architecture
Requires training to interpret           | Requires no training to read
Traces model internals                   | Inspects decision criteria
Answers: "How does it work?"             | Answers: "Why this verdict?"
Available to the person who built it     | Available to the person affected by it

Builder visibility is real and valuable. It is how you debug, how you improve, how you build trust in your own system. But it does not transfer. Your user does not have it. Your user has the output and, if your system offers one, an AI-generated explanation of the output. These are not the same thing.

The confusion is natural. When you can see inside the system, the system feels transparent. The opacity is invisible to the person who has access. It is only visible to the person who does not.

Where This Becomes a Problem

In many domains, it is not a problem at all. A user who asks a search engine for the best restaurants nearby does not need to inspect the ranking criteria. A reader who asks for a research summary wants the best available answer. Output quality is the right metric. Builder visibility is sufficient.

But there are domains where the user needs more than the output. They need to see the criteria that produced it — not as an AI-generated explanation, but as something they can inspect, challenge, and learn from.

Why the user needs to see the criteria, by domain:

Data analysis: An analyst told "this is anomalous" must verify the threshold against domain knowledge. If the threshold is invisible, the flag is an opinion, and the analyst cannot distinguish a genuine finding from a model artefact.

Education: A student told "your concept is unclear" learns nothing. A student shown which dimension scored below which threshold, and who set that threshold, begins a conversation with the criteria. That is where learning happens.

Clinical assessment: A clinician receiving a risk flag must trace the judgment to a specific criterion and verify it against clinical guidelines. An opaque flag is not actionable; it is a suggestion from an unquestionable source.

Legal review: A clause flagged as problematic must be traceable to a named criterion. The criterion can be argued. The AI explanation cannot: it has no author, no standard, no accountability.

In each of these cases, the question is not "Is the output good?" It is "Can the person affected by the output see the judgment that produced it?" And crucially: can they disagree with a specific criterion rather than accepting or rejecting the output wholesale?

The Engineering Response

This is not a philosophical problem. It is an architectural one.

If you want the user to see the criteria, the criteria must exist as something other than weights in a neural network. They must be encoded in a form a human can read: thresholds, conditional logic, relationship rules. Code, not parameters.
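The analyst's two-standard-deviation flag, for instance, can live as a few lines of ordinary code. The sketch below is illustrative, not taken from any real system: the threshold name, its value, and the function name are assumptions, and in practice a domain expert would set and adjust the threshold.

```python
# A minimal sketch of a judgment criterion encoded as readable code
# rather than model weights. All names and values are hypothetical.

from statistics import mean, stdev

# Set by a domain expert; adjustable without retraining anything.
ANOMALY_Z_THRESHOLD = 2.0

def is_anomalous(value: float, history: list[float]) -> tuple[bool, str]:
    """Flag a value whose z-score against history exceeds the threshold.

    Returns the verdict together with the criterion that produced it,
    so the answer to "why?" is a specific rule, not generated text.
    """
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return False, "no variation in history; nothing to compare against"
    z = abs(value - mu) / sigma
    return z > ANOMALY_Z_THRESHOLD, (
        f"z-score {z:.2f} vs threshold {ANOMALY_Z_THRESHOLD}"
    )
```

The point is not the statistics; it is that the criterion is a named constant a user can read, question, and ask to have changed.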

The Separation

Stage 1, Pattern Recognition: AI extracts structured signals from unstructured input.
Stage 2, Deterministic Judgment: code applies thresholds, rules, and relationships; readable, auditable, adjustable.
Stage 3, Narration: AI translates the verdict into plain language.
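The three-stage separation can be sketched as a pipeline skeleton. Everything here is hypothetical: the function names, the signal name, and the threshold value are illustrative, and the two AI stages are stubbed because only the middle stage needs to be readable.

```python
# A sketch of the three-stage separation. Stages 1 and 3 are stubs
# standing in for model calls; Stage 2 is the part the user can read.

THRESHOLD = 0.40  # set by a domain expert, adjustable via configuration

def extract_signals(text: str) -> dict:
    """Stage 1: an AI model turns unstructured input into structured
    signals, e.g. {"deviation_score": 0.62}."""
    raise NotImplementedError  # model call goes here

def judge(signals: dict) -> dict:
    """Stage 2: deterministic judgment. The verdict names the exact
    criterion that produced it."""
    score = signals["deviation_score"]
    return {
        "flag": score > THRESHOLD,
        "criterion": f"deviation_score {score:.2f} > threshold {THRESHOLD}",
    }

def narrate(verdict: dict) -> str:
    """Stage 3: an AI model translates the verdict into plain language,
    citing the criterion rather than inventing one."""
    raise NotImplementedError  # model call goes here
```

When the user asks "why?", the answer comes from the `criterion` field of Stage 2, not from Stage 3's prose.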

In this architecture, when the user asks "Why?", the chain terminates. Not in another AI explanation, but in a specific rule: line 47, threshold 0.40, set by a domain expert, adjustable via configuration. The user can read it. The user can disagree with it. The user can ask for the threshold to be changed. The judgment is not generated — it is there.

AI handles what AI does well: recognising patterns in language, narrating decisions in natural language. Code handles what code does well: applying consistent, auditable, reproducible logic. Neither replaces the other. They handle different types of cognition.

Not a Critique

Multi-agent pipelines, model orchestration, ensemble systems — these produce better output. The benchmarks confirm it. This is real and valuable engineering work.

The question this piece raises is not "Is that approach wrong?" It is "Is that approach sufficient for domains where the user needs to see the judgment?"

For research synthesis, recommendation, content generation — output quality is the right optimisation target. Builder visibility is enough. The user wants the best answer, not a tour of the decision process.

For data analysis, education, clinical assessment, legal review — the user is not a passive consumer of output. The user is a practitioner who must verify, challenge, learn from, or act on the judgment. In these domains, user visibility is not a feature. It is the point.

I use Claude constantly. I could not have built Koher without it. The question was never about whether AI is capable. It was about who the capability is visible to — the person who built the system, or the person whose work the system judges.

Summary

The builder of an AI system understands how it works. This understanding is genuine and hard-earned. But it does not automatically transfer to the user. The user sees the output and, at best, an AI-generated explanation of the output. The decision criteria — the thresholds, the rules, the logic that separated "flag this" from "ignore this" — remain inside the model's parameters, accessible to the engineer but invisible to the person affected by the judgment.

In domains where users need to verify, challenge, or learn from the judgment, this gap is architectural. It is not solved by better models, better orchestration, or better explanations. It is solved by placing the judgment in a layer the user can read: deterministic code, set by domain experts, adjustable when standards change.

Understanding the machine is engineering. Making the machine understandable is architecture. They are different problems, and they require different solutions.