Agents rebuild their understanding of your codebase from scratch every session. Here is a small markdown file that fixes that — and turns your AI assistant into something that remembers, reasons, and pushes back.

The mental model you don't see

When you work on a codebase day after day, something quiet happens in the background. You build a map. You know that the auth middleware runs before the rate limiter, not after. You remember that the Order model can never have a null customerId because there's a database trigger, not just a TypeScript type, enforcing it. You recall that you tried switching the cache to Redis last summer and it actually made things slower because of the serialization overhead — so when someone proposes it again, you push back.

None of that map is in the code. It's in your head. It's why the code looks the way it does. Most senior engineers will tell you that the hardest part of joining a new project isn't reading the source — it's rebuilding that mental model from scratch.

Agents have the exact same problem, except they have it every single session.

The session-zero tax

Every time you open a new chat with Copilot, Claude Code, Cursor, or whatever you're running this week, the agent starts from nothing. It has the file you've opened, the error message you pasted, and a vague idea of what your project is. That's it. To make any non-trivial change, it has to:

  1. Search the codebase for relevant files.
  2. Read those files (and their imports, and their imports' imports).
  3. Form a hypothesis about how the pieces fit together.
  4. Make the change — hopefully without breaking the parts it didn't read.

Every one of those steps costs tokens. Every one of those steps costs you wall-clock time while the agent thinks. And every one of those steps is a chance for the agent to get it almost right — to read the wrong three files, miss the one that contained the load-bearing invariant, and confidently produce a change that compiles, passes its narrow test, and silently breaks something downstream.

The more your codebase grows, the worse this gets. At 5k lines the agent can practically read everything. At 50k it has to choose. At 500k it's basically guessing which corner of the map to load. And the worst part is that you have no idea, in any given session, which parts of your code's reality the agent has actually internalised before it starts editing.

I've watched agents change code that had a comment three lines above explaining exactly why it was written that way. Not because the agent ignored the comment — because the agent never loaded the file in the first place.

Externalising the model

The fix is almost embarrassingly simple: write the mental model down, in a place the agent is contractually obligated to read before it touches anything.

That's the whole idea. A small set of structured markdown files that capture, in dense and scannable form:

  • What each subsystem owns — the state, the data, the responsibility.
  • What it reads from and writes to — the inputs and outputs.
  • The invariants — facts that must always hold true.
  • The flows — end-to-end traces of how an operation actually moves through the system.
  • The tensions — known trade-offs, awkward couplings, places where the design is fighting itself.
  • The decisions — past calls with their rationale, so they don't get silently reversed.
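As a sketch, a knowledge file for a hypothetical cache subsystem might look like this. Every name here is illustrative — the point is the shape, one tagged claim per line:

```markdown
# knowledge/cache.md

OWNS: in-process LRU cache for rendered fragments; eviction policy and TTLs.
READS FROM: config/cache.yaml (size limits); render pipeline output.
WRITES TO: nothing persistent — memory only, rebuilt on restart.
INVARIANT: cache entries are immutable after insert; updates go through evict + reinsert.
FLOW: request → cache lookup → miss → render → insert → response.
TENSION: a larger cache improves hit rate but lengthens GC pauses; current size is a compromise.
DECIDED: 2024-06 — cache stays synchronous; an async version reordered writes against the render pipeline and was reverted.
```

Each line is independently checkable, which is what keeps the file honest over time.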

This isn't documentation in the traditional sense. It isn't for humans, primarily — although humans can read it just fine. It's a cache of understanding, written in a notation an agent can scan in a few hundred tokens and reconstruct the parts of the model it needs.

The key is the schema. Free-form prose is what makes traditional docs go stale: nobody can tell, six months later, whether a paragraph is still true. A line that starts with INVARIANT: is a claim. It's either true or it's a bug, and the next person — human or agent — knows exactly which one to check.
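Concretely, compare a prose note with its tagged equivalent, reusing the customerId example from earlier (the wording is invented for illustration):

```markdown
<!-- prose: vague, hard to verify six months later -->
Customer ids are handled carefully at the database level.

<!-- tagged: a claim that is either true or a bug -->
INVARIANT: Order.customerId is never null — enforced by a DB trigger, not just the TypeScript type.
```

The second form names the exact thing to check, and where the enforcement lives.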

Two gates make it self-maintaining

The schema alone isn't enough. Without rules around it, the knowledge files drift the same way every other doc drifts: written once, never updated, eventually wrong, eventually ignored.

What closes the loop is two short rules added to the agent's top-level instructions:

Read gate. Before editing any file in the source tree, or making a non-trivial change anywhere, you MUST first read the knowledge files covering the affected domain. This rebuilds the mental model — state ownership, invariants, design rationale — so you can spot when a proposed change contradicts an earlier DECIDED entry and push back rather than silently regress it.

Write gate. During or after any task, if you discover a new invariant, state-ownership fact, data-flow edge, design decision, or tension that is not already in a knowledge file, you MUST add it to the correct file using the structured notation. These files are your externalised mental model — if you don't write it down, the next session will rediscover it from scratch.

The read gate solves the comprehension problem. Before the agent edits, it loads the relevant knowledge file and pays a small, fixed token cost to inherit everything previous sessions figured out. That is dramatically cheaper than re-reading thirty source files, and dramatically more accurate than guessing.

The write gate solves the staleness problem. The agent doesn't just consume the knowledge base — it maintains it. Every session that discovers something new is required to write that something back. Over a few weeks of normal use, the knowledge base converges on a fairly complete and accurate model, because it's being touched by every task.

This is the part that surprised me when I first set it up. I expected to spend hours seeding the knowledge base manually. I didn't. I wrote the structure, asked the agent to do an initial pass, and then just... worked normally. Within a week the files were richer than anything I would have written by hand, because they were being updated by the same process that was using them.

Pushing back

The most valuable thing this system does is the thing that's hardest to demonstrate in a single screenshot: it lets the agent disagree with you.

A DECIDED: entry isn't just a note. It's a contract. When you ask the agent to do something that contradicts a prior decision, the read gate forces it to encounter that decision before it starts implementing. The instructions say, in so many words: cite the entry, ask the human to justify overriding it, don't silently reverse it.

In practice this means you stop accidentally undoing your own past work. You decided six months ago, after a long debugging session, that the cache layer wouldn't be made async because of a specific ordering problem with the render pipeline. You wrote it down as a DECIDED: entry. Today, with that context completely gone from your head, you ask the agent to "make the cache async, it'll be faster." Instead of cheerfully complying, the agent comes back with: "There is a DECIDED entry in knowledge/cache.md saying this was tried and reverted because of an ordering issue with the render pipeline. Do you want to override it, and if so, how should we handle that ordering?"
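One way to phrase that rule in the top-level instructions — a sketch of the idea, not the verbatim text from the bootstrap file:

```markdown
## Conflict rule

If a requested change contradicts a DECIDED: entry, do not implement it yet.
Quote the entry verbatim, explain the conflict, and ask whether the human
wants to override the decision. If they do, update the entry with the new
decision and its rationale before making the change.
```

The last sentence matters: an override isn't a deletion, it's a new decision with its own rationale.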

That is an enormous shift in the relationship. The agent stops being a fast-but-forgetful junior who does whatever you say, and starts being something closer to a peer who has read the project history and is prepared to argue.

The token economics

There's a practical angle to this too, and it's worth being honest about: knowledge files save real money.

A dense knowledge file for a subsystem is typically 200–800 tokens. Reading the actual source code for the same subsystem — even with smart search and selective reads — is routinely 5,000 to 20,000 tokens, sometimes much more. Multiply that by every task you run in a week, across a team, and the difference is not subtle.

More importantly, the knowledge file gives the agent the right tokens. It's a curated summary of what matters, written specifically to support the kind of reasoning the agent needs to do. Reading raw source gives the agent a lot of tokens, most of which are noise for the question it's actually trying to answer. Better signal, fewer tokens, faster responses, fewer mistakes. There's no trade-off here — it's just better on every axis.
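The arithmetic is easy to sanity-check. The figures below are illustrative mid-range picks from the numbers above, not measurements:

```python
# Back-of-envelope token savings from reading a knowledge file
# instead of raw source. All inputs are illustrative assumptions.
knowledge_read = 500      # tokens to read one dense knowledge file
source_read = 12_000      # tokens to read the same subsystem's source
tasks_per_week = 50       # agent tasks across a small team

saved_per_task = source_read - knowledge_read
saved_per_week = saved_per_task * tasks_per_week

print(f"saved per task: {saved_per_task} tokens")
print(f"saved per week: {saved_per_week} tokens")
```

Even with conservative inputs, the weekly difference is measured in hundreds of thousands of tokens — before counting the accuracy gains.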

The bootstrap file

I've been describing the shape of the system. The actual instructions for setting it up — the folder structure, the schema, the verbatim text for the read/write gates, the retrospective skill that closes the loop — all live in a single markdown file that you hand to your agent.

Download bootstrap-knowledge-base.md

Drop it at the root of any repository and tell your agent:

"Read bootstrap-knowledge-base.md and execute it."

That single sentence is the whole onboarding. The file walks the agent through:

  1. Creating the folder structure (instructions/, knowledge/, skills/) in whatever location your tool expects — .github/ for Copilot, .cursor/rules/ for Cursor, CLAUDE.md and siblings for Claude Code. The structure is the same; only the filenames change.
  2. Writing out the four-tier model (always-loaded hub → auto-applied conventions → on-demand mental model → on-demand workflows) so the agent knows where each kind of information belongs.
  3. Adopting the knowledge-file schema with the seven tags (OWNS, READS FROM, WRITES TO, INVARIANT, FLOW, TENSION, DECIDED).
  4. Installing the read/write gates verbatim in the top-level instructions, so future sessions are bound by them.
  5. Setting up a retrospective skill that runs after non-trivial tasks and routes lessons learned to the right file — a knowledge entry, an instruction rule, a code comment, or a new skill.
  6. Doing an initial population pass over the codebase, then handing back a list of files created and any domains it couldn't confidently document (so you can fill those in).
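For a Copilot-style setup, the resulting layout might look something like this — filenames are illustrative, and other tools use their own locations as noted above:

```
.github/
  copilot-instructions.md    # tier 1: always-loaded hub, holds the read/write gates
  instructions/
    conventions.md           # tier 2: auto-applied coding conventions
  knowledge/
    auth.md                  # tier 3: per-subsystem mental model (OWNS, INVARIANT, ...)
    orders.md
  skills/
    retrospective.md         # tier 4: on-demand workflow, runs after non-trivial tasks
```

The tiers map onto how eagerly each file is loaded: the hub always, conventions automatically, knowledge and skills only when a task touches them.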

It's a single file. It runs once. After that, the system maintains itself.

You can delete the bootstrap file once setup is done, or keep it around as a reference. I keep mine.

What it doesn't fix

This isn't magic. A few honest caveats.

It's not a substitute for tests. Knowledge files describe what should be true. Tests verify what is true. You need both. If anything, the knowledge base makes test failures more useful, because the INVARIANT entry tells you what was supposed to hold and the failing test tells you that it doesn't.

It depends on the agent following its own instructions. The read/write gates are rules in a markdown file, not enforcement at the protocol level. A sufficiently rushed prompt can still get the agent to skip them. In my experience, modern agents are pretty good at honouring this kind of instruction when it's at the top of their context — but it's not bulletproof. If you catch it skipping, call it out; the next session will be more careful.

The initial seeding is the hardest part. Your first pass over an existing codebase will produce a knowledge base that's incomplete and slightly wrong in places. That's fine. The write gate fixes it as a side effect of normal work. Don't try to make the seed perfect.

It doesn't scale infinitely. A knowledge file with 200 entries is no longer scannable. When a file gets too dense, split it by sub-domain. The four-tier model has room for as many tier-3 files as you need.

Worth the half hour

Setting this up takes maybe thirty minutes the first time. The bootstrap file does most of the work; you mostly answer questions and confirm the file list before the agent populates it.

What you get back is an agent that, within a week or two, knows your codebase about as well as you do — and in some places better, because it's reading every change while you're only reading the ones you happen to be involved in. It costs less to run, makes fewer mistakes, and starts doing the most underrated thing a collaborator can do: telling you when you're about to repeat a mistake you already learned from.

That last part is the whole reason I built this. Not the token savings, not the speed. The push-back. Having something in the loop that has read the project's history and is willing to remind me of it when I forget — which, increasingly, is most of the time.

Give it a try. The download link is up there.