Field note · Agent memory

What the agent remembers — and how.

"It remembered what I told it last week — but it forgot what I said three messages ago." Both can be true at the same time. Once you see why, the agent's behavior stops feeling random — and you see exactly what we mean by engineering context.

The reframe

Three things to know about agent memory.

The model itself remembers nothing. Everything it "knows" about you, the business, and the conversation is something we assemble and hand to it on every single turn.

01
The model

Memory is built, not had

Every message starts from a blank slate. Before the model answers, our orchestrator runs a "load context" step that pulls four things from the database and stitches them into the prompt — then throws that context away the moment the answer returns. "The agent remembered" really means "we looked it up and put it back in front of the model."

02
The layers

Memory is layered by scope and lifespan

There's no single "memory." There are four layers: project memory (stable business rules, true for everyone), user memory (your preferences, applied silently), the session summary (a rolling recap of this thread), and recent messages (the last ten turns, verbatim). A request that "should obviously be remembered" sometimes isn't — it landed in the wrong layer.

03
The guardrail

The orchestrator owns memory — not the agent

The agent never decides what to remember. Our workflow assembles context deterministically — same layers, same order, same rules, every run. That trades flexibility for predictability, and for anything touching financials, pipeline, and customer data, that's the right trade.

The mental shift: stop thinking of it as "the agent that remembers things." Think of it as a stateless model, plus a librarian that re-fetches the right context every turn. And forgetting is a feature — we deliberately compress and drop, because unbounded memory means ballooning cost, slower answers, and a model buried in irrelevant history.

The two shapes

Working memory vs. durable memory.

Almost every "why did it remember — or forget — that?" moment comes from confusing the two.

Shape 01

Working memory

Live, exact, disposable.

What's happening right now in the thread. The last ten messages are passed to the model word-for-word, so it can follow "what about that one?" Precise but short-lived — once the thread runs long, the oldest turns compress into the rolling summary and the literal text is gone. It's meant to fade.

Shape 02

Durable memory

Stable, scoped, reusable.

What stays true after you close the chat — facts true for the whole business ("we invoice net-30") and facts true for one person ("show currency without cents"). Loaded fresh into every new conversation, so the agent already knows them without you repeating yourself. We never hard-delete it; facts get flagged stale, so there's a lifecycle, not a shredder.

For the team

Where each fact belongs.

Most "the agent forgot" complaints are really "we stored that as working memory when it should have been durable" — or the reverse.

Working memory

In the moment

What's the balance on that invoice we were just looking at?
Only meaningful inside this thread. A new chat won't have it — and shouldn't.
Now do the same for the other client.
Pure context-carry — depends entirely on the messages right before it.
Durable · project

True for everyone

We bill net-30; reserve is two months of runway.
A rule every answer must respect, loaded for everyone.
'Closed Won' is the source of truth for revenue.
A shared definition that shouldn't drift between runs.
Durable · user

True for one person

I always want figures in a table, never in prose.
A quiet preference, saved to your account everywhere.
Skip the cents; round to whole dollars for me.
Small and personal — set once, honored everywhere.

A simple test: should this still be true next week, in a brand-new conversation? If yes, it's durable — store it once. If no, let the recent window and rolling summary handle it. Promoting a one-off to durable memory just clutters every future conversation with a fact that stopped being relevant an hour ago.

The decision tree

Two questions route any fact.

Walk anything that "should be remembered" through these in order — and answer honestly, not aspirationally.

Question 1

Does this fact outlive the conversation?

If it only matters inside the current chat, it lives in working memory — held for a while, then folded into a running summary as the thread grows. It's meant to fade. If it needs to persist beyond today, it moves to longer-term memory — which raises the next question.

Question 2

Is it true for everyone, or just this person?

If it's true for everyone — a rule, a definition, a threshold — it's project memory, loaded for every user as authoritative context. If it's specific to one person — a preference, a format, a default — it's user memory, applied silently and scoped to that account.

If a fact is hard to place, that's the signal — often one "thing to remember" is hiding several inside it. Split it, and each piece usually sorts itself cleanly. And remember the layer above all of them: the model holds none of this on its own. Memory isn't something the agent has — it's something we engineer, one turn at a time.

This is what we mean by "engineering context."

The difference between AI that feels reliable and AI that feels random is usually the plumbing underneath. That's the part we're good at — because we're data people first.