Writing MCP and memory
Architecture

Why MCP is the wrong abstraction for memory

MCP treats memory as a tool call. That's the wrong model. Memory isn't something you invoke. It's something that should already be there.

Tenure research · ~8 min read

TL;DR

  • MCP is a tool protocol. Memory is not a tool.
  • Tool calls are explicit, discrete, and optional. Memory is implicit, continuous, and always relevant.
  • When memory requires an invocation, it becomes a second-class input; loaded too late, shaped by the wrong query, and absent when the model doesn't know to ask.
  • The correct layer for memory is the proxy: present before the first token, invisible to both the model and the user, and structurally separate from the AI client.
The premise

MCP is a good protocol for the wrong job

The Model Context Protocol is well-designed for what it was built to do: give a model structured, on-demand access to external systems. Fetch a file. Query a database. Create a calendar event. These are discrete actions that should happen when explicitly requested, and MCP handles them well.

Memory is not that. Memory is not a resource you fetch on request. It is context that should be present before the conversation starts, without the model or the user having to think about it. When a developer tells their AI assistant that the auth service uses Redis for sessions, they are not creating a tool result. They are establishing a belief that should shape every future response about that system, across every future session, without any explicit retrieval step.

The distinction matters more than it appears. A tool call is optional input. Memory is structural context. Treating one as the other produces a system that technically works but behaviorally fails in ways that are hard to articulate and harder to fix.

A tool call is optional input. Memory is structural context. The difference is not cosmetic. It determines when the information arrives, how it is weighted, and whether the model knows to use it.

The retrieval problem

Tool-based memory retrieves at the wrong time

When memory is implemented as an MCP tool, retrieval happens inside the conversation turn. The model receives a user message, decides (or is instructed) to call the memory tool, gets results back, and then formulates a response incorporating those results.

This creates three concrete failure modes that do not exist when memory is injected at the proxy layer.

01

Late arrival

The model has already begun processing the user's message before memory is consulted. In practice, the system prompt and user message establish a reasoning trajectory. Memory retrieved mid-turn is appended context, not foundational context. It does not shape the initial interpretation of the query.

02

Query dependency

The model must construct a query to retrieve relevant memory. That query is derived from the current message, which means memory retrieval is only as good as the model's ability to predict what context it needs before it has that context. This is circular in ways that matter at the edges.

03

Discretionary skipping

Tool calls are optional. A model that has been given a memory tool will call it inconsistently. Faster responses, shorter system prompts, ambiguous instructions, and temperature variation all affect whether the tool is invoked. Memory that is sometimes consulted is not memory the system can be said to have.

The architecture argument

Memory belongs at the proxy layer, not the tool layer

An AI client sends a request to a provider. That request contains a system prompt and a conversation history. Before any of that reaches the model, it passes through the network.

A proxy that sits in that path can intercept the outbound request, extract beliefs from the conversation history, retrieve relevant context from a local store, and inject that context into the system prompt before the request continues. From the model's perspective, the context was always there. From the user's perspective, nothing happened. There was no tool call. No retrieval step. No prompt modification visible to either party.

This is not a minor architectural variation on MCP-based memory. It is a different model of what memory is. The proxy treats memory as ambient context. MCP treats memory as a resource. These produce different systems with different failure modes, different precision properties, and different relationships to the conversation.

User message
Model
MCP memory tool call
Model resumes
Memory arrives after initial processing. Query is model-generated. Invocation is discretionary.
User message
Proxy injects context
Model sees enriched prompt
Memory is structural. Arrives before first token. No tool call. No discretion.
The precision argument

Retrieval precision collapses when you use the wrong index

Most MCP memory implementations use vector search. The belief store is an embedding index. Retrieval is semantic similarity. This is the standard approach for RAG pipelines, and it is wrong for memory for the same reasons it is right for document retrieval.

Document retrieval is a recall problem. You have many documents, you want the ones most likely to contain relevant information, and approximate similarity is an appropriate proxy for relevance. Some noise is acceptable because the model can filter.

Memory retrieval is a precision problem. You have a set of structured beliefs about a specific user, project, and domain. You want exactly the beliefs that apply to this query and none that do not. Approximate similarity produces noise, and that noise is injected into the model's context where it compounds. On PrecisionMemBench, vector-based memory systems return 8 to 18 irrelevant beliefs per query at a mean precision of 0.05 to 0.09. The model compensates, until the context window fills or the contradictions accumulate.

Vector search
0.05–0.09 Retrieval precision (PrecisionMemBench)

8–18 irrelevant beliefs returned per query. Model compensates via in-context reasoning. Fails as context window fills.

BM25 + typed beliefs
1.0 Retrieval precision (PrecisionMemBench)

Exact beliefs returned. No noise. No irrelevant context injected. Precision holds as the belief store grows.

BM25 over typed, structured beliefs is not a novel idea. It is the correct application of a well-understood technique to a problem where precision is the primary requirement. The reason the field defaulted to vector search is that vector search is the standard tool for semantic retrieval, and memory was framed as a semantic retrieval problem. It is not. It is a structured lookup problem with strong typing requirements and hard scope constraints.

The scope argument

Scope isolation requires a structural guarantee, not a semantic one

In a multi-project or multi-user environment, belief isolation is a correctness requirement. A session in project:client-a must not surface beliefs from project:client-b, regardless of semantic proximity. An engineer working on a billing system must not see injected context from a different team's codebase, even if the embedding distances are small.

Vector search cannot provide this guarantee. Semantic proximity is a continuous measure. You can down-rank out-of-scope results, but you cannot structurally exclude them. The boundary is probabilistic.

A typed belief store with hard scope filters applied before retrieval scoring can provide this guarantee. Out-of-scope beliefs are not down-ranked. They are absent. This is not a tuning parameter. It is a consequence of the data model.

The practical implication is that MCP-based memory systems with vector search cannot make a verifiable isolation claim, only a probabilistic one. For regulated environments, audit requirements, or any use case where cross-project contamination is a real risk, a probabilistic claim is not a claim at all.

Vector search cannot provide scope isolation as a guarantee. It can provide it as a tendency. For most engineering contexts, the difference matters.

The client coupling argument

MCP memory couples persistence to the AI client

A developer uses VS Code with Copilot in the morning and Claude.ai in the browser in the afternoon. Both sessions concern the same codebase, the same constraints, the same beliefs. An MCP memory tool connected to VS Code accumulates context there. Claude.ai knows none of it.

This is not a limitation of MCP specifically. It is a consequence of placing memory at the application layer rather than the network layer. Any memory system that lives inside a client, or is accessible only via a protocol that clients must explicitly support, fragments context across the tools a developer actually uses.

A proxy operates below the client. It intercepts requests from any OpenAI-compatible client, regardless of which application generated the request. VS Code, Claude.ai, Open WebUI, LibreChat, a custom script — they all pass through the same proxy, which means they all share the same belief store. Memory accumulates regardless of which tool the developer is using, and is injected regardless of which tool they use next.

This is cross-client persistence without any client cooperation. It requires no plugin, no integration, no explicit MCP support. The protocol is HTTP. Every AI client already speaks it.

Conclusion

The abstraction you choose determines what you can build

None of this is an argument against MCP. MCP is the right abstraction for tools. The argument is narrower: memory is not a tool, and treating it as one produces a system with the wrong properties in the places that matter most.

Memory should arrive before the first token, not after the model decides to fetch it. It should be precise, not approximate. It should enforce scope structurally, not probabilistically. It should accumulate across every client a developer uses, not just the one that has the plugin installed.

These are not nice-to-have properties. They are the properties that determine whether a memory system actually changes how a developer works, or whether it is infrastructure that technically runs and practically disappears.

The proxy layer provides them. The tool layer does not. That is the argument.

Related

More from Tenure