MCP treats memory as a tool call. That's the wrong model. Memory isn't something you invoke. It's something that should already be there.
The Model Context Protocol is well-designed for what it was built to do: give a model structured, on-demand access to external systems. Fetch a file. Query a database. Create a calendar event. These are discrete actions that should happen when explicitly requested, and MCP handles them well.
Memory is not that. Memory is not a resource you fetch on request. It is context that should be present before the conversation starts, without the model or the user having to think about it. When a developer tells their AI assistant that the auth service uses Redis for sessions, they are not creating a tool result. They are establishing a belief that should shape every future response about that system, across every future session, without any explicit retrieval step.
The distinction matters more than it appears. A tool call is optional input. Memory is structural context. Treating one as the other produces a system that technically works but behaviorally fails in ways that are hard to articulate and harder to fix.
A tool call is optional input. Memory is structural context. The difference is not cosmetic. It determines when the information arrives, how it is weighted, and whether the model knows to use it.
When memory is implemented as an MCP tool, retrieval happens inside the conversation turn. The model receives a user message, decides (or is instructed) to call the memory tool, gets results back, and then formulates a response incorporating those results.
This creates three concrete failure modes that do not exist when memory is injected at the proxy layer.
The model has already begun processing the user's message before memory is consulted. In practice, the system prompt and user message establish a reasoning trajectory. Memory retrieved mid-turn is appended context, not foundational context. It does not shape the initial interpretation of the query.
The model must construct a query to retrieve relevant memory. That query is derived from the current message, which means memory retrieval is only as good as the model's ability to predict what context it needs before it has that context. This is circular in ways that matter at the edges.
Tool calls are optional. A model that has been given a memory tool will call it inconsistently. Faster responses, shorter system prompts, ambiguous instructions, and temperature variation all affect whether the tool is invoked. Memory that is sometimes consulted is not memory the system can be said to have.
An AI client sends a request to a provider. That request contains a system prompt and a conversation history. Before any of that reaches the model, it passes through the network.
A proxy that sits in that path can intercept the outbound request, extract beliefs from the conversation history, retrieve relevant context from a local store, and inject that context into the system prompt before the request continues. From the model's perspective, the context was always there. From the user's perspective, nothing happened. There was no tool call. No retrieval step. No prompt modification visible to either party.
This is not a minor architectural variation on MCP-based memory. It is a different model of what memory is. The proxy treats memory as ambient context. MCP treats memory as a resource. These produce different systems with different failure modes, different precision properties, and different relationships to the conversation.
Most MCP memory implementations use vector search. The belief store is an embedding index. Retrieval is semantic similarity. This is the standard approach for RAG pipelines, and it is wrong for memory for the same reasons it is right for document retrieval.
Document retrieval is a recall problem. You have many documents, you want the ones most likely to contain relevant information, and approximate similarity is an appropriate proxy for relevance. Some noise is acceptable because the model can filter.
Memory retrieval is a precision problem. You have a set of structured beliefs about a specific user, project, and domain. You want exactly the beliefs that apply to this query and none that do not. Approximate similarity produces noise, and that noise is injected into the model's context where it compounds. On PrecisionMemBench, vector-based memory systems return 8 to 18 irrelevant beliefs per query at a mean precision of 0.05 to 0.09. The model compensates, until the context window fills or the contradictions accumulate.
8–18 irrelevant beliefs returned per query. Model compensates via in-context reasoning. Fails as context window fills.
Exact beliefs returned. No noise. No irrelevant context injected. Precision holds as the belief store grows.
BM25 over typed, structured beliefs is not a novel idea. It is the correct application of a well-understood technique to a problem where precision is the primary requirement. The reason the field defaulted to vector search is that vector search is the standard tool for semantic retrieval, and memory was framed as a semantic retrieval problem. It is not. It is a structured lookup problem with strong typing requirements and hard scope constraints.
In a multi-project or multi-user environment, belief isolation is a correctness requirement.
A session in project:client-a must not surface beliefs from project:client-b,
regardless of semantic proximity. An engineer working on a billing system must not see
injected context from a different team's codebase, even if the embedding distances are small.
Vector search cannot provide this guarantee. Semantic proximity is a continuous measure. You can down-rank out-of-scope results, but you cannot structurally exclude them. The boundary is probabilistic.
A typed belief store with hard scope filters applied before retrieval scoring can provide this guarantee. Out-of-scope beliefs are not down-ranked. They are absent. This is not a tuning parameter. It is a consequence of the data model.
The practical implication is that MCP-based memory systems with vector search cannot make a verifiable isolation claim, only a probabilistic one. For regulated environments, audit requirements, or any use case where cross-project contamination is a real risk, a probabilistic claim is not a claim at all.
Vector search cannot provide scope isolation as a guarantee. It can provide it as a tendency. For most engineering contexts, the difference matters.
A developer uses VS Code with Copilot in the morning and Claude.ai in the browser in the afternoon. Both sessions concern the same codebase, the same constraints, the same beliefs. An MCP memory tool connected to VS Code accumulates context there. Claude.ai knows none of it.
This is not a limitation of MCP specifically. It is a consequence of placing memory at the application layer rather than the network layer. Any memory system that lives inside a client, or is accessible only via a protocol that clients must explicitly support, fragments context across the tools a developer actually uses.
A proxy operates below the client. It intercepts requests from any OpenAI-compatible client, regardless of which application generated the request. VS Code, Claude.ai, Open WebUI, LibreChat, a custom script — they all pass through the same proxy, which means they all share the same belief store. Memory accumulates regardless of which tool the developer is using, and is injected regardless of which tool they use next.
This is cross-client persistence without any client cooperation. It requires no plugin, no integration, no explicit MCP support. The protocol is HTTP. Every AI client already speaks it.
None of this is an argument against MCP. MCP is the right abstraction for tools. The argument is narrower: memory is not a tool, and treating it as one produces a system with the wrong properties in the places that matter most.
Memory should arrive before the first token, not after the model decides to fetch it. It should be precise, not approximate. It should enforce scope structurally, not probabilistically. It should accumulate across every client a developer uses, not just the one that has the plugin installed.
These are not nice-to-have properties. They are the properties that determine whether a memory system actually changes how a developer works, or whether it is infrastructure that technically runs and practically disappears.
The proxy layer provides them. The tool layer does not. That is the argument.