Comparison

Tenure vs. Agentmemory

Name: precisionMemBench
Creator: Tenure
License: https://opensource.org/licenses/MIT

Agentmemory is a zero-database memory runtime for coding agents. Triple-stream retrieval (BM25 + vector + knowledge graph). Zero active retrieval passes. Drift score: 0.81.

TL;DR

Triple-stream retrieval sounds impressive. It achieves 0.17 precision and zero active passes.
Recall of 0.97 confirms the pattern: everything comes back. Retrieval is not selectivity.
Drift score 0.81. Off-topic turns contaminate subsequent retrieval heavily.
MCP-only interface: the model decides when to fetch memory. Tenure injects automatically on every request.
No scope isolation. No supersession handling. No per-turn audit trail.

Tenure precision

1.00

43/43 active passes

Agentmemory precision

0.17

0/43 active passes

Tenure latency

9.77ms

p50 retrieval

Agentmemory latency

82.28ms

p50 retrieval

More retrieval paths does not mean better precision

Agentmemory combines BM25, vector search, and a knowledge graph into a triple-stream retrieval pipeline with on-device reranking. The architecture adds complexity at every stage. The result: 0.17 mean precision with zero active passes on 43 precision-assertion cases.

The three streams all share the same fundamental limitation: they cannot structurally exclude beliefs that are semantically proximate but irrelevant. Adding more retrieval paths with the same limitation compounds noise rather than improving precision.

Tenure uses a single retrieval path (alias-weighted BM25 with hard scope isolation) and achieves 1.00 precision. The architectural difference is not "more paths" vs. "fewer paths." It is that Tenure's primary signal is orthogonal to semantic similarity: term matching over indexed aliases where the user coined the vocabulary. That signal is precise by construction in bounded vocabulary contexts.

MCP vs. proxy: when memory fires

Agentmemory exposes memory through MCP tools (memory_recall, memory_smart_search). The model must decide to call them. In practice, models frequently do not recognize when they need prior context, especially on questions where the user assumes shared history.

Tenure operates as a proxy between your client and your provider. Memory is injected into every request automatically, before the model sees the prompt. No tool call to trigger. No prompt engineering. No silent failures where memory was available but never fetched.

Full comparison

Property	Tenure	Agentmemory
Mean retrieval precision	1.00	0.17
Active retrieval passes	43/43	0/43
Total passes (77 non-session)	77/77	7/77
Retrieval latency (p50)	9.77ms	82.28ms
Session latency (p50)	47.79ms	98.49ms
Drift score	0.00	0.81
Memory delivery	Automatic (proxy)	MCP tool call (model decides)
Scope isolation	Hard filter	Project tagging (soft)
Supersession	Chain with audit trail	Decay scoring
Per-turn injection audit	Yes	No
Works across every client	Any OpenAI-compatible	MCP-capable agents only
Runs locally	Always	Always
External databases	None (single container)	None (single process)
License	MIT	Apache-2.0

Full methodology: arXiv:2605.11325. Dataset: HuggingFace. Live leaderboard: HuggingFace Spaces.

Tenure vs. Agentmemory

TL;DR

More retrieval paths does not mean better precision

MCP vs. proxy: when memory fires

Full comparison

Precise by construction. Not by luck.